Agentforce RAG (Retrieval-Augmented Generation): Architecture and Implementation Best Practices - Oct 21, 2025

varshatiwari72 2 views 23 slides Oct 24, 2025

About This Presentation

Key Takeaways To Expect At The End Of This Session:





Design smarter embeddings — why custom chunking strategies can make or break accuracy when handling complex technical documents.



Optimize retrieval performance — how hybrid search (keyword + semantic) outperforms pure semantic search wh...


Slide Content


Varsha Tiwari
Senior Principal,
Client Delivery Enterprise
Nicolas Lebel
Senior Principal,
Client Delivery Enterprise

Agentforce RAG
Architecture and Implementation Best Practices
Nicolas Lebel
2025-10-21, Salesforce Architecture User Group, Montreal

RAG definition
The process of optimizing the output of a large language model, so it references an authoritative
knowledge base outside of its training data sources before generating a response. Large Language
Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate
original output for tasks like answering questions, translating languages, and completing
sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an
organization's internal knowledge base, all without the need to retrain the model.
Source: https://aws.amazon.com/what-is/retrieval-augmented-generation/

Many different company contexts when it comes to documentation
○Accumulated content in multiple sources
○Mature Salesforce Knowledge users
○Different document types (PDFs, Word docs, HTML) or Salesforce records, or a combination?
○Document content: FAQs, manuals, conversations, case resolutions, etc.
○Type of audience for your content

The Agentforce Agent Flow

RAG architecture: All about Offline Preparation

Initial content optimizations
User manuals and their limits
●In a typical user manual, the semantic match between questions and content is less clear.
●If you can author your content as Q&A, it will get vectorized and leveraged by the LLM for context.
●User manuals contain lots of structured data (think of tables). You need to give context to that information.
○Without Tabular Embedding Models, embedding tabular information is a nightmare.
○Right now, with Agentforce and Data 360 (Data Cloud), the recommendation is to extract tables into JSON or HTML files.
●Content governance needs to be implemented
○Leverage CMS, training, and product teams to make sure content is up to date
○A properly managed Salesforce Knowledge Base is more manageable than 5000 PDFs sitting in a folder.
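
As a rough illustration of the table-extraction recommendation above, the Python sketch below turns a small table into one JSON record per row, so that every value keeps its column header as context. The table contents, column names, and the helper `table_to_records` are invented for illustration; this is not the Data 360 extraction pipeline.

```python
import json

def table_to_records(header, rows):
    """Turn a table into one dict per row so every value carries its column name."""
    return [dict(zip(header, row)) for row in rows]

# Hypothetical table from a user manual (illustrative values only).
header = ["Model", "Max pressure (bar)", "Filter type"]
rows = [
    ["X100", "8", "Mesh"],
    ["X200", "12", "Cartridge"],
]

records = table_to_records(header, rows)
print(json.dumps(records, indent=2))
```

Each record is self-describing, so a chunk containing a single row can still be matched against a question such as "what is the max pressure of the X200?".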
Content relevance and reliability
Load Chunk Vectorize Index Retrieve

Chunking strategies
●Section-Aware Chunking
●Semantic-based Passage Extraction
●Conversation-based Chunking
●Prepend Field Chunking

Section-aware chunking
●Default mode for embedded PDF and HTML files when using the Data 360 advanced builder
●HTML tags (H1-H6) stripped off by default
●Max tokens: when documents have short paragraphs or list items
●Overlap tokens: context window around the chunk
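
To make the max-token and overlap settings concrete, here is a minimal Python sketch of section-aware chunking. It assumes the document has already been split into (section title, body) pairs and approximates tokens by whitespace-separated words; real embedding models tokenize differently. The function names and default values are hypothetical and do not reflect the Data 360 implementation.

```python
def chunk_section(text, max_tokens=8, overlap=2):
    """Split one section into word windows of at most max_tokens words.

    Consecutive windows share `overlap` words, which gives each chunk
    some surrounding context.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the section
    return chunks

def section_aware_chunks(sections, max_tokens=8, overlap=2):
    """Chunk each section separately so no chunk crosses a section boundary."""
    out = []
    for title, body in sections:
        for chunk in chunk_section(body, max_tokens, overlap):
            out.append({"section": title, "text": chunk})
    return out
```

With 12 words, `max_tokens=8`, and `overlap=2`, this produces two chunks, and the last two words of the first chunk reappear at the start of the second.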

Semantic-based Passage Extraction

Conversation-based chunking

Prepend Field Chunking
Use additional fields or metadata to provide context for a chunk
512-token limit
Ex: Description field on Knowledge article
How to use prepending
1. Access data preparation settings: Navigate to the settings for your unstructured data in Salesforce Data Cloud.
2. Select a chunking strategy: Choose the appropriate chunking strategy based on your data.
3. Enable field prepending: Select the "Prepend fields to each chunk" option.
4. Add fields: Select the fields you want to prepend. For example, you could select "ProductName" and "ProductSKU" to add to each chunk of product description data.
5. Proceed with vectorization: Continue the process by selecting an embedding model for vectorization and indexing.
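
Conceptually, field prepending amounts to something like the following Python sketch. The helper `prepend_fields`, the "Field: value" layout, and the sample record are assumptions made for illustration; the actual format Data Cloud uses when it prepends fields is not described in the slides.

```python
def prepend_fields(chunk_text, record, fields):
    """Prefix a chunk with 'Field: value' lines taken from its parent record."""
    prefix = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return "\n".join(prefix + [chunk_text])

# Hypothetical product record and chunk (illustrative values only).
record = {"ProductName": "Aqua Pump", "ProductSKU": "AP-100"}
chunk = "Replace the filter every six months."

enriched = prepend_fields(chunk, record, ["ProductName", "ProductSKU"])
print(enriched)
```

Because the product name and SKU now travel with every chunk, a chunk that never mentions the product by name can still match a product-specific question. The prepended text also makes each chunk longer, so it is worth keeping the selected fields short.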

Enriched indexing or chunking
●Especially useful in UDMOs where field prepending is not possible
●Plain, Question, Metadata chunk types
●Chunk enrichment increases cost and latency
●OpenAI Ada 002 needs to be used
○Not supported on Multilingual E5-Large or E5-Large V2
Context Indexing (just released)
●Lets you analyse how chunking gets executed for one specific document

Different options for RAG in Agentforce
●Agentforce Service Agent / Agentforce for Employees
○Answer Questions with Knowledge using ADL
■Knowledge articles
■Uploaded files
○Use a custom prompt template with ADL
○Use a custom prompt template with a custom connector

The limits of Agentforce Data Library (ADL)
●Search index
○512 tokens per chunk
○E5 Large Multilingual embedding model
○Hybrid search mode
○No enriched chunking for now
●Retriever (1 per ADL with filter on Source Id)
○Returns 10 results by default
○Advanced retrieval mode is OFF
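
The combination of hybrid search, a source filter, and a default of 10 results can be pictured with the toy Python sketch below. It scores each chunk with a weighted mix of keyword overlap and cosine similarity. The weighting scheme, the `alpha` parameter, the field names, and the two-dimensional vectors are all assumptions for illustration; the slides do not describe how Data 360 actually fuses keyword and semantic scores.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query terms that appear in the chunk text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, chunks, source_id, k=10, alpha=0.5):
    """Rank chunks from one source by a weighted mix of keyword and vector similarity."""
    scored = []
    for c in chunks:
        if c["source_id"] != source_id:
            continue  # retriever filter: only chunks from this library
        score = (alpha * keyword_score(query, c["text"])
                 + (1 - alpha) * cosine(query_vec, c["vec"]))
        scored.append((score, c["text"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]

# Hypothetical index contents with toy 2-dimensional embeddings.
chunks = [
    {"source_id": "lib1", "text": "reset the pump filter", "vec": [1.0, 0.0]},
    {"source_id": "lib1", "text": "warranty terms", "vec": [0.0, 1.0]},
    {"source_id": "lib2", "text": "reset the pump filter", "vec": [1.0, 0.0]},
]
results = hybrid_search("reset filter", [1.0, 0.0], chunks, source_id="lib1")
```

The keyword component helps with exact terms such as product codes, while the vector component catches paraphrases.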

The resurgence of Knowledge Articles for RAG
●When chunking and vectorizing Salesforce Knowledge articles, the search index is built
against a structured DMO. Take advantage of this structure by spreading long-text content
across multiple fields, such as Question, Description, Resolution, and Exceptions. Annotate
the knowledge article with metadata for filtering and prepending.
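
One way to picture this structure is the Python sketch below, which turns a single article into one chunk per long-text field and attaches the article's metadata to each chunk so it can be used for filtering. The article contents, the metadata field names (`Product`, `Language`), and the helper `article_to_chunks` are hypothetical; they only illustrate the idea and are not the Knowledge DMO schema.

```python
def article_to_chunks(article, text_fields, metadata_fields):
    """Create one chunk per non-empty text field, each carrying the article metadata."""
    meta = {name: article[name] for name in metadata_fields}
    return [
        {"field": name, "text": article[name], "metadata": meta}
        for name in text_fields
        if article.get(name)
    ]

# Hypothetical knowledge article (illustrative values only).
article = {
    "Question": "How do I reset the pump?",
    "Description": "Applies to pumps that show a blinking red light.",
    "Resolution": "Hold the power button for ten seconds.",
    "Exceptions": "",
    "Product": "Aqua Pump",
    "Language": "en",
}

chunks = article_to_chunks(
    article,
    text_fields=["Question", "Description", "Resolution", "Exceptions"],
    metadata_fields=["Product", "Language"],
)

# Filtering on metadata before retrieval narrows the candidate set.
english_only = [c for c in chunks if c["metadata"]["Language"] == "en"]
```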

Ensemble Retriever
●Combine multiple retrievers to dynamically rerank search results for the same prompt template.
●Only one retriever is used to ground the prompt template.
●The most relevant query results are surfaced at the top of the ranking.
●No irrelevant results are added to the prompt.
●The prompt consumes fewer Einstein Requests, which reduces latency and cost.
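
The slides do not say how the ensemble retriever reranks results. As one common way to merge rankings from several retrievers, the Python sketch below uses reciprocal rank fusion (RRF), a generic technique swapped in here for illustration, not the documented Agentforce algorithm. The function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=3):
    """Merge several ranked ID lists; items ranked high by many retrievers win."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score, breaking ties by ID for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]

# Hypothetical results from two retrievers over the same query.
knowledge_hits = ["a", "b", "c"]
file_hits = ["b", "c", "d"]

merged = reciprocal_rank_fusion([knowledge_hits, file_hits], top_n=3)
print(merged)
```

Keeping only the top few merged results is what keeps weaker matches out of the prompt, in line with the points above.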

Multi-language use cases
●You could embed content in many different supported languages and have users prompt in other languages
●Things to consider:
○Trust Layer: PII configuration
○Languages supported by the embedding model
○Languages supported by the LLM (both for input and output)
○Agentforce languages
○Reasoning engine languages

Key links
Both authored by Reinier van Leuken, Senior Director of Product Management - Agentforce
https://www.salesforce.com/agentforce/agentforce-and-rag/
https://www.salesforce.com/plus/experience/tdx_london_2025/series/tdx_2025_london_highlights/episode/episode-s1e5

Q&A

Quick Survey