Agentforce RAG
Architecture and Implementation Best Practices
Nicolas Lebel
2025-10-21, Salesforce Architecture User Group, Montreal
RAG definition
The process of optimizing the output of a large language model, so it references an authoritative
knowledge base outside of its training data sources before generating a response. Large Language
Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate
original output for tasks like answering questions, translating languages, and completing
sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an
organization's internal knowledge base, all without the need to retrain the model.
Source: https://aws.amazon.com/what-is/retrieval-augmented-generation/
Many different company contexts when it comes to documentation
○ Accumulated content in multiple sources
○ Mature Salesforce Knowledge users
○ Different document types (PDFs, Word docs, HTML) or Salesforce records, or a combination?
○ Document content: FAQ, manuals, conversations, case resolutions, etc.
○ Type of audience for your content
The Agentforce Agent Flow
RAG architecture: All about Offline Preparation
Initial content optimizations
User manuals and their limits
● In a typical user manual, the semantic match between questions and content is less clear.
● If you can author your content as Q&A, it will get vectorized and leveraged by the LLM for context.
● Lots of structured data inside user manuals (think of tables). Need to give context to the information.
○ Without the use of Tabular Embedding Models, embedding tabular information is a nightmare.
○ Right now, with Agentforce and Data 360 (Data Cloud), the recommendation is to extract tables into JSON or HTML files.
● Content governance needs to be implemented
○ Leverage CMS, training, and product teams to make sure content is up to date
○ A properly managed Salesforce Knowledge Base is more manageable than 5000 PDFs sitting in a folder.
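As an illustration of the table recommendation above, here is a minimal sketch that turns a manual's table into JSON records, with the table title and source document repeated on every row so each row keeps its context once chunked. The table contents, the file name, and the `table_to_records` helper are all hypothetical; this is not a Data 360 API.

```python
import csv
import io
import json

# Hypothetical table as it might be copied out of a user manual.
TABLE_CSV = """Model,Max load (kg),Voltage (V)
X100,250,120
X200,400,240
"""


def table_to_records(csv_text, table_title, source_document):
    """Turn each table row into a self-describing record.

    Repeating the title and source on every row keeps the context
    attached to the row after the content is chunked.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"table": table_title, "source": source_document, **row}
        for row in rows
    ]


records = table_to_records(TABLE_CSV, "Lift specifications", "lift-user-manual.pdf")
print(json.dumps(records, indent=2))
```

Each record can then be saved as its own JSON file, or as one file per table, before ingestion.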
Content relevance and reliability
Load Chunk Vectorize Index Retrieve
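The five stages named on the line above can be sketched end to end in a few lines. This is a toy illustration only: a bag-of-words counter stands in for a real embedding model, a Python list stands in for the search index, and every function name and sample document is an assumption made for the example.

```python
import math
import re
from collections import Counter


def load(raw_docs):
    """Load: normalize whitespace in each raw document."""
    return {doc_id: " ".join(text.split()) for doc_id, text in raw_docs.items()}


def chunk(docs, max_words=12):
    """Chunk: split each document into fixed-size word windows."""
    chunks = []
    for doc_id, text in docs.items():
        words = text.split()
        for start in range(0, len(words), max_words):
            chunks.append((doc_id, " ".join(words[start:start + max_words])))
    return chunks


def vectorize(text):
    """Vectorize: toy bag-of-words vector (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(chunks):
    """Index: store each chunk next to its vector."""
    return [(doc_id, text, vectorize(text)) for doc_id, text in chunks]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index, question, top_k=2):
    """Retrieve: rank chunks by similarity to the question."""
    q = vectorize(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, text) for doc_id, text, _ in ranked[:top_k]]


raw = {
    "reset-guide": "To reset the router hold the reset button for ten seconds",
    "warranty": "The warranty covers manufacturing defects for two years",
}
index = build_index(chunk(load(raw)))
top = retrieve(index, "How do I reset my router?", top_k=1)
print(top)
```

Everything up to `build_index` is offline preparation; only `retrieve` runs when a user asks a question.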
Section-aware chunking
● Default mode for embedded PDF and HTML files when using Data 360 advanced builder
● HTML tags (H1-H6) stripped off by default
● Max tokens: when documents have short paragraphs or list items
● Overlap tokens: context window around the chunk
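To make the max-token and overlap settings concrete, here is a minimal sketch of section-aware chunking: the HTML is split at H1-H6 headings, tags are stripped, and each section is cut into windows that share a few tokens with their neighbor. It illustrates the idea only and is not how Data 360 implements it; whitespace-separated words stand in for model tokens, and the function names and sample HTML are assumptions.

```python
import re

HEADING = re.compile(r"<h[1-6][^>]*>(.*?)</h[1-6]>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def split_sections(html):
    """Return (heading, body_text) pairs, one per H1-H6 heading."""
    sections = []
    matches = list(HEADING.finditer(html))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        body = TAG.sub(" ", html[m.end():end])  # strip remaining tags
        heading = TAG.sub("", m.group(1)).strip()
        sections.append((heading, " ".join(body.split())))
    return sections


def chunk_section(body, max_tokens, overlap_tokens):
    """Split one section into windows of at most max_tokens that share overlap_tokens."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    tokens = body.split()  # whitespace words stand in for model tokens
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the section
    return chunks


def section_aware_chunks(html, max_tokens=8, overlap_tokens=2):
    """Chunk each section separately so no chunk crosses a heading."""
    out = []
    for heading, body in split_sections(html):
        for piece in chunk_section(body, max_tokens, overlap_tokens):
            out.append({"section": heading, "text": piece})
    return out


html = (
    "<h1>Setup</h1><p>Unpack the device and connect the power cable to the wall outlet.</p>"
    "<h2>Pairing</h2><p>Hold the button until the light blinks.</p>"
)
chunks = section_aware_chunks(html)
for c in chunks:
    print(c)
```

With these toy settings the Setup section yields two chunks that share the words "power cable", while the short Pairing section fits in a single chunk.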
Semantic-based Passage Extraction
Conversation-based chunking
Prepend Field Chunking
Use additional fields or metadata to provide context for a chunk
512 token limit
Ex: Description field on Knowledge article
How to use prepending
1. Access data preparation settings: Navigate to the settings for your
unstructured data in Salesforce Data Cloud.
2. Select a chunking strategy: Choose the appropriate chunking strategy based on
your data.
3. Enable field prepending: Select the "Prepend fields to each chunk" option.
4. Add fields: Select the fields you want to prepend. For example, you could select
"ProductName" and "ProductSKU" to add to each chunk of product description
data.
5. Proceed with vectorization: Continue the process by selecting an embedding
model for vectorization and indexing.
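The effect of step 4 can be pictured with a small sketch: the selected field values are written in front of the chunk text before it is vectorized. The `prepend_fields` helper and the sample product are hypothetical, whitespace-separated words stand in for tokens, and treating 512 as a budget shared by the prepended fields and the chunk body is an assumption made for the example, not documented behavior.

```python
def prepend_fields(record, chunk_text, fields, max_tokens=512):
    """Prefix a chunk with 'Field: value' pairs, trimming the body to fit the budget."""
    header = " | ".join(f"{name}: {record[name]}" for name in fields if record.get(name))
    budget = max_tokens - len(header.split())  # tokens left for the chunk body
    if budget <= 0:
        raise ValueError("prepended fields alone exceed the token limit")
    body = " ".join(chunk_text.split()[:budget])
    return f"{header}\n{body}"


# Hypothetical product record and description chunk.
product = {"ProductName": "Trail Pump", "ProductSKU": "TP-100"}
description_chunk = "Inflates a standard tire in under two minutes."

enriched = prepend_fields(product, description_chunk, ["ProductName", "ProductSKU"])
print(enriched)
```

Because the product name and SKU now travel with every chunk, a question that mentions the SKU can match a chunk whose own text never repeats it.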
Enriched indexing or chunking
● Especially useful in UDMOs where field prepending is not possible
● Plain, Question, Metadata chunk types
● Chunk enrichment increases cost and latency
● OpenAI Ada 002 needs to be used
○ Not supported on Multilingual E5-Large or E5-Large V2
Context Indexing (just released)
● Lets you analyze how the chunking gets executed for one specific document
Different options for RAG in Agentforce
● Agentforce Service Agent / Agentforce for Employees
○ Answer Questions with Knowledge using ADL
■ Knowledge articles
■ Uploaded files
○ Use custom prompt template with ADL
○ Use custom prompt template with custom connector
The limits of Agentforce Data Library (ADL)
● Search Index
○ 512 tokens per chunk
○ E5 Large Multilingual embedding model
○ Hybrid search mode
○ No enriched chunking for now
● Retriever (1 per ADL with filter on Source Id)
○ Returns 10 results by default
○ Advanced retrieval mode is OFF
The resurgence of Knowledge Articles for RAG
● When chunking and vectorizing Salesforce knowledge articles, the search index is built
against a structured DMO. Take advantage of this structure by spreading long-text content
across multiple fields, such as Question, Description, Resolution, and Exceptions. Annotate
the knowledge article with metadata for filtering and prepending.
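A minimal sketch of this idea: each long-text field becomes its own chunk, the title is prepended for context, and article-level metadata rides along so results can be filtered before retrieval. The Question, Description, Resolution, and Exceptions fields come from the text above; the `Title`, `Product`, and `Language` fields, the sample article, and both helper functions are hypothetical.

```python
# Hypothetical knowledge article, with long text spread across several fields.
ARTICLE = {
    "Title": "Reset the X100 router",
    "Question": "How do I reset my X100 router?",
    "Description": "Covers the factory reset procedure for the X100.",
    "Resolution": "Hold the reset button for ten seconds, then wait for the light.",
    "Exceptions": "Units with firmware below 2.0 must be updated first.",
    "Product": "X100",
    "Language": "en",
}
CONTENT_FIELDS = ["Question", "Description", "Resolution", "Exceptions"]
METADATA_FIELDS = ["Product", "Language"]


def article_to_chunks(article):
    """Emit one chunk per populated content field, with the title prepended."""
    metadata = {name: article[name] for name in METADATA_FIELDS}
    return [
        {"field": name, "text": f"{article['Title']}: {article[name]}", "metadata": metadata}
        for name in CONTENT_FIELDS
        if article.get(name)
    ]


def filter_chunks(chunks, **criteria):
    """Keep only chunks whose metadata matches every criterion."""
    return [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in criteria.items())
    ]


chunks = article_to_chunks(ARTICLE)
x100_chunks = filter_chunks(chunks, Product="X100", Language="en")
print(len(chunks), len(x100_chunks))
```

Keeping the Resolution and Exceptions in separate fields means a question about an edge case can match the Exceptions chunk directly instead of competing with the whole article.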
Ensemble Retriever
● Combine multiple retrievers to dynamically rerank search results for the same prompt template.
● Only one retriever is used to ground the prompt template.
● The most relevant query results are surfaced at the top of the ranking.
● No irrelevant results are added to the prompt.
● The prompt consumes fewer Einstein Requests, which reduces latency and cost.
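The slides do not say how the ensemble retriever merges and reranks results. As a stand-in, the sketch below uses reciprocal rank fusion, a common technique for combining ranked lists from several retrievers; it is not presented as Agentforce's actual algorithm, and the chunk ids and function name are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=3):
    """Merge several ranked lists of chunk ids into one ranking.

    Each id scores 1 / (k + rank) in every list where it appears, and
    the scores are summed, so ids ranked well by several retrievers rise.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


# Hypothetical results from two retrievers for the same question.
knowledge_hits = ["kb-12", "kb-07", "kb-31"]
file_hits = ["pdf-03", "kb-12", "pdf-09"]

fused = reciprocal_rank_fusion([knowledge_hits, file_hits], top_n=3)
print(fused)
```

Cutting the fused list at `top_n` is what keeps low-ranked results out of the prompt.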
Multi-language use cases
● You could embed content in many different supported languages and have the users prompt in other languages
● Need to think of
○ Trust layer: PII information configuration
○ Supported languages in the embedding model
○ LLM supported languages (both for input and output)
○ Agentforce languages
○ Reasoning engine languages
Key links
Both authored by Reinier van Leuken, Senior Director of Product
Management, Agentforce
https://www.salesforce.com/agentforce/agentforce-and-rag/
https://www.salesforce.com/plus/experience/tdx_london_2025/series/tdx_2025_london_highlights/episode/episode-s1e5