Introduction to RAG (Retrieval-Augmented Generation) and Its Application

Uploaded by nikkishrija · 26 slides · Jul 31, 2024


Slide Content

Introduction to RAG
and Its Application
Presented By: Aayush Srivastava

KnolX Etiquettes
Lack of etiquette and manners is a huge turn-off.
▪ Punctuality: Join the session 5 minutes prior to the session start time. We start on time and conclude on time!
▪ Feedback: Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
▪ Silent Mode: Keep your mobile devices in silent mode; feel free to step out of the session if you need to attend an urgent call.
▪ Avoid Disturbance: Avoid unwanted chit-chat during the session.

Agenda
1. Introduction
   ▪ What is an LLM?
   ▪ What is RAG?
2. LLMs and Their Limitations
   ▪ Why is RAG important?
3. RAG Architecture
4. How Does RAG Work?
5. RAG vs. Fine-Tuning
6. Benefits of RAG
7. Applications
8. Demo

Introduction

What is an LLM?
• A large language model (LLM) is a type of artificial intelligence program that can recognize and generate text, among other tasks.
• LLMs are very large models that are pre-trained on vast amounts of data.
• They are built on the transformer architecture, a neural network design consisting of an encoder and a decoder with self-attention capabilities.
• An LLM can perform completely different tasks, such as answering questions, summarizing documents, translating languages, and completing sentences.
OpenAI's GPT-3 model has 175 billion parameters, and some more recent LLMs accept prompts of up to 100K tokens.

What is an LLM?
• In simpler terms, an LLM is a computer program that has been fed enough examples to be able to recognize and interpret human language or other types of complex data.
• The quality of the training samples affects how well an LLM learns natural language, so an LLM's developers may use a more curated dataset.

LLMs and Their Limitations
• Not updated with the latest information: Generative AI uses large language models to generate text, and these models only have information up to the date they were trained. If data beyond that date is requested, accuracy may be compromised.
• Hallucinations: Hallucinations are outputs that are factually incorrect or nonsensical, yet look coherent and grammatically correct. Such information can be misleading and can have a major impact on business decision-making.
• Domain-specific accuracy: LLM output often lacks accurate information when specificity matters more than a generalized answer. For instance, organizational HR policies tailored to specific employees may not be accurately addressed by an LLM-based AI because of its tendency toward generic responses.
• Source citations: In generative AI responses, we don't know which source was used to generate a particular answer. This makes citation difficult, and it is sometimes ethically questionable not to cite the source of information and give due credit.
• Updates require long training runs: Information changes very frequently, and re-training these models with new information requires huge resources and long training time, which is computationally intensive.
• Presenting false information when the model does not have the answer.

What is RAG?
• RAG stands for Retrieval-Augmented Generation.
• It is an advanced technique used with Large Language Models (LLMs).
• RAG combines retrieval and generation processes to enhance the capabilities of LLMs.
• In RAG, the model retrieves relevant information from a knowledge base or external sources.
• This retrieved information is then used in conjunction with the model's internal knowledge to generate coherent and contextually relevant responses.
• RAG enables LLMs to produce higher-quality and more context-aware outputs compared to traditional generation methods.
• Essentially, RAG empowers LLMs to leverage external knowledge for improved performance in various natural language processing tasks.
Retrieval-Augmented Generation (RAG) is an advanced artificial intelligence (AI) technique that combines information retrieval with text generation, allowing AI models to retrieve relevant information from a knowledge source and incorporate it into generated text.

Why is Retrieval-Augmented Generation important?
▪ You can think of the LLM as an over-enthusiastic new employee who refuses to stay informed on current events but will always answer every question with absolute confidence.
▪ Unfortunately, such an attitude can negatively impact user trust and is not something you want your chatbots to emulate!
▪RAG is one approach to solving some of these challenges. It redirects the LLM to retrieve relevant information from
authoritative, pre-determined knowledge sources.
▪Organizations have greater control over the generated text output, and users gain insights into how the LLM generates the
response.

RAG Architecture

Generalized RAG Approach
Let's delve into RAG's framework to understand how it mitigates these challenges.

How Does RAG Work?

Overview
• Retrieval-Augmented Generation (RAG) can be likened to a detective-and-storyteller duo. Imagine you are trying to solve a complex mystery. The detective's role is to gather clues, evidence, and historical records related to the case.
• Once the detective has compiled this information, the storyteller designs a compelling narrative that weaves together the facts and presents a coherent story. In the context of AI, RAG operates similarly.
• The Retriever component acts as the detective, scouring databases, documents, and knowledge sources for relevant information and evidence. It compiles a comprehensive set of facts and data points.
• The Generator component assumes the role of the storyteller, taking the collected information and transforming it into a coherent and engaging narrative, presenting a clear and detailed account of the mystery, much like a detective novel author.
This analogy illustrates how RAG combines the investigative power of retrieval with the creative skills of text generation to produce informative and engaging content, just as our detective and storyteller work together to unravel and present a compelling mystery.

RAG Components
• RAG is an AI framework that allows a generative AI model to access external information not included in its training data or model parameters to enhance its responses to prompts.
▪ RAG seeks to combine the strengths of both retrieval-based and generative methods.
▪ It typically involves using a retriever component to fetch relevant passages or documents from a large corpus of knowledge.
• The retrieved information is then used to augment the generative model's understanding and improve the quality of generated responses.
RAG components:
▪ Retriever
▪ Ranker
▪ Generator
▪ External Data
What is a prompt?
A prompt is the input provided by the user to generate a response. It could be a question, a statement, or any text that serves as the starting point for the model to generate a relevant and coherent continuation.

RAG Components
Let's understand each component in detail.
External data
The new data outside of the LLM's original training dataset is called external data. It can come from multiple data sources, such as APIs, databases, or document repositories. The data may exist in various formats like files, database records, or long-form text.
Vector embeddings
▪ ML models cannot interpret information intelligibly in its raw format and require numerical data as input. They use neural network embeddings to convert real-world information into numerical representations called vectors.
▪ Vectors are numerical values that represent information in a multi-dimensional space.
▪ Embedding vectors encode non-numerical data into a series of values that ML models can understand and relate. Example:
   The Conference (Horror, 2023, Movie) → The Conference (1.2, 2023, 20.0)
   Tales from the Crypt (Horror, 1989, TV Show, Season 7) → Tales from the Crypt (1.2, 1989, 36.7)
▪ The first number in each vector corresponds to a specific genre. An ML model would find that The Conference and Tales from the Crypt share the same genre. Likewise, the model will find more relationships.
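The example above can be sketched in code. The embedding values below are hypothetical hand-made vectors copied from the slide's example (a real system would produce them with a trained embedding model), and cosine similarity is one standard way to compare such vectors:

```python
import math

# Hypothetical, hand-made embeddings following the slide's example:
# first value encodes genre (1.2 = Horror), then year, then a score.
embeddings = {
    "The Conference": [1.2, 2023.0, 20.0],
    "Tales from the Crypt": [1.2, 1989.0, 36.7],
}

def cosine_similarity(a, b):
    """Angle-based similarity of two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical items are maximally similar; related items score close to 1.
sim = cosine_similarity(embeddings["The Conference"],
                        embeddings["Tales from the Crypt"])
```

In practice the vectors come from an embedding model with hundreds of dimensions, but the comparison step works exactly like this toy version.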

RAG Components
Vector DB
▪ A vector DB is a database that stores embeddings of words, phrases, or documents along with their corresponding identifiers.
▪ It allows for fast and scalable retrieval of similar items based on their vector representations.
▪ Vector DBs enable efficient retrieval of relevant information during the retrieval phase of RAG, improving the contextual relevance and quality of generated responses.
▪ Data chunking: Before the retrieval model can search through the data, it is typically divided into manageable "chunks" or segments.
▪ Vector DB examples: Chroma, Pinecone, Weaviate, Elasticsearch
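The chunking step mentioned above can be sketched as a small helper. This is a minimal illustration (fixed-size character chunks with overlap); real pipelines often chunk by tokens, sentences, or document structure instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size chunks that overlap, so that a fact
    falling on a chunk boundary still appears whole in one chunk."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be embedded and stored in the vector DB
# together with an identifier pointing back to the source document.
```

The overlap is a design trade-off: larger overlap reduces the risk of splitting a relevant passage but stores more redundant text.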

RAG Components
RAG Retriever
▪ The next step is to perform a relevancy search. The user query is converted to a vector representation and matched against the vector database.
▪ The retriever component is responsible for efficiently identifying and extracting relevant information from a vast amount of data. For example, consider a smart chatbot for human-resources questions in an organization. If an employee searches, "How much annual leave do I have?", the system will retrieve annual-leave policy documents alongside the individual employee's past leave record. These specific documents are returned because they are highly relevant to the employee's input; the relevancy was calculated and established using mathematical vector calculations and representations.
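The retrieval step can be sketched end to end with toy parts. Here the "embedding" is a simple bag-of-words vector and the document list is invented for illustration; a production retriever would use a neural embedding model and a vector DB, but the rank-by-similarity logic is the same:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented mini knowledge base for the HR-chatbot example.
documents = [
    "Annual leave policy: employees receive 20 days of paid leave per year.",
    "Expense reports must be filed within 30 days of purchase.",
    "Remote work requests are approved by the line manager.",
]

def retrieve(query, docs, top_k=2):
    """Embed the query, score every document, return the best matches."""
    query_vec = embed(query)
    scored = sorted(docs, key=lambda d: cosine(query_vec, embed(d)),
                    reverse=True)
    return scored[:top_k]
```

Asking `retrieve("How much annual leave do I have?", documents)` surfaces the leave-policy document first, mirroring the slide's example.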

RAG Components
RAG Ranker
▪ The RAG ranker component refines the retrieved information by assessing its relevance and importance. It assigns scores or ranks to the retrieved data points, helping prioritize the most relevant ones.
▪ In the HR chatbot example, the ranker orders the retrieved annual-leave policy documents and the employee's past leave record so that the most relevant items are passed to the model first.
Augment the LLM prompt
▪ Next, the RAG model augments the user input (or prompt) by adding the relevant retrieved data as context. This step uses prompt-engineering techniques to communicate effectively with the LLM.
▪ The augmented prompt allows the large language model to generate an accurate answer to user queries.
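The prompt-augmentation step is mostly string templating. The template below is a hypothetical example of a common pattern (instruct the model to answer only from the supplied context), not a prescribed format:

```python
# Hypothetical prompt template; {context} and {question} are filled in
# at request time with the retrieved chunks and the user's query.
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_augmented_prompt(question, retrieved_chunks):
    """Number the retrieved chunks and splice them into the template,
    so the model can ground (and cite) its answer."""
    context = "\n\n".join(f"[{i + 1}] {c}"
                          for i, c in enumerate(retrieved_chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Numbering the chunks also makes source citation possible: the model can be asked to reference `[1]`, `[2]`, etc. in its answer, addressing the citation limitation discussed earlier.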

RAG Components
RAG Generator
▪ The RAG generator component is essentially the LLM itself, such as GPT.
▪ It is responsible for taking the retrieved and ranked information, along with the user's original query, and generating the final response or output.
▪ The generator ensures that the response aligns with the user's query and incorporates the factual knowledge retrieved from external sources.
Update external data
▪ To keep information current for retrieval, asynchronously update the documents and their embedding representations.
▪ Automated real-time processes: Updates to documents and embeddings occur in real time as soon as new information becomes available. This ensures that the system always reflects the most recent data.
▪ Periodic batch processing: Updates are performed at regular intervals (e.g., daily, weekly) in batches. This approach may be more efficient for systems with large volumes of data or where real-time updates are not necessary.

RAG-Based Chat Application
Simplified sequence diagram illustrating the process of a RAG chat application:
Step 1 - User sends query: The process begins when the user sends a query or message to the chat application.

Understanding RAG Architecture
Step 2 - Chat app forwards query: Upon receiving the user's query, the chat application (ChatApp) forwards this query to the Retrieval-Augmented Generation (RAG) model for processing.
Step 3 - RAG retrieves and generates a response: The RAG model, which integrates retrieval and generation capabilities, processes the user's query. It first retrieves relevant information from a large corpus of data, then uses the LLM to generate a coherent and contextually relevant response based on the retrieved information and the user's query.
Step 4 - LLM returns response: Once the response is generated, the LLM sends it back to the chat application (ChatApp).
Step 5 - Chat app displays response: Finally, the chat application displays the generated response to the user, completing the interaction.
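The five steps above can be collapsed into one small function. Everything here is a stand-in: the knowledge base is invented, retrieval is naive keyword matching, and `generate` is a stub where a real system would call an LLM API:

```python
# Invented mini knowledge base for illustration.
KNOWLEDGE_BASE = {
    "leave": "Employees receive 20 days of paid annual leave per year.",
    "expenses": "Expense reports are due within 30 days.",
}

def retrieve(query):
    """Step 3a: naive keyword retrieval from the knowledge base."""
    return [text for key, text in KNOWLEDGE_BASE.items()
            if key in query.lower()]

def generate(query, context):
    """Step 3b: stand-in for the LLM call; a real system would send
    the augmented prompt to a model API here."""
    return f"Answer (grounded in {len(context)} source(s)): {' '.join(context)}"

def rag_chat(query):
    """Steps 1-5: user query -> retrieve -> generate -> response."""
    context = retrieve(query)            # Step 3: retrieval
    response = generate(query, context)  # Steps 3-4: generation and return
    return response                      # Step 5: shown to the user
```

The point of the sketch is the control flow: the chat app never calls the LLM with the raw query alone; retrieval always runs first, and the generation step receives both the query and the retrieved context.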

RAG vs. Fine-Tuning
▪ Objective
   − Fine-tuning aims to adapt a pre-trained LLM to a specific task or domain by adjusting its parameters based on task-specific data.
   − RAG focuses on improving the quality and relevance of generated text by incorporating retrieved information from external sources during the generation process.
▪ Training data
   − Fine-tuning requires task-specific labeled data/examples to update and optimize the model's parameters, leading to more time and cost.
   − RAG relies on a combination of a pre-trained LLM and external knowledge bases.
▪ Adaptability
   − Fine-tuning makes the LLM more specialized and tailored to a specific task or domain.
   − RAG maintains the generalizability of the pre-trained LLM by leveraging external knowledge, allowing it to adapt to a wide range of tasks.
▪ Model architecture
   − Fine-tuning typically involves modifying the parameters of the pre-trained LLM while keeping its architecture unchanged.
   − RAG combines the retrieval and generation components with the standard LLM architecture to incorporate the retrieval mechanism.

RAG Benefits
• Enhanced relevance: Incorporates external knowledge for more contextually relevant responses.
• Improved quality: Enhances the quality and accuracy of generated output.
• Versatility: Adaptable to various tasks and domains without task-specific fine-tuning.
• Efficient retrieval: Leverages existing knowledge bases, reducing the need for large labeled datasets.
• Dynamic updates: Allows for real-time or periodic updates to maintain current information.
• Trust and transparency: Accurate and reliable responses, underpinned by current and authoritative data, significantly enhance user trust in AI-driven applications.
• Customization and control: Organizations can tailor the external sources RAG draws from, allowing control over the type and scope of information integrated into the model's responses.
• Cost-effective: Updating a knowledge base is far cheaper than re-training or fine-tuning the model for every change in information.

Applications
• Conversational AI: RAG enables chatbots to provide more accurate and contextually relevant responses to user queries.
• Advanced question answering: RAG enhances question-answering systems by retrieving relevant passages or documents containing answers to user queries.
• Content generation: In content-generation tasks such as summarization, article writing, and content recommendation, RAG can augment the generation process with retrieved information, incorporating relevant facts, statistics, and examples from external sources.
• Healthcare: RAG can assist healthcare professionals in accessing relevant and up-to-date medical literature and guidelines.

Demo

Thank you