RAG Explanations presentation (work in progress)

bpadmaraj186 67 views 22 slides Aug 29, 2024


Slide Content

RAG WALKTHROUGH

RAG Overview
Copyright © 2023, Oracle and/or its affiliates | Confidential: Internal/Restricted/Highly Restricted

Traditional AI systems struggle to navigate this vast sea of data and provide relevant, accurate responses. RAG addresses this challenge by seamlessly integrating retrieval and generation techniques, enabling efficient access to knowledge and synthesis of coherent responses.

RAG consists of two main components:
- Retriever: employs advanced information retrieval techniques to swiftly identify relevant passages from vast repositories of data.
- Generator: synthesizes coherent responses based on the retrieved information, leveraging large-scale language models and sophisticated generation strategies.

RAG harnesses state-of-the-art techniques in natural language understanding, enabling it to interpret and analyze complex queries with precision. By understanding the nuances of human language, RAG delivers contextually relevant responses that meet the user's information needs effectively.
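The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal illustration only: the keyword-overlap scorer and `build_prompt` helper are stand-ins for a real embedding-based retriever and an LLM call.

```python
def retrieve(query, corpus, k=2):
    """Score each passage by word overlap with the query and return the top k.
    A real retriever would use dense embeddings; overlap stands in here."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """The generator receives the retrieved context alongside the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "RAG combines a retriever with a generator.",
    "The retriever finds relevant passages.",
    "Bananas are yellow.",
]
prompt = build_prompt("What does the retriever do?",
                      retrieve("What does the retriever do?", corpus))
```

The prompt would then be sent to the language model, which answers grounded in the retrieved passages rather than its parametric memory alone.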

RAG Architecture

Diagram (reconstructed): data and queries flow through the pipeline:
- Data Sources (PDF, OCR, DOCX, PPTX, TXT, web crawler) → Data Extraction
- Data Preparation: clustering and classification
- Embedding Model → Vector Database
- Query → retrieval of a relevant document list → Large Language Model → Answer

Indexing

Indexing is the process of organizing data in a way that makes retrieval efficient. In the context of large language models and retrieval-augmented generation (RAG), indexing allows the system to quickly locate relevant pieces of information from vast datasets.

Engineering in Langchain:
- Chunking: dividing documents into smaller, manageable chunks.
- Recursive Chunking: iteratively breaking down text until optimal chunk sizes are achieved.
- Hierarchical Indexing: creating a multi-level index where each level provides increasingly granular access to data.
- Domain-Specific Indexing: customizing indexes for specific domains to enhance retrieval relevance.

Chunking Sherpa

Chunking Sherpa refers to a sophisticated method of text segmentation whose goal is to break large documents into coherent, contextually meaningful segments, facilitating easier indexing and retrieval.

Engineering in Langchain: custom algorithms analyze document structure, semantics, and context to create optimal chunks. Custom logic appends the section heading to each chunk to preserve its context.

Recursive Chunking

Recursive chunking involves repeatedly breaking text into smaller chunks until they reach an appropriate size for processing. This method ensures that even very large documents can be efficiently indexed and retrieved.

Engineering in Langchain: recursive functions check chunk sizes and subdivide any chunk that exceeds a predefined limit. This flat chunking is used alongside Sherpa chunking to leverage the larger context in an ensemble retriever.
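The recursive split-and-check loop can be sketched in plain Python. This is a simplified stand-in for LangChain's RecursiveCharacterTextSplitter: it tries the coarsest separator first and recurses on any piece that is still too large (separators are dropped in this sketch for brevity).

```python
def recursive_chunk(text, max_chars=100, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest available separator; recurse on oversized pieces."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        if sep in text:
            return [chunk
                    for part in text.split(sep)
                    for chunk in recursive_chunk(part, max_chars, separators)]
    # No separator left (e.g. one unbroken token): fall back to a hard cut.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Every chunk the function returns is guaranteed to fit within `max_chars`, while paragraph and sentence boundaries are respected wherever possible.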

Page-Level Chunking

Page-level chunking breaks documents into chunks based on their pagination, which is particularly useful for structured documents like PDFs, where each page can be treated as an individual chunk.

Engineering in Langchain: PDF parsing libraries detect page boundaries and segment the text accordingly. If a document is flat and does not have section or paragraph information, we fall back to page-level chunking.
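A minimal sketch of the page-level fallback, assuming the upstream extractor marks page boundaries with form-feed characters (`\f`), as many PDF-to-text tools do; real pipelines usually get per-page text directly from the PDF parser.

```python
def page_chunks(text):
    """One chunk per page; blank pages are skipped but numbering is preserved."""
    return [{"page": i + 1, "text": page.strip()}
            for i, page in enumerate(text.split("\f"))
            if page.strip()]
```

Keeping the page number on each chunk lets the answer cite the source page later.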

Semantic Chunking

Semantic chunking involves breaking text into chunks based on semantic meaning, ensuring each chunk is contextually meaningful. This enhances the retrieval process by maintaining the coherence of information.

Engineering in Langchain: natural language processing (NLP) techniques such as topic modeling and semantic analysis determine chunk boundaries. Semantic chunking uses a language model to group chunks with similar semantic meaning, collecting the sections of a document with similar meaning in one place. Uses the SemanticChunker class.
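The boundary-detection idea can be sketched without an embedding model: start a new chunk wherever similarity between consecutive sentences drops below a threshold. Bag-of-words cosine stands in for dense embeddings here; LangChain's SemanticChunker applies the same idea with a real embedding model.

```python
import math

def bow(sentence):
    """Toy bag-of-words vector (a stand-in for a sentence embedding)."""
    v = {}
    for w in sentence.lower().split():
        v[w] = v.get(w, 0) + 1
    return v

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Break the chunk wherever adjacent-sentence similarity falls below threshold."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(bow(prev), bow(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Sentences about the same topic stay together; a topic shift produces a new chunk.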

Character Chunking

Character chunking breaks text into chunks based on character count, ensuring each chunk is within a specific length limit. This is useful for maintaining uniform chunk sizes.

Engineering in Langchain: setting a maximum character limit for each chunk and splitting the text accordingly.

References:
- "Character-Based Text Chunking" - Technical Report
- "Langchain Character Chunking" - Langchain Docs
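Fixed-size character windows are a one-liner; adding an overlap between consecutive windows (a common refinement, also available in LangChain's splitters) avoids losing context at chunk boundaries. A minimal sketch:

```python
def char_chunks(text, size=500, overlap=100):
    """Fixed-size character windows; each window overlaps the previous one."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

With `size=4, overlap=2`, the string "abcdefghij" yields windows starting every 2 characters, so each boundary character appears in two chunks.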

Hierarchical Indexing

Hierarchical indexing creates multiple levels of indexes, with each level providing more detailed access to the data. This structure improves search efficiency and precision.

Engineering in Langchain: creating indexes at different granularity levels, such as document-level, paragraph-level, and sentence-level.
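The three granularity levels can be sketched as a nested structure: document → paragraphs → sentences. This is an illustrative data-structure sketch, not LangChain's index format; paragraph and sentence splitting are naive here.

```python
def build_hierarchical_index(docs):
    """Map each document id to a list of paragraphs, each a list of sentences."""
    index = {}
    for doc_id, text in docs.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        index[doc_id] = [
            [s.strip() for s in p.split(".") if s.strip()]
            for p in paragraphs
        ]
    return index
```

A query can first match at the document level, then drill into paragraphs and finally sentences, which is what makes the search both efficient and precise.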

Document Types

Different document types require tailored processing and indexing methods. PDFs, DOCX, PPTX, and TXT files each have unique structures and formats that influence how they should be processed for indexing and retrieval.

Engineering in Langchain: specialized parsers and converters extract and normalize content from each format.

Custom RAG

Custom RAG refers to the ability to tailor the retrieval-augmented generation (RAG) system to specific requirements and use cases, enhancing its effectiveness and relevance.

Engineering in Langchain: configuring retrieval pipelines, adjusting generation models, and fine-tuning parameters to meet specific application needs.

Systems and Services

These systems and services provide the infrastructure and tools necessary to support the retrieval and generation processes in a RAG system. They handle data storage, search, generation, and caching.

Engineering in Langchain: integrating with various systems and services to provide a robust RAG framework.
- OpenSearch Cluster: scalable search and retrieval.
- OCI GenAI RAG Agent: handles generation tasks using AI models.
- Redis Cluster: provides caching to speed up retrieval operations.

Vector DB and Embedding Storage Techniques

Vector databases store embeddings, which are numerical representations of data that capture semantic meaning. These embeddings enable efficient similarity searches and retrieval in RAG systems.

Engineering in Langchain:
- Using vector databases to store and manage embeddings.
- Techniques for storing embeddings include dense vectors and sparse vectors.
- Multiple vectors are created for the same document.
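The core operation of any vector database is cosine-similarity search over stored embeddings. A toy in-memory sketch (production systems such as OpenSearch k-NN or FAISS add approximate-nearest-neighbour indexing on top of this idea; `TinyVectorStore` is an illustrative name):

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """Brute-force dense-vector store: add (id, vector), query by cosine."""
    def __init__(self):
        self.items = []

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def search(self, query_vec, k=1):
        ranked = sorted(self.items,
                        key=lambda item: cosine_sim(query_vec, item[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```

Storing several vectors per document simply means calling `add` multiple times with the same `doc_id`, one per chunk or view of the document.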

Retriever - Single, Multiple, and Ensemble

Retrievers are critical components in RAG systems that fetch relevant documents or data chunks based on a query. Single retrievers use one method, multiple retrievers use various methods, and ensemble retrievers combine the results of several retrievers for improved accuracy.

Engineering in Langchain:
- Single Retriever: uses one retrieval strategy (e.g., keyword-based).
- Multiple Retrievers: employ different strategies simultaneously (e.g., keyword-based BM25 and semantic).
- Ensemble Retrievers: combine the outputs of multiple retrievers to enhance result relevance and accuracy.

Retriever - Single, Multiple, and Ensemble

- Reciprocal Rank Fusion: combining results from multiple retrieval models to improve ranking. Engineering: implementing reciprocal rank fusion strategies in LangChain.
- Single Query Retriever: using context from a single query to generate responses. Engineering: context management in LangChain.
- Multiquery Retriever: leveraging multiple queries for better context in generation. Engineering: multiquery context handling in LangChain.
- Semantic Retriever with Multi-Chunking: combining semantic retrieval with multi-level chunking for better results. Engineering: integrating semantic retrieval and chunking in LangChain.
- BM25 Retriever: using the BM25 algorithm for document ranking. Engineering: implementing BM25 in LangChain.
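Reciprocal rank fusion, the combining step used by ensemble retrievers above, is short enough to show in full. Each document scores the sum of 1/(k + rank) over every ranked list it appears in; k=60 is the constant from the original RRF paper, and it damps the influence of any single list.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both the BM25 and the semantic retriever outranks one that only a single retriever liked, which is exactly the behaviour an ensemble retriever wants.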

User Interactions and Memory Buffer in LangChain

User interactions in a RAG system involve generating answers to user queries and handling batch queries for efficiency. The system must be responsive and accurate in providing information.

Engineering in Langchain: APIs and interfaces manage query input, processing, and response generation, ensuring the system can handle individual queries as well as batch processing. Interactions act like a conversation, implemented with a MessagesPlaceholder for the chat history.
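The memory-buffer idea can be sketched as a bounded list of past exchanges that gets rendered into the history slot of the prompt (the slot LangChain fills via MessagesPlaceholder). An illustrative sketch, not LangChain's memory classes:

```python
class MemoryBuffer:
    """Keep the last max_turns user/assistant exchanges for the prompt."""
    def __init__(self, max_turns=5):
        self.turns = []
        self.max_turns = max_turns

    def add(self, user, assistant):
        self.turns.append((user, assistant))
        self.turns = self.turns[-self.max_turns:]  # drop the oldest turns

    def render(self):
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)
```

Bounding the buffer keeps the prompt within the model's context window while still letting follow-up questions resolve references like "it" or "that document".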

RAGAS Evaluation Framework

An evaluation framework is essential for assessing the performance and effectiveness of a RAG system. It involves metrics, benchmarks, and testing methodologies to ensure the system meets the desired standards.

Engineering in Langchain:
- Setting up evaluation metrics such as context_precision, faithfulness, answer_relevancy, context_recall, and answer_correctness.
- Implementing benchmark datasets for consistent testing.
- Continuous monitoring and testing to improve system performance.
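To make the metric idea concrete, here is a heavily simplified sketch of what context precision measures: the fraction of retrieved chunks that are actually relevant to the question. RAGAS itself uses an LLM judge and a rank-weighted formula rather than gold labels, so this is only the intuition, not the RAGAS implementation.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that appear in the gold relevant set."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk in retrieved if chunk in relevant) / len(retrieved)
```

A retriever that pads its results with off-topic chunks scores low even if the right chunk is present, which is why precision is tracked alongside context_recall.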

Prompt Augmentation

Prompt augmentation involves modifying or enhancing the initial user query to improve retrieval and generation results. Techniques include adding context, rephrasing, or expanding the query.

Engineering in Langchain: implementing pre-processing steps that enrich the query before passing it to the retrieval or generation modules.

Prompt templates:
- Multiquery prompt generation
- RAG Fusion
- Rerank
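The multiquery flavour of prompt augmentation can be sketched as: ask a model for paraphrases of the query, retrieve for each, and take the deduplicated union. `rewrite` and `retrieve` below are stand-ins for an LLM call and a retriever, not real library APIs.

```python
def multi_query_retrieve(query, rewrite, retrieve, n=3):
    """Retrieve for the original query plus n rewrites; dedupe, keep order."""
    queries = [query] + rewrite(query, n)
    seen, results = set(), []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results
```

Because each paraphrase surfaces slightly different documents, the union covers more of the relevant material than any single phrasing of the question.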

HyDE and Query Improvement Techniques

HyDE (Hypothetical Document Embeddings) improves retrieval by generating a hypothetical answer document for the query and searching with its embedding rather than the raw query's. Query improvement techniques involve refining the query to yield better search results, such as rephrasing or adding contextual information.

Engineering in Langchain:
- Supporting HyDE by integrating it with different retrieval methods.
- Allowing dynamic query improvement based on initial search outcomes.
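The HyDE flow above is short enough to sketch end to end. `generate` and `embed` are stand-ins for an LLM and an embedding model; the key point is that the corpus is ranked against the embedding of the hypothetical answer, not of the raw query.

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hyde_retrieve(query, corpus, generate, embed, k=1):
    """Generate a hypothetical answer, embed it, rank the corpus against it."""
    hypothetical = generate(query)
    qv = embed(hypothetical)
    ranked = sorted(corpus, key=lambda doc: _cos(qv, embed(doc)), reverse=True)
    return ranked[:k]
```

The hypothetical answer usually shares vocabulary and structure with real answer documents, so its embedding lands closer to them than the terse original query would.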

Generator for RAG

The generator in a RAG system produces coherent and relevant responses based on retrieved data. It leverages advanced language models to generate human-like text.

Engineering in Langchain: integrating with state-of-the-art language models such as Cohere's Command R+ to generate responses.