LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant

AnantCorp, Jun 12, 2024

About This Presentation

Slides for the 4th Presentation on LLM Fine-Tuning with QLoRA Presented by Anant, featuring DataStax Astra


Slide Content

LLM Fine Tuning with QLoRA - Evaluation vs RAG
Comparing our fine-tuned Llama 2 model to using Retrieval-Augmented Generation alongside base Llama 2, evaluated using the same statistical measures that we used previously.
Obioma Anomnachi, Engineer @ Anant

RAG Overview
What is Retrieval-Augmented Generation (RAG)?
- Hybrid NLP approach: combines information retrieval and text generation to create more comprehensive and contextually accurate outputs.
- Uses external knowledge sources: leverages large corpora or databases to augment the generative capabilities of language models.
How RAG works:
- Retrieval stage: the model retrieves relevant information from a pre-existing corpus or knowledge base.
- Generation stage: uses the retrieved information as input to generate a coherent and contextually appropriate response.
- Produces more informed and accurate results; especially effective for complex tasks requiring in-depth knowledge.
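The two stages above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the corpus is an in-memory list, the retriever is naive word overlap, and `generate()` is a stand-in for an actual LLM call (in a real system you would query a vector store such as DataStax Astra and prompt Llama 2 with the retrieved context).

```python
import re

def retrieve(query, corpus, k=2):
    """Retrieval stage: rank documents by naive word overlap with the query."""
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(re.findall(r"\w+", doc.lower()))),
        reverse=True,
    )
    return scored[:k]

def generate(query, context_docs):
    """Generation stage: in practice an LLM prompted with the retrieved
    context; here a placeholder that just stitches the context in."""
    context = " ".join(context_docs)
    return f"Answer to '{query}' grounded in: {context}"

corpus = [
    "Cassandra is a distributed NoSQL database.",
    "Llama 2 is an open-weight large language model.",
    "QLoRA fine-tunes quantized models with low-rank adapters.",
]
docs = retrieve("What is QLoRA fine tuning?", corpus)
print(generate("What is QLoRA fine tuning?", docs))
```

The key design point is the hand-off: the generator never sees the whole corpus, only the top-k retrieved passages, which is what keeps the approach scalable and up to date.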

RAG vs Language Models
Traditional language models:
- Data dependency: rely solely on the data they were trained on.
- Text generation: generate high-quality text based on learned patterns.
- Limitations: struggle with tasks requiring up-to-date information; may lack specific factual knowledge not present in training data.
RAG models:
- Enhanced generative process: incorporate real-time information retrieval.
- Dynamic information retrieval: fetch and use the most relevant information available at generation time.
- Improved performance: significantly better at tasks requiring recent, detailed, or domain-specific information.

RAG Components

Retrievers
Knowledge sources:
- External corpora: large datasets, databases, and documents.
- Domain-specific databases: specialized knowledge bases tailored to specific fields (e.g., medical, legal).
- Real-time data: up-to-date information from live sources such as news feeds or databases.
Search mechanisms:
- Dense vector representations: use neural embeddings to find semantically similar documents.
- Sparse vector representations: use traditional methods like TF-IDF or BM25 to retrieve relevant passages.
- Hybrid techniques: combine dense and sparse methods for more accurate retrieval.
- Relevance scoring: assign scores to documents based on relevance to the query.
- Filtering and ranking: select and rank the most pertinent information for generation.
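To make the sparse side concrete, here is a hand-rolled TF-IDF scorer that implements the relevance-scoring and ranking steps listed above. It is a sketch for illustration; in practice you would reach for a library such as scikit-learn's `TfidfVectorizer` or a BM25 implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_scores(query, docs):
    """Relevance scoring: sum of tf * idf over query terms, per document."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in doc_tokens if term in t)
            if df:
                # idf dampens terms that appear in many documents
                idf = math.log((n + 1) / (df + 1)) + 1
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores

docs = [
    "Medical knowledge base with clinical trial records.",
    "Legal database of court rulings and statutes.",
    "Live news feed with up-to-date headlines.",
]
scores = tfidf_scores("legal court rulings", docs)
# Ranking: order document indices by descending relevance score
ranked = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
print(ranked)
```

A hybrid retriever would compute a dense (embedding) score alongside this sparse score and combine the two, e.g. as a weighted sum, before the final ranking.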

Retrievers - Embeddings and Similarity Search
What are neural embeddings?
- Definition: dense vector representations of words, phrases, sentences, or documents, generated by neural network models. They capture semantic meaning in a continuous vector space where similar items sit closer together.
- Purpose: encode semantic information, making it easy to measure similarity between pieces of text, so models can retrieve based on meaning rather than exact word matching.
- Output: dense vectors (embeddings) of fixed, typically high, dimensionality (e.g., 300 or 768).
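Similarity search over embeddings usually means cosine similarity: semantically close vectors score near 1.0. The sketch below uses tiny hand-made 4-dimensional vectors as stand-ins; real embeddings come from a neural model (e.g., a sentence transformer) and have hundreds of dimensions, as noted above.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product normalized by vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dim "embeddings" for three documents
docs = {
    "cat care": [0.9, 0.1, 0.0, 0.2],
    "dog training": [0.7, 0.6, 0.1, 0.1],
    "tax law": [0.0, 0.1, 0.9, 0.7],
}
# Hypothetical query embedding, e.g. for "pet grooming"
query = [0.85, 0.2, 0.05, 0.15]

# Nearest-neighbor search: the document whose vector is most aligned wins
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)
```

At scale this brute-force loop is replaced by an approximate nearest-neighbor index (the job a vector database performs).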

RAG Advantages
- Enhanced accuracy: leverages up-to-date and domain-specific information from external knowledge sources.
- Improved factuality: accesses and integrates verified data sources, reducing the risk of generating incorrect or outdated information.
- Increased relevance: context-aware responses through dynamic retrieval of pertinent information, ensuring answers are highly relevant to the user's needs.
- Domain-specific expertise: customizable to access specialized knowledge bases (e.g., medical, legal).
- Real-time information: retrieves the latest data and adapts to changes and new developments; useful for applications requiring up-to-date information, like news or trend analysis.
- Versatile applications: adapts to tasks such as question answering, summarization, and conversational agents.

RAG vs Fine Tuning
RAG:
- Enhanced accuracy and relevance: dynamically incorporates up-to-date, domain-specific information and provides contextually relevant responses via real-time retrieval.
- Scalability and flexibility: adaptable to various tasks without extensive retraining; the knowledge base is easy to update for different domains or new information.
- Cost efficiency: reduces the need for large-scale dataset creation and extensive retraining by using existing knowledge sources, lowering computational and resource expenses.
Fine tuning:
- Customization and specialization: tailors the model to specific tasks or domains, producing highly specialized models for particular use cases.
- Improved performance for specific tasks: fine-tuning on curated datasets yields models optimized for particular applications and enhances performance in narrow domains with specialized requirements.
- Control over output: fine-grained adjustments improve accuracy, reduce errors, and allow better control over the style of generated content.

Evaluation
Because the answer is ultimately generated by an LLM, a RAG model's performance is evaluated the same way as any LLM's, fine-tuned or not: domain-specific tests, benchmarks, statistical measures, and human and LLM evaluation all work the same as in the previous presentation. Performance depends on the sophistication of the retriever mechanism, the capabilities of the LLM used, and the quality of the data backing it.
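As one example of the statistical measures mentioned above, SQuAD-style token-overlap F1 compares a model answer against a reference answer, and applies identically to a RAG pipeline's output and a fine-tuned model's. The answers below are made-up illustrations, not results from the demo.

```python
import re
from collections import Counter

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall vs. a reference."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    common = Counter(pred) & Counter(ref)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "QLoRA fine-tunes quantized models with low-rank adapters"
rag_answer = "QLoRA fine-tunes a quantized base model with low-rank adapters"
ft_answer = "QLoRA adapts quantized models using low-rank adapters"

print(round(token_f1(rag_answer, reference), 2),
      round(token_f1(ft_answer, reference), 2))
```

Scores like this are computed over a whole test set and averaged, so the same harness can compare the fine-tuned model and the RAG pipeline head to head.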

Demo

Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | [email protected] | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037