Evaluating RAG pipelines built on unstructured data

chloewilliams62 · 118 views · 11 slides · Sep 18, 2024

About This Presentation

This talk covers different techniques for evaluating a RAG pipeline built on unstructured data. Standing up a basic RAG pipeline is becoming easier every day; however, identifying weak points in your application or dataset remains a challenge. We'll review how you can use traditional assertio...


Slide Content

Evaluating Agentic RAG Pipelines
September 2024
Hakan Tekgul
Arize AI
Solution Architect

© All Rights Reserved | We Make Models Work
Transition from Twitter Demo to Real Product is Hard

One Big Pain: Even Small Changes Can Cause Performance Regressions
The reality of AI engineering this past year: change a prompt or model, break a use case. Repeat.
(Illustration: an engineer prompts "You are an assistant debugging RAG, investigate the retrieved results and evals…" and the assistant replies "LGTM!")

Solution: Evaluation-Driven Development
1. Curate a dataset of examples.
2. Track each change (model, prompt, retriever) as an experiment.
3. Evaluate the experiment: score the new output (e.g., 0.8).
(Example prompt under test: "You're a helpful assistant. When the user asks about the return policy, respond with {vars}…")
LLM apps require iterative performance improvements.
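The loop above can be sketched in a few lines. This is an illustrative harness, not the Arize/Phoenix experiments API: the pipelines are canned stand-ins for two variants of a RAG app, and the scorer is a simple substring check.

```python
# Evaluation-driven development sketch: score each pipeline variant
# against a curated dataset so regressions surface before shipping.

def run_experiment(dataset, pipeline):
    """Run one pipeline variant over the dataset and return its mean score."""
    scores = [1.0 if ex["expected"] in pipeline(ex["question"]) else 0.0
              for ex in dataset]
    return sum(scores) / len(scores)

def pipeline_v1(question):
    # Stand-in for the real RAG app: canned answers keyed by question.
    answers = {
        "What is your return policy?": "Returns accepted within 30 days.",
        "Do you ship internationally?": "Yes, we ship to 40+ countries.",
    }
    return answers.get(question, "I don't know.")

def pipeline_v2(question):
    # A "small change" that silently regresses the shipping use case.
    answers = {"What is your return policy?": "Returns accepted within 30 days."}
    return answers.get(question, "I don't know.")

dataset = [
    {"question": "What is your return policy?", "expected": "30 days"},
    {"question": "Do you ship internationally?", "expected": "ship"},
]

baseline = run_experiment(dataset, pipeline_v1)   # 1.0
candidate = run_experiment(dataset, pipeline_v2)  # 0.5: one use case broke
```

Comparing `candidate` against `baseline` before promoting a change is exactly the "change a prompt, break a use case" guardrail the slide describes.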

RAG Architecture
(Diagram: the user query is embedded and sent to search & retrieval against a vector store; the retrieved context is combined with the user query into a prompt for the LLM; the LLM response goes back to the user, who provides feedback.)
LLM infra stack: LLM providers, vector DB, orchestration/agent, and LLM evaluation and observability.
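The search-and-retrieval step in the diagram can be sketched with toy bag-of-words vectors standing in for a real embedding model and vector DB. All names and the scoring function here are illustrative.

```python
# Minimal retrieval sketch: embed the query, rank documents by cosine
# similarity, and build a prompt from the top hit.
import math

def embed(text):
    """Toy embedding: word-count vector (a real pipeline uses a model)."""
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Our return policy allows returns within 30 days.",
    "We ship to over 40 countries worldwide.",
]
context = retrieve("what is the return policy", docs)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: what is the return policy"
```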

LLM Evals for RAG Applications (LLM-as-a-Judge)
(Diagram: the input data and output data of the span/chain under test — an LLM call, an embedding retriever, or a chain — are fed into an eval library such as Phoenix, which combines an eval template with model params and an eval LLM to run the eval chain.)

How Do Evals Work? (LLM-as-a-Judge)
Example: retrieval. Within a trace of spans, the retrieval span is the span we want to evaluate. Its input (the user query) and output (the retrieved documents) are filled into an eval template; the Phoenix library sends the filled template, with model params, to the eval LLM, which returns a label such as "relevant" or "irrelevant".
Eval Template:
You are comparing a reference text to a question and trying to determine
if the reference text contains information relevant to answering the
question. Here is the data:
[BEGIN DATA]
************
[Question]: {query}
************
[Reference text]: {reference}
[END DATA]

Compare the Question above to the Reference text. Determine whether
the Reference text contains information that can answer the Question.

RAG Evaluation Overview
For evaluating a RAG application, you need to consider two types of evaluations:
- Relevancy evaluation: is the retrieved context relevant to the user query?
- Response evaluations:
  - Hallucination: is the response hallucinated relative to the retrieved data? Is it faithful to the context?
  - Q&A correctness: is the response correct given the question and context?
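These evaluation types can be organized as a table of judge templates applied to every RAG trace (query, retrieved context, response). The templates, label sets, and stub judge below are illustrative sketches, not the exact Phoenix prompts; a real deployment sends each filled template to an eval LLM.

```python
# One judge template per evaluation type, each with its allowed labels.
EVALS = {
    "relevancy": (
        "Is this context relevant to the query?\nQuery: {query}\nContext: {context}",
        ["relevant", "irrelevant"]),
    "hallucination": (
        "Is this answer supported by the context?\nContext: {context}\nAnswer: {response}",
        ["factual", "hallucinated"]),
    "qa_correctness": (
        "Is this answer correct for the question?\nQuestion: {query}\nAnswer: {response}",
        ["correct", "incorrect"]),
}

def evaluate_trace(trace, judge):
    """Run every eval over one RAG trace and collect the labels."""
    results = {}
    for name, (template, labels) in EVALS.items():
        raw = judge(template.format(**trace)).strip().lower()
        # Unparseable judge output falls back to the failing label.
        results[name] = raw if raw in labels else labels[-1]
    return results

trace = {"query": "What is the return policy?",
         "context": "Returns accepted within 30 days.",
         "response": "You can return items within 30 days."}

# Stub judge keyed off each template's wording; use a real eval LLM in practice.
stub_judge = lambda p: ("relevant" if "relevant" in p
                        else "factual" if "supported" in p
                        else "correct")
results = evaluate_trace(trace, stub_judge)
```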

Unstructured RAG: Knowledge Base Analysis with Embeddings
● Leverage query and knowledge-base embeddings to analyze RAG performance
● Understand gaps within your knowledge base
● Essential for unstructured/multimodal RAG
Live Demo

Thank you.

[email protected]
pip install arize-phoenix