Open-source Observability for Your LLM Applications... Tracing your chains!.pdf

FilipeOliveira715501 110 views 70 slides Oct 18, 2024

About This Presentation

Observability is a critical aspect of modern software engineering, especially for complex systems powered by LLMs. Simple monitoring solutions often fall short in providing the depth of insight required to fully understand and optimize these systems for production usage. The good news is that we alr...


Slide Content

ⓒ 2024 Redis Ltd. All rights reserved.
Open-source Observability for
Your LLM Applications...
Tracing your chains!

> whoami
•Portuguese
•Principal Performance Engineer @ Redis
•improve/develop open source performance and observability tools & act afterwards:
redis, vector-db-benchmark, memtier_benchmark, hdr-histogram-go, t-digest-c, among others…
•check my blog at: https://blog.codeperf.io/
•github: filipecosta90 and @fcosta_oliveira

2 disclaimers
2 important points

Work done together with Martin Dimitrov and Marcin Poźniak from Intel Corp.

●opentelemetry-python-contrib #2635

●Billion scale vector datasets (more about this in the next weeks)

●Optimizing vector libs for Vector extensions

●a lot more….

ping us at: performance <at> redis <dot> com

let's make this interactive

> today
1. takeaway
2. covering some basics of distributed tracing
3. creating a simple Redis RAG template to test tracing
4. tracing it
5. closing remarks

> takeaway
$> Full example at:
https://github.com/redis-performance/opentelemetry-redis-llm-sample

> transaction visibility across the multiple
components that make up your system

> some basics of distributed tracing

FIRST THINGS FIRST
observability

> some basics of observability
“how well the internal states of a
system can be deduced from
knowledge of its external
outputs”

[Diagram: the three pillars of observability]
LOGGING: events
METRICS: aggregate view
TRACING: request scope

> why metrics?
aggregate view

> why logs?
more context on an individual element of the system

good > individual/aggregate state
bad > correlating it

> some basics of tracing
[Diagram: the three pillars of observability]
LOGGING: events
METRICS: aggregate view
TRACING: request (transaction) scope

TRACING: request scope
“analyze and debug a transaction across
multiple software components”
[Figure: real-time RAG with Redis’ capabilities for AI, sample diagram; its components are annotated VENDOR1, VENDOR2, VENDOR3]
https://redis.io/blog/using-redis-for-real-time-rag-goes-beyond-a-vector-database/

[Figure: Online Boutique microservices demo application, composed of 11 microservices written in different languages; its services are annotated VENDOR1 to VENDOR4. Google uses this application to demonstrate technologies like the now old OpenCensus.]


> As you can imagine, in multi-vendor
environments, this causes
interoperability problems, like...

> no unique identifier propagation
between vendors

> vendor-specific metadata was not
guaranteed to be retained when
exchanged between systems.

> Cloud platform vendors, intermediaries,
and service providers could not truly
guarantee support for trace context
propagation

> where does OpenTelemetry fit into the
observability effort? why??

> universally[1] agreed-upon format for
the exchange of trace context
propagation data, i.e. trace context.


[1] W3C trace context definition in
https://www.w3.org/TR/trace-context/#design-overview.
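
To make the agreed-upon format concrete, here is a minimal sketch of reading and re-emitting the W3C `traceparent` header using only the Python standard library. It assumes version `00` of the header (`version-traceid-parentid-flags`, lowercase hex) and rejects anything else, which is stricter than the specification's forward-compatibility rules. The names `TraceParent`, `parse_traceparent`, and `child_of` are hypothetical helpers for illustration, not part of any OpenTelemetry package.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace id, 16 hex parent id, 2 hex flags.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str

    @property
    def sampled(self) -> bool:
        # Bit 0 of the flags byte is the "sampled" flag.
        return bool(int(self.flags, 16) & 0x01)

    def to_header(self) -> str:
        return f"{self.version}-{self.trace_id}-{self.parent_id}-{self.flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    parsed = TraceParent(**match.groupdict())
    # Version "ff" and all-zero ids are invalid.
    if parsed.version == "ff":
        return None
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed


def child_of(parent: TraceParent) -> TraceParent:
    """Hypothetical helper: keep the trace id, mint a new span id for the next hop."""
    return parent._replace(parent_id=secrets.token_hex(8))


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_of(incoming).to_header()  # value for the outgoing request header
```

Because every vendor in the request path forwards the same trace id, spans emitted by different systems can be stitched back into one trace.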

SERVICE BOUNDARY provides Rate (requests), Errors and
Duration (RED) details

Simon Zeltser diagram on “Distributed tracing and Monitoring with OpenTelemetry”

> end of part 1 <

> creating a simple Redis RAG template
to test tracing

> Retrieval Augmented Generation (RAG) is a way to
connect LLMs to external sources of data

what we will do
-Create embeddings and store them in a vector store;
-Query the DB using a simple similarity search;
-Query LLMs using relevant information from a knowledge base;

> Vector Store and knowledge base (can be SWAPPED for another vector DB / knowledge base)
> Embeddings and generative capabilities (can be SWAPPED for another LLM)
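
The three steps can be sketched end to end without any external service. This is a toy illustration under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, `ToyVectorStore` is an in-memory stand-in for Redis, and the final LLM call is left out entirely (the sketch stops at the prompt). None of these names come from the sample repository.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for the vector store."""

    def __init__(self) -> None:
        self._items: List[Tuple[Dict[str, int], str]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Step 1: create embeddings and store them.
        self._items.extend((embed(t), t) for t in texts)

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Step 2: rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    # Step 3: hand the retrieved context to the LLM together with the question.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"


store = ToyVectorStore()
store.add_texts([
    "The shuttle launch advanced space exploration.",
    "A debate about belief and atheism.",
    "Telescopes observe distant galaxies in space.",
])
hits = store.similarity_search("Tell me about space exploration", k=1)
prompt = build_prompt("Tell me about space exploration", hits)
```

Swapping the vector store or the LLM only changes what sits behind `add_texts`, `similarity_search`, and the call that consumes `prompt`, which is why tracing across those boundaries matters.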

how we will gain visibility
push based
instrumentation
(...)
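
As a rough illustration of what push-based instrumentation means, the sketch below has the application itself record a span around each operation and push it to an exporter as soon as the operation finishes. It is a standard-library toy, not the OpenTelemetry API: `Span`, `Tracer`, and the `collected` list (standing in for a collector endpoint) are all hypothetical names.

```python
import contextlib
import contextvars
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0


# Tracks the span that is currently active in this execution context.
_current = contextvars.ContextVar("current_span", default=None)


class Tracer:
    def __init__(self, export: Callable[[Span], None]) -> None:
        self._export = export  # called once per finished span: the "push"

    @contextlib.contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = _current.get()
        span = Span(
            name=name,
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            parent_id=parent.span_id if parent else None,
            start=time.perf_counter(),
        )
        token = _current.set(span)
        try:
            yield span
        finally:
            span.end = time.perf_counter()
            _current.reset(token)
            self._export(span)  # push the finished span to the backend


collected: List[Span] = []  # stand-in for a collector
tracer = Tracer(export=collected.append)

with tracer.span("rag-request"):
    with tracer.span("vector-search"):
        pass
    with tracer.span("llm-call"):
        pass
```

Child spans finish first, so they are pushed before their parent; the shared `trace_id` and the `parent_id` links are what let a backend rebuild the tree.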

> extra details we’re not able to cover:
url: https://python.langchain.com/docs/integrations/vectorstores/redis/#usage-for-retrieval-augmented-generation

> follow along with:
git clone https://github.com/redis-performance/opentelemetry-redis-llm-sample
cd opentelemetry-redis-llm-sample
poetry install

# spin up redis, otel-collector, and jaeger
docker-compose up

# open jaeger UI
http://localhost:16686/search

> demo1: Create embeddings and store
them in vector store: Redis

> using the same data as LangChain’s Redis 101:
https://python.langchain.com/docs/integrations/vectorstores/redis/#sample-data

Sample Data
The 20 newsgroups dataset comprises around 18000 newsgroups posts on 20 topics. We'll use a
subset for this demonstration and focus on two categories: 'alt.atheism' and 'sci.space'.

> follow along with:
$ poetry run python3 create_embeddings.py
Set up the tracer provider and the OTLP exporter...
Configure OTLP exporter...
Instrument Redis with OpenTelemetry...
Fetching docs...
There are a total of 250 docs
Using OpenAI to create the embeddings
Add items to vector store

demo 1
> explore data with redis-cli

> find index creation trace in jaeger

> demo2: test the entire DB flow using
simple similarity search
Sample Query
Tell me about space exploration
$ poetry run python3 simple_similarity_search.py
(...)
Simple Similarity Search Results:
Content: From: [email protected] (T. Giaquinto)
Subject: General Information Request
Organization: Worcest...
Metadata: {'category': 'sci.space'}
---
Content: From: [email protected] (Jeff Bytof - SIO)
Subject: End of the Space Age?
Organization: San Di...
Metadata: {'category': 'sci.space'}
(...)

end of demo 2
> explore multiple component spans on trace

> demo 3: Query LLMs using relevant
information from a knowledge base;

> reviving our sample app architecture

> fetches relevant information from a pre-built knowledge base.
> ensures the information is both relevant and diverse

Use the following pieces of context from The 20 newsgroups dataset which comprises around 18000 newsgroups posts on 20 topics.
We're using a subset for this demonstration and focus on two categories: 'alt.atheism' and 'sci.space'. Use that dataset to answer the question.
Do not make up an answer if there is no context provided to help answer it.
Include the 'source' and 'start_index' from the metadata included in the context you used to answer the question

Context:
---------
{context}

---------
Question: {question}
---------

Answer:

> use of metadata like 'source' and 'start_index' ensures
transparency about the retrieved information.
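
A small sketch of how the `{context}` and `{question}` placeholders of the prompt above might be filled, so that each retrieved chunk carries its `source` and `start_index` into the prompt. The template here is a shortened stand-in, and `format_docs`, `build_prompt`, and the plain-dict document shape are assumptions made for illustration; the sample repository may wire this differently.

```python
from typing import Dict, List

# Shortened stand-in for the slide's prompt; the placeholders are the same.
PROMPT_TEMPLATE = (
    "Do not make up an answer if there is no context provided to help answer it.\n"
    "Include the 'source' and 'start_index' from the metadata included in the "
    "context you used to answer the question\n\n"
    "Context:\n---------\n{context}\n\n---------\n"
    "Question: {question}\n---------\n\nAnswer:"
)


def format_docs(docs: List[Dict]) -> str:
    """Render each retrieved chunk together with the metadata the prompt asks for."""
    rendered = []
    for doc in docs:
        meta = doc["metadata"]
        header = f"[source={meta.get('source')} start_index={meta.get('start_index')}]"
        rendered.append(header + "\n" + doc["page_content"])
    return "\n\n".join(rendered)


def build_prompt(question: str, docs: List[Dict]) -> str:
    return PROMPT_TEMPLATE.format(context=format_docs(docs), question=question)


docs = [  # hypothetical retrieved chunk
    {
        "page_content": "Subject: End of the Space Age?",
        "metadata": {"source": "sci.space", "start_index": 0},
    },
]
prompt = build_prompt("Tell me about space exploration", docs)
```

Keeping the metadata next to each chunk is what lets the model cite where an answer came from.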

Sample Query
Tell me about space exploration. Do you have any important person to mention?
$ poetry run python3 rag-with-openllmetry.py
(...)
One important person in the field of space exploration is Dr. Mae Jemison. She was the first black
woman in space and flew on the Endeavour spacecraft. She will also be appearing as a transporter
operator on an episode of "Star Trek: The Next Generation" that airs the week of May 31.

[Source: Document(metadata={'category': 'sci.space'}, page_content='From: [email protected]
(Doug Loss)')]

end of demo 3
> explored multiple component spans on trace

Best Practices for Distributed Tracing in Production

Error Handling: provide better insights into failures
in your application.

Business / Critical data Context Propagation: auto-instrumentation or
out-of-the-box code won’t give you all the context of your
business.

Sampling: avoid overwhelming your tracing
backend and finance depts.
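
One common way to keep trace volume in check is to sample a fixed fraction of traces, deciding from the trace id so that every service reaches the same verdict for the same trace. The function below is a hedged sketch of that idea in plain Python; `should_sample` is a hypothetical name, and real samplers may choose the bits and bounds differently.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its id fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < int(ratio * (1 << 64))


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
keep_all = should_sample(trace_id, 1.0)   # ratio 1.0 keeps every trace
keep_none = should_sample(trace_id, 0.0)  # ratio 0.0 drops every trace
```

Because the decision depends only on the trace id, a trace is either recorded by all participating services or by none of them, which avoids broken, partial traces.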

> mission, scope, and implementation
examples of distributed tracing <

> tracing by itself does not get the
job done <

> analyze the 99th percentile
latency of your application? <

> provide the error rate of your app <

> what’s the % of clients being
served within 100ms? (CDF) <
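
These three questions are aggregate questions, which is where metrics complement traces. As a minimal sketch, assuming you already have per-request latencies and failure flags (the sample data below is made up), they can be answered like this:

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(failed: Sequence[bool]) -> float:
    """Fraction of requests flagged as failed."""
    return sum(failed) / len(failed)


def cdf_at(values: Sequence[float], threshold: float) -> float:
    """Empirical CDF: fraction of samples at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical data: 200 requests taking 1..200 ms, the slowest 10 failing.
latencies_ms = [float(ms) for ms in range(1, 201)]
failed = [ms > 190 for ms in latencies_ms]

p99 = percentile(latencies_ms, 99)
errors = error_rate(failed)
served_within_100ms = cdf_at(latencies_ms, 100.0)
```

In production these figures usually come from histograms kept by a metrics system rather than from raw samples, but the questions being asked are the same.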

> three pillars of observability
LOGGING: events
METRICS: aggregate view
TRACING: request scope

Further references and recommendations:
-OpenTelemetry CNCF website: https://opentelemetry.io/
-Ted Young - The Fundamentals of OpenTelemetry: https://www.youtube.com/watch?v=X8w4yCmAMos
-“Distributed Tracing in Practice” by Austin Parker, Daniel Spoonhower, Jonathan Mace, Ben Sigelman, and Rebecca Isaacs.
-Langfuse: Open Source LLM Engineering Platform: https://github.com/langfuse/langfuse?tab=readme-ov-file
-Tracing Langchain apps with Elastic, OpenLLMetry, and OpenTelemetry: https://www.elastic.co/observability-labs/blog/elastic-opentelemetry-langchain-tracing

questions?