[DSC DACH 24] Evaluation and Observability of Gen AI Applications - Igor Nikolaienko

About This Presentation

Discusses the importance of evaluation and observability in generative AI applications, focusing on metrics, monitoring tools, and methodologies that ensure application performance and reliability.


Slide Content

Post & Parcel Germany | September 2024. Evaluation and Observability of Generative AI Applications. Igor Nikolaienko, Generative AI Architect, P&P Innovation Management. Vienna, 12 September 2024.

Key points: Why do we use GenAI? Why is building a GenAI app complex? Why is observability essential for GenAI? How does GenAI evaluation work? Which evaluation frameworks are available?

Business Benefits of GenAI. Productivity: 45% of organizations report improved productivity. User Experience: 85% of organizations report improved user experience. Security: 82% of organizations report an improved ability to identify threats. Revenue increase: 86% of organizations seeing revenue growth above 6%.

Building a GenAI App. Full-Stack Web App: Back-End & Front-End Development, Access Control & User Management, API Integration, DevOps, Networking, Penetration Tests. LLM-Driven (RAG) App Logic: Vector Database Management; Data Chunking, Re-ranking, Filtering; Metadata Enrichment; Guardrails Implementation; Agentic Functionality; Tracing and Monitoring. My Midjourney prompt: "Badly observable spaghetti-code system".
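
As an illustration of the LLM-driven (RAG) app logic listed on this slide, here is a minimal, self-contained sketch of chunking, a toy in-memory vector store, and retrieval. It is not taken from the presentation: the "embedding" is a stand-in term-frequency vector, and names such as chunk_text and InMemoryVectorStore are hypothetical; a real system would use an embedding model and a managed vector database.

```python
# Illustrative sketch only: chunking, a toy in-memory vector store, and retrieval.
# All names are hypothetical; a real RAG app would use a proper embedding model
# and a managed vector database.
from dataclasses import dataclass
import math


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split a document into overlapping character chunks."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': a normalized term-frequency vector (stand-in for a real model)."""
    counts: dict[str, float] = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two normalized sparse vectors."""
    return sum(a[t] * b[t] for t in a if t in b)


@dataclass
class Chunk:
    text: str
    vector: dict[str, float]


class InMemoryVectorStore:
    """Stand-in for a managed vector database."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add_document(self, text: str) -> None:
        for piece in chunk_text(text):
            self.chunks.append(Chunk(piece, embed(piece)))

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        qvec = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(qvec, c.vector), reverse=True)
        return [c.text for c in ranked[:top_k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add_document("Generative AI applications combine retrieval, prompting and monitoring.")
    context = store.retrieve("How do GenAI applications use retrieval?")
    print(context)  # in a real app, this context would be inserted into the LLM prompt
```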

Tackling GenAI Complexity. Observability: Predictability (Guardrails, Policies); Evaluation (Metrics/KPIs, Production Monitoring); Explainability (False Output Analysis, Tracing, Audit Logging); Testing (A/B Testing). "Introduce AI Observability to Supervise Generative AI." Modular architecture is key: Modular Open System Approach (MOSA).
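
To make "Tracing" and "Audit Logging" concrete, here is a hedged sketch of a hypothetical trace_llm_call decorator that records inputs, output, latency and errors for every LLM call as structured log lines. It is illustrative only and not part of the slides; a production setup would ship these records to a monitoring backend.

```python
# Illustrative sketch only: a hypothetical trace_llm_call decorator that adds
# basic observability (audit logging, latency, error capture) around any LLM call.
import functools
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("genai.observability")


def trace_llm_call(func):
    """Wrap an LLM call with an audit-log record: inputs, output preview, latency, errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_id = str(uuid.uuid4())
        start = time.perf_counter()
        record = {"trace_id": trace_id, "function": func.__name__, "kwargs": kwargs}
        try:
            result = func(*args, **kwargs)
            record["status"] = "ok"
            record["output_preview"] = str(result)[:200]
            return result
        except Exception as exc:  # failures become analyzable events (false output analysis)
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_s"] = round(time.perf_counter() - start, 3)
            logger.info(json.dumps(record))
    return wrapper


@trace_llm_call
def answer_question(prompt: str, model: str = "example-model") -> str:
    # Placeholder for the real LLM call (API client, guardrails, etc.)
    return f"[{model}] stub answer to: {prompt}"


if __name__ == "__main__":
    answer_question(prompt="Why is observability essential for GenAI?")
```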

Testing and Evaluation Methods. Prompt and Parameter Testing: Objective: optimize LLM performance by testing different prompts and application parameters. Method: A/B testing. Evaluation: Objective: assess LLM performance against reference answers using KPIs and metrics. Method: LLM-as-a-Judge. Reference Answers: comparison of LLM output to ground-truth or manually curated Q&As as benchmarks. Synthetic Q&A: leveraging automatically LLM-generated Q&As.
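
The LLM-as-a-Judge method mentioned above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: judge_llm() is a placeholder for a real model call, and the 1-to-5 grading prompt is an assumed template.

```python
# Illustrative LLM-as-a-Judge sketch. judge_llm() stands in for a real model call;
# the prompt template and the 1-5 scale are assumptions, not a fixed standard.
JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Rate the candidate answer from 1 (wrong) to 5 (fully correct and grounded).
Reply with only the number."""


def judge_llm(prompt: str) -> str:
    """Placeholder for a call to the judging model (e.g. via an API client)."""
    return "4"  # a real implementation would send `prompt` to an LLM and return its reply


def score_answer(question: str, reference: str, candidate: str) -> int:
    """Ask the judge model to grade a candidate answer against a reference answer."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference, candidate=candidate)
    return int(judge_llm(prompt).strip())


def evaluate_benchmark(qa_pairs: list[dict], generate_answer) -> float:
    """Average judge score over ground-truth or synthetic Q&A pairs."""
    scores = [
        score_answer(item["question"], item["reference"], generate_answer(item["question"]))
        for item in qa_pairs
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    benchmark = [{"question": "What is RAG?", "reference": "Retrieval-Augmented Generation."}]
    print(evaluate_benchmark(benchmark, lambda q: "RAG means Retrieval-Augmented Generation."))
```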

Metrics for a Talk-to-Your-Data (RAG) App. Flow: (1) User Query → (2) Knowledge Base → (3) Relevant Context → (4) LLM Call → (5) LLM Response. Evaluation signals: Correctness (against a Reference Response), Context Relevance, Faithfulness, Latency, Cost, User Feedback. "RAG is the Taylor Swift of Gen AI."
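
One way to operationalize these signals is to log a structured evaluation record per user query. The sketch below is an assumption, not taken from the slides: the metric values would in practice come from LLM-as-a-Judge scorers or an evaluation framework, with latency and cost from the serving layer and user feedback from the UI.

```python
# Assumed sketch (not from the slides): one structured evaluation record per
# user query in a Talk-to-Your-Data (RAG) app. Metric values are placeholders.
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class RagEvalRecord:
    query: str
    retrieved_context: list[str]
    response: str
    reference_response: str | None  # ground-truth answer, if available
    correctness: float              # response compared to the reference response
    context_relevance: float        # retrieved context compared to the query
    faithfulness: float             # response grounded in the retrieved context
    latency_s: float                # end-to-end response time in seconds
    cost_usd: float                 # token / API cost of the call
    user_feedback: int | None       # e.g. thumbs up (+1) / thumbs down (-1)


def log_rag_evaluation(record: RagEvalRecord) -> None:
    """Emit one evaluation record per query, e.g. to a monitoring store."""
    print(json.dumps(asdict(record)))


if __name__ == "__main__":
    log_rag_evaluation(RagEvalRecord(
        query="Which delivery options are available?",
        retrieved_context=["Standard and express delivery are available."],
        response="You can choose standard or express delivery.",
        reference_response="Standard and express delivery options exist.",
        correctness=0.9,
        context_relevance=0.8,
        faithfulness=1.0,
        latency_s=1.2,
        cost_usd=0.003,
        user_feedback=1,
    ))
```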

Evaluation Frameworks (not an exhaustive list).

GenAI Platform: LangSmith
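
Below is a minimal sketch of how such a platform is typically wired in, assuming the langsmith Python SDK and its @traceable decorator; the retrieval and LLM functions are placeholders, and an API key is expected via environment variables (check the current LangSmith documentation for exact configuration and run types).

```python
# Minimal sketch of tracing a RAG pipeline with LangSmith's @traceable decorator.
# Assumes the `langsmith` SDK is installed and tracing is configured via
# environment variables; the pipeline functions are placeholders, not real calls.
from langsmith import traceable


@traceable(run_type="retriever")
def retrieve_context(query: str) -> list[str]:
    # Placeholder for a vector-store lookup
    return ["Parcel tracking is available via the mobile app."]


@traceable(run_type="llm")
def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call
    return "You can track parcels in the mobile app."


@traceable(run_type="chain", name="rag_pipeline")
def rag_pipeline(query: str) -> str:
    context = retrieve_context(query)
    prompt = f"Answer using this context: {context}\n\nQuestion: {query}"
    return call_llm(prompt)


if __name__ == "__main__":
    print(rag_pipeline("How can I track my parcel?"))
    # With tracing configured, each nested call appears as a run in LangSmith,
    # where datasets, evaluators and user feedback can be attached.
```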

Summary: GenAI is an opportunity. Building GenAI is complex. Modular architecture is key. Observability is essential. Establish evaluation methods. Define metrics and KPIs. Choose an evaluation framework.

Thank you! "Sell me this pen." "It has Generative AI."