The Journey of Large Language Models at GetYourGuide
chloewilliams62
Sep 11, 2024
About This Presentation
"Integrating Large Language Models (LLMs) into our workflows at GetYourGuide has been quite the adventure. In this talk, I’ll share our experience with LLMs, focusing on the products we’ve built, the challenges we faced, and the impact on our business.
We’ll explore the exciting use cases, technical hurdles like integration and scaling, as well as our architectural decisions. Additionally, I’ll discuss our approach to dealing with hallucinations, a common downside of LLMs.
By sharing real examples from GetYourGuide, I’ll highlight what worked well and what didn’t, offering a handy guide for other organisations looking to tap into the power of LLMs."
Slides: 27 pages
Slide Content
LLMs - The journey
Agenda
●Intro to GYG
●Data Products and Machine Learning Platform
●LLMs at GYG: Practical Use Cases
●Deep Dive into a product
●Challenges
●Conclusion
Intro to GYG
Data products
We unlock the world’s most unforgettable travel experiences
Chase the Northern Lights in Norway around a fire with a hot drink
Experience a tour of the vibrant Maeklong Railway Market and Damnoen Saduak Floating Market in Bangkok
Go on a desert safari in Dubai with sandboarding and a camel ride
Enjoy a delicious experience at the Museum of Ice Cream in New York
Data Products and MLP
What is Data Products?
●An organization composed of Data Scientists and ML Engineers.
●We develop data-driven capabilities to support the scaling and optimization of products at GYG.
●We focus on solving problems and improving customer experience using Machine Learning.
What is the Machine Learning Platform?
●It enables faster, more reliable delivery of production-ready data products by applying MLOps best practices and providing self-service tools and automation.
LLMs at GetYourGuide
Use Cases
●Content Generation
○Localization: Translation of articles into various languages
○AI-assisted tour creation
Use Cases
●FAQs:
○A post-booking section on our tours that identifies and answers the most common questions
Shipping LLMs at GYG
●Novel Trip items
○Better exploration of inventory
●Flagging inappropriate content
And the list goes on...
ML Platform and LLMs
The role of the MLP was to provide a stable way to ship, maintain, and observe LLMs in production, and to work out the best practices along the way.
Use Case 1: Localization
●Explorer Articles
○Blog posts, if you like
○Links to GYG activities are attached
Use Case 1: Localization
ChatGPT or DeepL?
Use Case 1: Localization
ChatGPT and DeepL!
Use Case 1: Localization
[Chart: categories English, European, Spanish, Asian; values 50%, 32%, 6%, 18%]
We need:
1)Good translation quality
2)Tone of Voice
Hybrid (post-editing) is the way to go
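The hybrid approach can be pictured as a two-step pipeline: machine translation first, then LLM post-editing for tone of voice. The following is a minimal sketch under stated assumptions, not the GYG implementation. The names `localize`, `Translator`, `PostEditor`, and the two `fake_*` stand-ins are hypothetical; in practice the callables would wrap a DeepL client and a ChatGPT client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases: any function with these shapes can be plugged in,
# for example thin wrappers around a translation API and an LLM API.
Translator = Callable[[str, str], str]        # (text, target_lang) -> draft
PostEditor = Callable[[str, str, str], str]   # (source, draft, target_lang) -> final


@dataclass
class LocalizedArticle:
    source: str
    machine_translation: str
    post_edited: str


def localize(source: str, target_lang: str,
             translate: Translator, post_edit: PostEditor) -> LocalizedArticle:
    """Translate first, then let an LLM adjust the tone of voice."""
    draft = translate(source, target_lang)
    # The post-editor receives the source as well as the draft, so it can
    # polish the wording without drifting away from the original content.
    final = post_edit(source, draft, target_lang)
    return LocalizedArticle(source, draft, final)


# Stand-in implementations so the sketch runs without network access.
def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_post_edit(source: str, draft: str, target_lang: str) -> str:
    return draft.strip()


result = localize("Chase the Northern Lights in Norway", "de",
                  fake_translate, fake_post_edit)
print(result.post_edited)
```

Keeping the machine translation alongside the post-edited text makes it possible to compare the two later when checking quality.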
ChatGPT was trained on a dataset of 300 billion words. The dataset was 570 GB in size and consisted of
crawled web data, books, Wikipedia, etc. (Source: https://nerdynav.com/chatgpt-statistics/)
Scaling: ML Stack
●Cost monitoring dashboard with Datadog
●OpenAI integration to make batch calls
○Parallel processing
○Rate limiting
○Prompt templates
●Bootstrapping projects with LLM templates
●End-to-end automated post-editing running on a schedule on Airflow, with DBFS for storage
●Model evaluation on Arize Phoenix
●Prompt development and testing on Databricks notebooks and db-rocket
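The three ingredients of the batch-call integration (parallel processing, rate limiting, prompt templates) could fit together roughly as follows. This is an illustrative sketch using only the Python standard library. `PROMPT`, `RateLimiter`, `run_batch`, and `fake_llm` are hypothetical names, and `call_llm` stands in for whatever client function actually sends the prompt to the model.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable, List

# Hypothetical prompt template; the real prompts are not shown in the slides.
PROMPT = Template(
    "Post-edit this $lang translation, keeping all facts unchanged:\n$text"
)


class RateLimiter:
    """Allow at most one call every `min_interval` seconds across threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        time.sleep(slot - now)


def run_batch(texts: List[str], lang: str, call_llm: Callable[[str], str],
              max_workers: int = 4, min_interval: float = 0.01) -> List[str]:
    limiter = RateLimiter(min_interval)

    def worker(text: str) -> str:
        limiter.wait()
        return call_llm(PROMPT.substitute(lang=lang, text=text))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, texts))  # map preserves input order


# Stand-in for a real model call so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return prompt.splitlines()[-1].upper()


outputs = run_batch(["hallo welt", "guten tag"], "German", fake_llm)
print(outputs)
```

A production version would also need retries and error handling for failed calls, which are left out here.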
Quality
1)Machine translation: DeepL already performs well
2)Post-editing (ChatGPT): Hallucinations!
a)Making up places and times
i)Adding information that is not in the original text
b)Prompt leakage
c)Failing to stay in role and commenting on the task
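Some of these failure modes can be caught with cheap automatic checks before any human looks at the text. The sketch below is a set of assumed heuristics, not the checks used at GYG: it flags numbers or times that appear only after post-editing, prompt lines echoed in the output, and phrases that suggest the model is commenting on the task. `META_PHRASES`, `new_numbers`, and `check_post_edit` are hypothetical names, and invented place names are not covered.

```python
import re
from typing import List

# Hypothetical phrases that suggest the model stepped out of its editor role.
META_PHRASES = ("as an ai", "here is the", "i have edited", "note:")

# Matches integers and simple compound values such as 9:30 or 1,000.
NUMBER = re.compile(r"\d+(?:[:.,]\d+)*")


def new_numbers(draft: str, post_edited: str) -> List[str]:
    """Numbers or times present after post-editing but absent from the draft."""
    before = set(NUMBER.findall(draft))
    return [n for n in NUMBER.findall(post_edited) if n not in before]


def check_post_edit(draft: str, post_edited: str, prompt: str) -> List[str]:
    issues: List[str] = []
    lowered = post_edited.lower()

    invented = new_numbers(draft, post_edited)
    if invented:
        issues.append(f"invented numbers or times: {invented}")

    # Prompt leakage: a full (reasonably long) prompt line shows up in the output.
    for line in prompt.splitlines():
        stripped = line.strip().lower()
        if len(stripped) > 20 and stripped in lowered:
            issues.append("prompt leakage")
            break

    if any(phrase in lowered for phrase in META_PHRASES):
        issues.append("out of role / comments on the task")
    return issues


prompt = "You are a travel editor. Improve the tone of voice.\nDo not add facts."
draft = "The tour starts at the harbour."
bad = "Here is the improved text: The tour starts at 9:30 at the harbour."
good = "The tour begins at the harbour."

print(check_post_edit(draft, bad, prompt))
print(check_post_edit(draft, good, prompt))
```

Heuristics like these only narrow down what needs review; they do not replace model-graded or human evaluation.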
Methodology to evaluate LLMs
3. Model-graded evals
○Inspired by the OpenAI evals
○(AI to evaluate AI)
○Results show that strong LLM judges such as GPT-4 achieve over 80% agreement with human preferences, the same level of agreement as between humans. (Judging LLM-as-a-Judge)
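A model-graded eval can be sketched as an evaluator prompt, a parser for the judge's verdict, and an agreement score against human labels. This is an illustration only: `EVAL_PROMPT`, `grade`, `agreement`, and `fake_judge` are hypothetical, and the `judge` callable stands in for a call to a strong model such as GPT-4.

```python
from string import Template
from typing import Callable, List

# Hypothetical evaluator prompt; the real one is not shown in the slides.
EVAL_PROMPT = Template(
    "You are grading a post-edited translation.\n"
    "Original: $source\n"
    "Post-edited: $candidate\n"
    "Answer PASS if the post-edited text adds no information beyond the "
    "original, otherwise answer FAIL. Reply with exactly one word."
)


def grade(source: str, candidate: str, judge: Callable[[str], str]) -> bool:
    """Ask the judge model for a verdict and parse it strictly."""
    reply = judge(EVAL_PROMPT.substitute(source=source, candidate=candidate))
    verdict = reply.strip().upper()
    if verdict not in {"PASS", "FAIL"}:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return verdict == "PASS"


def agreement(model_grades: List[bool], human_grades: List[bool]) -> float:
    """Fraction of examples where the judge and the human label agree."""
    if not model_grades or len(model_grades) != len(human_grades):
        raise ValueError("grade lists must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(model_grades, human_grades))
    return matches / len(model_grades)


# Stand-in judge so the sketch runs offline: fails anything mentioning a time.
def fake_judge(prompt: str) -> str:
    return "FAIL" if "9:30" in prompt else "PASS"


pairs = [
    ("Tour starts at the harbour.", "The tour begins at the harbour."),
    ("Tour starts at the harbour.", "The tour begins at 9:30 at the harbour."),
]
human_labels = [True, False]
model_labels = [grade(src, cand, fake_judge) for src, cand in pairs]
print(agreement(model_labels, human_labels))
```

Tracking agreement with a small human-labelled set is one way to tell whether the evaluator prompt itself needs revisiting.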
Methodology to evaluate LLMs
Arize Phoenix
●Open-source tool for tracing and evaluation of GenAI applications
Challenges
●Quickly evolving landscape:
○Many new tools have appeared since the start of the project
●Prompt engineering is time-consuming:
○The evaluator prompt even more so
●Ensuring quality:
○Data changes over time and prompts become obsolete, so both need revisiting
○Human in the loop: cannot be completely automated
Challenges
●Organizational Challenges
○Prompt engineering: Close collaboration between business and data science (DS)
○Dissolving team boundaries, necessitating roles that extend beyond traditional responsibilities
○The idea of making LLMs decentralized