The Journey of Large Language Models at GetYourGuide

chloewilliams62 97 views 27 slides Sep 11, 2024

About This Presentation

"Integrating Large Language Models (LLMs) into our workflows at GetYourGuide has been quite the adventure. In this talk, I’ll share our experience with LLMs, focusing on the products we’ve built, the challenges we faced, and the impact on our business.

We’ll explore the exciting use cas...


Slide Content

LLMs - The journey

Agenda
●Intro to GYG

●Data Products and Machine Learning Platform

●LLMs at GYG: Practical Use Cases

●Deep Dive into a product

●Challenges

●Conclusion

Intro to GYG

Data products

We unlock the world’s most unforgettable travel experiences
Chase the Northern Lights in Norway around a fire with a hot drink
Experience a tour of the vibrant Maeklong Railway Market and Damnoen Saduak Floating Market in Bangkok
Go on a desert safari in Dubai with sandboarding and a camel ride
Enjoy a delicious experience at the Museum of Ice Cream in New York

Data Products and MLP
What is Data Products?
●An organization composed of Data Scientists and ML Engineers.
●Develops data-driven capabilities to support the scaling and optimization of products at GYG.
●We focus on solving problems and improving the customer experience using Machine Learning.

What is the Machine Learning Platform?
●Enables faster and more reliable delivery of production-ready data products by leveraging MLOps best practices and providing self-service tools and automation.

LLMs at GetYourGuide

Use Cases

●Content Generation
○Localization: Translation of articles into various languages
○AI-assisted tour creation

●FAQs:
○A post-booking section in our tours that identifies and answers the most common questions

Shipping LLMs at GYG

●Novel trip items
○Better exploration of inventory
●Flagging inappropriate content

And the list goes on…

ML Platform and LLMs



The role of MLP was to provide a stable way to ship, maintain, and observe LLMs in production, and to work out the best practices along the way.

Use Case 1: Localization
●Explorer Articles
○Blog posts, if you like
○Each article links to GYG activities

ChatGPT or DeepL?

ChatGPT and DeepL!

[Chart: shares by language group (English, European, Spanish, Asian): 50%, 32%, 6%, 18%; label-to-value pairing not recoverable]
We need:

1)Good translation quality
2)Tone of voice

Hybrid (post-editing) is the way to go.

ChatGPT was trained on a dataset of 300 billion words. The dataset was 570 GB in size and consisted of crawled web data, books, Wikipedia, etc. (Source: https://nerdynav.com/chatgpt-statistics/)
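
The hybrid approach (machine translation first, LLM post-editing second) might be wired together roughly as below. This is a minimal sketch, not the production code: the `translate` and `post_edit` callables stand in for DeepL and ChatGPT clients, and the prompt wording, `localize`, and `LocalizedArticle` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt wording; the slides do not show the real prompt.
POST_EDIT_PROMPT = (
    "You are a post-editor for travel articles. Improve the tone of voice of "
    "the draft translation without adding or removing facts.\n"
    "Source ({src}): {source}\n"
    "Draft ({tgt}): {draft}\n"
    "Return only the edited translation."
)


@dataclass
class LocalizedArticle:
    source: str  # original text
    draft: str   # raw machine translation
    final: str   # post-edited text (or the draft if post-editing returned nothing)


def localize(
    source: str,
    src: str,
    tgt: str,
    translate: Callable[[str, str, str], str],
    post_edit: Callable[[str], str],
) -> LocalizedArticle:
    """Step 1: machine translation. Step 2: LLM post-editing for tone of voice."""
    draft = translate(source, src, tgt)
    prompt = POST_EDIT_PROMPT.format(src=src, tgt=tgt, source=source, draft=draft)
    final = post_edit(prompt).strip()
    # Fall back to the machine translation if the LLM returns an empty reply.
    return LocalizedArticle(source=source, draft=draft, final=final or draft)
```

Keeping both the draft and the final text makes it possible to compare them later, which is what the quality checks rely on.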

Scaling: ML Stack

●Cost monitoring dashboard with Datadog
●OpenAI integration to make batch calls
1)Parallel processing
2)Rate limiting
3)Prompt templates
●Bootstrapping the project with LLM templates
●End-to-end automated post-editing running on a schedule on Airflow, with DBFS for storage
●Model evaluation on Arize Phoenix
●Prompt development and testing in Databricks notebooks and db-rocket
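
The three ingredients of the batch integration (parallel processing, rate limiting, prompt templates) can be sketched with the standard library alone. Everything named here is an assumption for illustration: `call_llm` is a placeholder for whatever client function sends one prompt and returns one reply, and `PROMPT`, `RateLimiter`, and `run_batch` are hypothetical names, not the GYG implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable, List, Sequence

# Hypothetical prompt template; the real templates are not shown in the slides.
PROMPT = Template("Post-edit this $language translation:\n$text")


class RateLimiter:
    """Space calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._min_interval
        time.sleep(slot - now)


def run_batch(
    texts: Sequence[str],
    language: str,
    call_llm: Callable[[str], str],
    max_workers: int = 4,
    calls_per_second: float = 5.0,
) -> List[str]:
    """Send one prompt per text in parallel; results keep the input order."""
    limiter = RateLimiter(1.0 / calls_per_second)

    def one(text: str) -> str:
        limiter.wait()
        return call_llm(PROMPT.substitute(language=language, text=text))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, texts))
```

Threads suit this workload because each call mostly waits on the network; the limiter keeps the combined request rate under a provider quota regardless of how many workers run.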

Quality
1)Machine translation: DeepL already performs well

2)Post-editing (ChatGPT): Hallucinations!
a)Making up places and times
i)Adding information that is not in the original text

b)Prompt leakage

c)Fails to stay in role and comments on the task
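
Some of these failure modes can be caught with cheap heuristics before any model-based evaluation. The sketch below is an assumption about how such checks might look, not the checks used at GYG: `check_post_edit`, the regex patterns, and the 20-character threshold are all hypothetical, and a check like this only flags candidates for review.

```python
import re
from typing import List

# Hypothetical phrases suggesting the model left its role and commented on the task.
META_PATTERNS = [
    r"\bas an ai\b",
    r"\bhere is the (edited|revised|improved)\b",
    r"\bi have (edited|revised|improved)\b",
]
# Matches plain numbers and simple times or amounts such as 10:30 or 2,000.
_NUMBER = re.compile(r"\d+(?:[:.,]\d+)*")


def check_post_edit(draft: str, edited: str, prompt_instructions: str) -> List[str]:
    """Return a list of issue labels; an empty list means no heuristic fired."""
    issues: List[str] = []

    # a) Made-up times or figures: numbers in the edit that the draft lacks.
    new_numbers = set(_NUMBER.findall(edited)) - set(_NUMBER.findall(draft))
    if new_numbers:
        issues.append("new_numbers:" + ",".join(sorted(new_numbers)))

    # b) Prompt leakage: an instruction line appears verbatim in the output.
    lowered = edited.lower()
    for line in prompt_instructions.splitlines():
        line = line.strip().lower()
        if len(line) >= 20 and line in lowered:
            issues.append("prompt_leakage")
            break

    # c) Out of role: the model talks about the task instead of doing it.
    if any(re.search(pattern, lowered) for pattern in META_PATTERNS):
        issues.append("meta_commentary")

    return issues
```

Invented place names are harder to detect this way, which is one reason a model-graded evaluation and a human in the loop are still needed.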

Methodology to evaluate LLMs

3. Model-graded evals
○Inspired by OpenAI Evals
○(AI to evaluate AI)
○Results reveal that strong LLM judges like GPT-4 achieve over 80% agreement with human preferences, the same level of agreement as between humans (source: "Judging LLM-as-a-Judge")
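
A model-graded evaluation boils down to two pieces: a judge prompt that forces a parseable verdict, and an agreement rate against human labels to decide whether the judge can be trusted. The sketch below assumes a PASS/FAIL scheme; `JUDGE_PROMPT`, `grade`, and `agreement_rate` are hypothetical names, and `judge` stands in for any function that sends a prompt to an LLM and returns its reply.

```python
from typing import Callable, Sequence

# Hypothetical grading prompt; the actual evaluator prompt is not shown in the slides.
JUDGE_PROMPT = (
    "You are grading a post-edited translation.\n"
    "Draft: {draft}\n"
    "Edited: {edited}\n"
    "Does the edited text keep every fact from the draft and add none? "
    "Answer with exactly one word: PASS or FAIL."
)


def grade(draft: str, edited: str, judge: Callable[[str], str]) -> str:
    """Ask the judge model for a verdict and normalize its reply."""
    reply = judge(JUDGE_PROMPT.format(draft=draft, edited=edited)).strip().upper()
    if reply.startswith("PASS"):
        return "PASS"
    if reply.startswith("FAIL"):
        return "FAIL"
    # Judges also drift off-format, so keep an explicit bucket for that.
    return "UNPARSEABLE"


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of items where the judge and the human reviewer agree."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)
```

Measuring agreement on a small human-labeled sample first gives a concrete number to compare against the roughly 80% figure quoted above before relying on the judge at scale.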



Arize Phoenix
●Open-source tool for tracing and evaluation of Gen AI applications

Model Monitoring: Arize
Trigger → Databricks job → Evaluator → Publish articles

Challenges

●Quickly evolving landscape:
○Many new tools have appeared since the start of the project
●Prompt engineering is time-consuming:
○The evaluator prompt even more so
●Ensuring quality:
○The data changes and prompts become obsolete, so prompts need revisiting
○A human in the loop is still needed; this cannot be completely automated

●Organizational challenges:
○Prompt engineering: close collaboration between business and DS
○Dissolving team boundaries, necessitating roles that extend beyond traditional responsibilities
○The idea of making LLMs decentralized