Langid with ChatGPT benchmark comarision with fasttext.pdf

bazevgenii 14 views 15 slides Sep 19, 2024
Slide 1
Slide 1 of 15
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15

About This Presentation

LangID with LLM


Slide Content

ChatGPT: Jack of all trades,
master of none
Paper Discussion





03/2023
Evgeny BAZAROV, [email protected]

arXiv:2302.10724v1 [cs.CL] 21 Feb 2023

Let’s go to paper

Small After Party
Language identification with Chat-GPT

ChatGPT API (gpt-3.5-turbo)
BESEDO

Old Benchmark

1.Filter data (1 < n_chars < 15)
2.Sample 10k (stratified) - should be ok to be representative
3.Create prompt
4.Run API with gpt-3.5-turbo
5.Postprocess results
6.Benchmark fasttext vs chatgpt
Steps

Prompting
V1: "You are an API to a Language Identification service. I will give you text as
input, you will return to me the ISO 639-1 code of a language in which text was
written. Your response should be in json format."

Creativity?

Prompting
V2:

Result

Result

Result
5-7 points better than fasttext