Small After Party
Language identification with Chat-GPT
ChatGPT API (gpt-3.5-turbo)
BESEDO
Old Benchmark
1.Filter data (1 < n_chars < 15)
2.Sample 10k (stratified) - should be ok to be representative
3.Create prompt
4.Run API with gpt-3.5-turbo
5.Postprocess results
6.Benchmark fasttext vs chatgpt
Steps
Prompting
V1: "You are an API to a Language Identification service. I will give you text as
input, you will return to me the ISO 639-1 code of a language in which text was
written. Your response should be in json format."