A Survey of Techniques for Maximizing LLM Performance.pptx


Slide Content

A Survey of Techniques for Maximizing LLM Performance (OpenAI, 2023). Presented by YANJUN WU, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]

1. Introduction
Optimizing LLMs is hard:
- Extracting signal from the noise is not easy
- Performance can be abstract and difficult to measure
- It is unclear when to use which optimization
Contributions:
- A mental model of what the options are
- An appreciation of when to use one over the other
- The confidence to continue on this optimization journey yourself

2. Optimizing LLM performance is not always linear
This is problematic because retrieval-augmented generation (RAG) and fine-tuning (FT) solve different problems. Sometimes you need RAG, sometimes you need fine-tuning, and sometimes you need both, depending on the category of issue you are dealing with.

3. The optimization flow
First step: start from prompt engineering.
- Figure out how you're going to consistently evaluate your outputs.
- Determine whether this is a context problem or a "how the model needs to act" problem.
Second step: determine the optimization objective.
- If you need more relevant context, use RAG.
- If you need more consistent instructions, use fine-tuning.
- If your problem requires both, stack them together (a minimal evaluation sketch follows below).
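The "evaluate first" step is easier to see with a concrete baseline. The sketch below is a minimal illustration, assuming the `openai` Python client and a small hypothetical eval set; the point is to score a plain prompt so that later RAG or fine-tuning changes can be measured against something.

```python
# A minimal evaluation baseline, assuming the openai Python client and a
# hypothetical eval set of (question, expected answer) pairs.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EVAL_SET = [  # hypothetical examples; replace with your own task data
    {"question": "What year was the transformer paper published?", "expected": "2017"},
    {"question": "What does RAG stand for?", "expected": "retrieval-augmented generation"},
]

def run_baseline(prompt_template: str, model: str = "gpt-3.5-turbo") -> float:
    """Score a prompt against the eval set; later optimizations are judged
    by whether they beat this number."""
    correct = 0
    for item in EVAL_SET:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": prompt_template.format(question=item["question"])}],
        )
        answer = response.choices[0].message.content or ""
        correct += item["expected"].lower() in answer.lower()
    return correct / len(EVAL_SET)

baseline_accuracy = run_baseline("Answer concisely: {question}")
print(f"prompt-engineering baseline: {baseline_accuracy:.0%}")
```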

4. Prompt engineering
Good for:
- An early attempt to clarify what you actually need
- Combined with evaluation, it provides a baseline and prepares for further optimization
Not good for:
- Introducing new knowledge
- Reliably replicating complex styles (e.g., learning a new programming language)
- Using fewer tokens
Prompting techniques mentioned: few-shot examples, using external tools (see the sketch below).
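As a quick illustration of the few-shot technique named above, the sketch below shows how in-context examples constrain the output format. The classification task and example pairs are hypothetical, not from the deck.

```python
# A minimal few-shot prompt sketch, assuming the openai Python client.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
    # Few-shot examples: show the model the exact format you expect back.
    {"role": "user", "content": "Review: The battery dies within an hour."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Setup took two minutes and it just works."},
    {"role": "assistant", "content": "positive"},
    # The actual input to classify.
    {"role": "user", "content": "Review: The screen cracked on the first drop."},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)  # expected: "negative"
```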

5. Retrieval-Augmented Generation (RAG)
The role of RAG is mainly to provide the model with contextual knowledge relevant to a particular question, introducing new information that may not be available in the LLM.
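To make the retrieve-then-generate idea concrete, here is a minimal sketch assuming the `openai` Python client, `numpy`, and a tiny in-memory document list; the documents and question are hypothetical, and a real system would use a vector database rather than a Python list.

```python
# A minimal RAG sketch: embed documents, retrieve the closest one by cosine
# similarity, and ground the model's answer in that retrieved context.
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [  # hypothetical knowledge the base model may not have
    "Our refund window is 30 days from the delivery date.",
    "Premium support is available weekdays from 9am to 6pm KST.",
]

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in response.data])

doc_vectors = embed(DOCUMENTS)

def answer(question: str) -> str:
    # Retrieve the most similar document by cosine similarity.
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = DOCUMENTS[int(scores.argmax())]
    # Stuff the retrieved context into the prompt to ground the answer.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("How long do I have to request a refund?"))
```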

5. Retrieval-Augmented Generation (RAG)
Good for:
- Introducing new information to the model and updating its knowledge
- Reducing hallucinations by controlling the content the model sees
Not good for:
- Teaching the model to understand a broad domain, such as law or medicine
- Teaching the model new languages, formats, or styles, which is more the domain of fine-tuning

6. RAG Evaluation
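A common way to evaluate a RAG pipeline is to score the retrieval step separately from the generated answer. As one hedged illustration (not the deck's own evaluation setup), the sketch below reuses `DOCUMENTS`, `doc_vectors`, and `embed()` from the previous example and computes a simple hit rate over a hypothetical labelled set.

```python
# Retrieval hit rate @ k over a hypothetical labelled set, reusing
# DOCUMENTS, doc_vectors, and embed() from the RAG sketch above.
import numpy as np

RETRIEVAL_EVAL = [  # hypothetical: each question is paired with the index of its relevant document
    {"question": "How long do I have to request a refund?", "relevant_doc": 0},
    {"question": "When can I reach premium support?", "relevant_doc": 1},
]

def hit_rate(k: int = 1) -> float:
    hits = 0
    for item in RETRIEVAL_EVAL:
        q = embed([item["question"]])[0]
        scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
        top_k = np.argsort(scores)[::-1][:k]
        hits += item["relevant_doc"] in top_k
    return hits / len(RETRIEVAL_EVAL)

print(f"retrieval hit rate@1: {hit_rate(1):.0%}")
```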

7. Fine-tuning
Fine-tuning takes a large model, which already has broad knowledge, and specializes it so that it is better suited to a particular task.
Why fine-tune?
- Prompting techniques take up input tokens; a fine-tuned model no longer needs complex prompts, detailed instructions, or in-context examples.
- Good for emphasizing knowledge that already exists in the base model.
- Great for modifying or customizing the structure or tone of model output.
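For a sense of what this looks like in practice, the sketch below prepares a tiny chat-formatted training file and launches a fine-tuning job with the `openai` Python client. The training examples, file name, and three-bullet output format are all hypothetical illustrations of "customizing structure or tone", not the deck's actual data.

```python
# A minimal fine-tuning sketch: write chat-formatted examples to JSONL,
# upload the file, and start a fine-tuning job.
import json
from openai import OpenAI

client = OpenAI()

# Each training example is a full chat transcript showing the structure/tone we want.
examples = [
    {"messages": [
        {"role": "system", "content": "Reply in exactly three short bullet points."},
        {"role": "user", "content": "Summarize: the meeting moved to Friday at 10am."},
        {"role": "assistant", "content": "- Meeting rescheduled\n- New day: Friday\n- New time: 10am"},
    ]},
    # ... more high-quality examples; quality matters more than quantity
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the training data and launch the fine-tuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job until it finishes, then use the resulting model name
```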

7. Fine-tuning
Why did this use case work?
- First, no new knowledge was needed: everything required to solve the problem already existed in the base model, but the model needed to produce output with a very particular structure.
- Second, they used very high-quality training data and had a good baseline to compare against: they evaluated GPT-3.5 Turbo and GPT-4, understood where each succeeded and where it fell short, and therefore knew that fine-tuning would be a good way to solve this task.
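The "good baseline" point can be made concrete by running the same eval set against each candidate model. This short sketch reuses the hypothetical `run_baseline` helper from the optimization-flow example; the fine-tuned model identifier is a placeholder for whatever your finished job returns.

```python
# Compare candidate models on the same eval set, reusing run_baseline()
# from the earlier baseline sketch. The "ft:..." name is a placeholder.
PROMPT = "Answer concisely: {question}"

for model in ["gpt-3.5-turbo", "gpt-4", "ft:gpt-3.5-turbo:org::placeholder"]:
    accuracy = run_baseline(PROMPT, model=model)
    print(f"{model}: {accuracy:.0%}")
```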

8. Conclusion
The first place to start when improving LLM performance is prompt engineering. You can keep refining prompts until performance hits a bottleneck, at which point an in-depth analysis of the types of errors encountered is required. If new knowledge or richer context needs to be introduced to the model, RAG is a possible path. If the model has trouble following instructions consistently, needs to follow a strict or novel output structure, or if you wish to interact with the model more efficiently, then fine-tuning should be considered. It is worth noting that this process is not linear: multiple iterations may be required to reach truly satisfactory accuracy, and you may switch back and forth between these techniques during optimization.

Q&A