What will we learn
- Updates on LLMs
- Prompt Techniques
- Hands-on: Prompt Techniques
- Serving an LLM to end users: Chainlit
- Supervised Fine-Tuning
- LangSmith
LLM
Artificial Intelligence Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks. (IBM) Large Language Model Definition
LLMs for Bahasa Indonesia (or SEA languages)
- Komodo-7B, Yellow.ai: EN, ID, 11 regional languages
- Wiz.ai 13B: Bahasa Indonesia
- SEA-LION 3B/7B, AI Singapore: SEA languages
Popular LLMs GPT 4 Claude Gemini Llama 3.1
GPT-4
- Estimated parameters: 1.76 trillion
- Max. context window: 128k tokens
- GPT-4 launch date: March 14, 2023
- GPT-4 Turbo and GPT-4 Turbo with Vision: November 2023
- GPT-4o (omni) launch date: May 13, 2024
- GPT-4o mini launch date: July 18, 2024
GPT 4
GPT-4o
- The same high intelligence as GPT-4 Turbo but much more efficient: it generates text 2x faster and is 50% cheaper
- Multimodal (accepts text or image inputs and outputs text)
- GPT-4o has the best vision and the best performance across non-English languages of any OpenAI model
- gpt-4o-2024-08-06 provides 16,384 max output tokens (up from the previous 4,096)
GPT-4o mini
- OpenAI's most advanced model in the small-model category, and their cheapest model yet
- Multimodal (accepts text or image inputs and outputs text); higher intelligence than gpt-3.5-turbo but just as fast
- Meant for smaller tasks, including vision tasks
- OpenAI recommends choosing gpt-4o-mini where you would previously have used gpt-3.5-turbo, as it is more capable and cheaper
Claude
- Features: advanced reasoning, vision analysis, code generation, multilingual processing
- Latest: Claude 3.5 Sonnet (Jun 21, 2024)
- Context window: 200k+ tokens (about 500 pages of text)
- Claude model family: Haiku (light and fast), Sonnet (optimal), Opus (powerful)
Gemini — different sizes:
- Gemini Nano (1.0): on-device
- Gemini Flash (1.5): lightweight
- Gemini Pro (1.5): best model for general performance; 2 million (!) token context window
- Gemini Ultra (1.0): largest model
Llama 3.1
- Llama (v1) was trained on 2,048 A100 GPUs; Llama 3.1 was trained on 16,000 H100 GPUs
- Three parameter sizes: 8B, 70B, 405B
- Context window: 128k tokens
- Capabilities: code, multilinguality, math and reasoning, long context, tool use, factuality, steerability
AlpacaEval
- GPT-4 still leads
- Llama 3.1 (8B Instruct), ranked 29, is readily available at Telkom to try
Cost of Training LLM 2024 AI Index Report by Stanford
"Open" AI — Llama 3.1 405B vs GPT-4: a comparison at a similar number of parameters
Telkom Internal Deployed LLM
- Postman collection: https://api.postman.com/collections/15922710-c4fb5d7c-55bc-4eb4-9929-c0a2d26418cf?access_key=PMAT-01J4GY8WP86NFW4XRSQEAKCWYW
- Endpoint: https://telkom-ai-dag.api.apilogy.id/Telkom-LLM/0.0.3/chat/completion/telkomai
- Authentication: the key-auth key becomes the x-api-key header
- For this demo, the secret key is gRwiTy4K1RnHEdT89pTFLSy3WtuvpAW6
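The endpoint and x-api-key header above can be called from Python. A minimal sketch, assuming an OpenAI-style chat-completion request body (the actual schema is in the Postman collection; only the URL, header name, and demo key come from the slide):

```python
import json
import urllib.request

# Endpoint, header name, and demo key from the slide; the payload schema
# below is an assumption (OpenAI-style messages array).
BASE_URL = "https://telkom-ai-dag.api.apilogy.id/Telkom-LLM/0.0.3/chat/completion/telkomai"
API_KEY = "gRwiTy4K1RnHEdT89pTFLSy3WtuvpAW6"  # demo key from the slide

def build_request(user_message: str):
    """Build the headers and an assumed chat-completion payload."""
    headers = {
        "x-api-key": API_KEY,  # key-auth key sent as x-api-key
        "Content-Type": "application/json",
    }
    payload = {"messages": [{"role": "user", "content": user_message}]}
    return headers, payload

def chat(user_message: str):
    """POST the prompt and return the parsed JSON response."""
    headers, payload = build_request(user_message)
    req = urllib.request.Request(
        BASE_URL, data=json.dumps(payload).encode("utf-8"),
        headers=headers, method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(chat("Hello, who are you?"))
```

Inspect the raw JSON once before parsing a specific field out of it, since the response shape may differ from OpenAI's.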
LLM Optimization Flow — highlights considerations for LLM optimization
LLMOps LLMOps enables the efficient deployment, monitoring, and maintenance of large language models.
Prompt Techniques
Paper that we're referring to: Survey of Prompting Techniques. This presentation covers only English text prompting techniques. Alternative resource: https://www.promptingguide.ai/
Definition
- Prompt: a prompt is an input to a generative AI model that is used to guide its output.
- Prompt Template: prompts are often constructed via a prompt template. A prompt template is a function that contains one or more variables which will be replaced by some media (usually text) to create a prompt.
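The prompt-template definition above can be made concrete in a few lines of Python. This is an illustrative sketch, not any particular library's API; the translation template is an invented example:

```python
# A prompt template is a function with variables that are filled in
# with text to produce a concrete prompt.
TEMPLATE = "Translate the following sentence to Indonesian:\n\n{text}\n\nTranslation:"

def render(template: str, **variables: str) -> str:
    """Replace each {variable} placeholder to create the final prompt."""
    return template.format(**variables)

prompt = render(TEMPLATE, text="Good morning, everyone.")
```

Libraries such as LangChain provide the same idea as a `PromptTemplate` object, but plain `str.format` is enough to see the mechanism.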
Definition
- Prompting: the process of providing a prompt to a GenAI, which then generates a response.
- Prompt Engineering: the iterative process of developing a prompt by modifying or changing the prompting technique that you are using.
- Prompting Technique: a blueprint that describes how to structure a prompt, prompts, or a dynamic sequencing of multiple prompts.
Components of a Prompt
- Directive — the intent of the prompt. Example: "Tell me five good books to read."
- Examples — act as demonstrations that guide the GenAI to accomplish a task. Example: "Night: Noch / Morning:"
- Output Formatting — instructions to output in a certain format. Example: "{PARAGRAPH} Summarize this into a CSV."
- Style Instructions — modify the output stylistically rather than structurally. Example: "Write a clear and curt paragraph about llamas."
- Role ("persona") — Example: "Pretend you are a shepherd and write a limerick about llamas."
- Additional Information ("context") — e.g., if the directive is to write an email, you might include information such as your name and position so the GenAI can properly sign the email.
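The components listed above compose into a single prompt by simple concatenation. A sketch with invented component texts, ordered role → context → directive → style → output format (the ordering is a design choice, not a rule from the survey):

```python
# Assemble a prompt from the standard components; the texts are illustrative.
components = {
    "role": "Pretend you are a shepherd.",
    "context": "The paragraph will appear in a children's encyclopedia.",
    "directive": "Write a short paragraph about llamas.",
    "style": "Keep the tone clear and curt.",
    "output_format": "Return the answer as a single CSV row: topic,summary.",
}

order = ("role", "context", "directive", "style", "output_format")
prompt = "\n".join(components[k] for k in order)
```

Not every prompt needs all components; the directive is usually the only mandatory one.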
In Context Learning (ICL) ICL refers to the ability of GenAIs to learn skills and tasks by providing them with exemplars and or relevant instructions within the prompt, without the need for weight updates/retraining Skills can be learned from exemplars and/or instructions
1. Zero Shot Zero-Shot Prompting uses zero exemplars. Can be combined with other concept : Chain of Thought A few examples include Role Prompting Style Prompting Emotion Prompting System 2 Attention SimToM Rephrase and Respond ( RaR ) Re-reading (RE2) Self-Ask
Example: Role Prompting
- Role Prompt — prompts that assign a role to the LLM: "You are [role]"
- Audience Prompt — prompts that specify the audience of the conversation: "You are talking to a [role]"
- Interpersonal Prompt — prompts that connote the relationship between the speaker and listener: "You are talking to your [role]"
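The three variants above differ only in the framing sentence prepended to the task. A small sketch (the pediatrician example is invented):

```python
# Three zero-shot role-prompt variants from the slide, as template functions.
def role_prompt(role: str, task: str) -> str:
    """Assign a role to the LLM itself."""
    return f"You are {role}. {task}"

def audience_prompt(audience: str, task: str) -> str:
    """Specify who the LLM is talking to."""
    return f"You are talking to {audience}. {task}"

def interpersonal_prompt(relation: str, task: str) -> str:
    """State the relationship between speaker and listener."""
    return f"You are talking to your {relation}. {task}"

task = "Explain how vaccines work."
prompts = [
    role_prompt("a pediatrician", task),
    audience_prompt("a five-year-old", task),
    interpersonal_prompt("younger sibling", task),
]
```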
2. Few-Shot Prompting GenAI learns to complete a task with only a few examples (exemplars).
Few-Shot Prompting Design Decisions
- Exemplar Quantity — increasing the quantity of exemplars in the prompt generally improves model performance, particularly in larger models; however, in some cases the benefits may diminish beyond 20 exemplars.
- Exemplar Ordering — the order of exemplars affects model behavior.
- Exemplar Label Distribution — if 10 exemplars from one class and 2 exemplars of another class are included, the model may become biased toward the first class.
- Exemplar Label Quality — despite the general benefit of multiple exemplars, the necessity of strictly valid demonstrations is unclear; some research shows that incorrectly labeled exemplars diminish performance, while other research does not support this.
- Exemplar Format — a common format is "Q: {input}, A: {label}".
- Exemplar Similarity — selecting exemplars that are similar to the test sample is generally beneficial for performance.
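The common "Q: {input}, A: {label}" format from the table can be built programmatically from a list of exemplars. A sketch with an invented sentiment-classification task:

```python
# Build a few-shot prompt in the common "Q/A" exemplar format; the
# sentiment exemplars are illustrative.
exemplars = [
    ("I loved this film.", "positive"),
    ("The plot was dull.", "negative"),
    ("What a fantastic ending!", "positive"),
]

def few_shot_prompt(exemplars, query: str) -> str:
    """Render exemplars as Q/A pairs, then the unanswered query last."""
    blocks = [f"Q: {x}\nA: {y}" for x, y in exemplars]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(exemplars, "The acting felt wooden.")
```

Note the design decisions from the table apply directly here: how many exemplars to pass, their order, and the 2:1 positive/negative label distribution all influence the model's answer.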
3. Thought Generation
Zero-Shot CoT
- Contains 0 exemplars
- Thought inducers: "Let's think step by step." / "Let's work this out in a step by step way to be sure we have the right answer."
- Characteristics: no examples, task-agnostic
Few-Shot CoT
- Contains multiple exemplars
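Zero-shot CoT is mechanically trivial: append a thought inducer to the task prompt, with no exemplars. A sketch (the arithmetic question is invented):

```python
# Zero-shot chain-of-thought: append a thought inducer to the question.
# No exemplars are needed, which is why the technique is task-agnostic.
INDUCER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    return f"{question}\n{INDUCER}"

prompt = zero_shot_cot(
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
)
```

Few-shot CoT differs only in that worked reasoning chains are included as exemplars before the question.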
3. Thought Generation
4. Ensembling In GenAI, ensembling is the process of using multiple prompts to solve the same problem, then aggregating these responses into a final output.
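A common instance of ensembling is self-consistency: sample several answers for the same prompt (with temperature > 0) and keep the majority answer. A sketch with mocked samples standing in for real LLM calls:

```python
from collections import Counter

# Self-consistency ensembling: aggregate multiple sampled answers by
# majority vote. The sampled answers below are mocked; in practice each
# comes from a separate (non-deterministic) LLM call on the same prompt.
def majority_vote(answers: list) -> str:
    return Counter(answers).most_common(1)[0][0]

sampled = ["42", "42", "41", "42", "40"]
final = majority_vote(sampled)
```

The aggregation step can also be a rank-and-score pass by another LLM instead of a simple vote.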
5. Self-Criticism When creating GenAI systems, it can be useful to have LLMs criticize their own outputs.
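The typical self-criticism loop is generate → critique → revise. A control-flow sketch where `llm` is a deterministic stub (replace it with a real model call; the prompts and stub replies are invented):

```python
# Self-criticism (generate, critique, revise). `llm` is a stand-in stub so
# the flow can be followed without an API; its canned replies are invented.
def llm(prompt: str) -> str:
    if prompt.startswith("Critique"):
        return "The draft is missing a concrete example."
    if prompt.startswith("Revise"):
        return "Revised draft with a concrete example."
    return "First draft."

def self_refine(task: str) -> str:
    draft = llm(task)
    critique = llm(f"Critique the following answer:\n{draft}")
    revised = llm(
        f"Revise the answer using this critique:\n{critique}\n\nAnswer:\n{draft}"
    )
    return revised

answer = self_refine("Explain prompt engineering.")
```

In practice the critique/revise steps can be iterated until the critic reports no remaining issues (or an iteration cap is hit).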
6. Decomposition Decomposing complex problems into simpler sub-questions.
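Decomposition (in the style of least-to-most prompting) asks the model for sub-questions first, then answers them in sequence, feeding earlier answers into later ones. A control-flow sketch where `llm` is again a deterministic stub with invented replies:

```python
# Decomposition: plan sub-questions, then solve them in order, carrying
# earlier answers as context. `llm` is a stub; its replies are invented.
def llm(prompt: str) -> str:
    if prompt.startswith("Break"):
        return ("1. How many eggs per cake?\n"
                "2. How many cakes?\n"
                "3. Multiply them.")
    return f"answer to: {prompt.splitlines()[-1]}"

def decompose_and_solve(problem: str) -> list:
    plan = llm(f"Break this problem into simpler sub-questions:\n{problem}")
    sub_questions = [line.split(". ", 1)[1] for line in plan.splitlines()]
    answers = []
    for q in sub_questions:
        context = "\n".join(answers)       # earlier answers as context
        answers.append(llm(f"{context}\n{q}".strip()))
    return answers

answers = decompose_and_solve(
    "A baker uses 3 eggs per cake and bakes 12 cakes. How many eggs?"
)
```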
Hands On
Hands-on : Langsmith
LangSmith Sign up for LangSmith
LangSmith User API Keys Settings > API Keys
LangSmith Projects
LangSmith Dataset
Hands-on : Supervised Fine Tuning
Supervised Fine-Tuning
- Use a GPU runtime in the Google Colab environment, e.g. a T4
- LLM used: Mistral-7B-Instruct-v0.2
- With limited time, the model will be tuned on only a limited number of epochs and a small subset of the data
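Before any trainer runs, each (instruction, response) pair must be rendered into the model's chat template; Mistral's instruct models use the `[INST] ... [/INST]` format. A data-prep sketch (the two training pairs are invented; the full fine-tune itself then typically goes through a library such as TRL's `SFTTrainer`, which needs the GPU runtime mentioned above):

```python
# Format (instruction, response) pairs into Mistral's [INST] chat template,
# the data-prep step that precedes supervised fine-tuning. The tiny dataset
# here is illustrative.
def to_mistral_format(instruction: str, response: str) -> str:
    return f"<s>[INST] {instruction} [/INST] {response}</s>"

dataset = [
    ("Translate 'good morning' to Indonesian.", "Selamat pagi."),
    ("What is the capital of Indonesia?", "Jakarta."),
]
train_texts = [to_mistral_format(i, r) for i, r in dataset]
```

In a real run, prefer the tokenizer's `apply_chat_template` over hand-written formatting, so special tokens stay consistent with the checkpoint.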
Hands-on : Chainlit
Ngrok
- Sign up for ngrok: https://dashboard.ngrok.com/signup
- Navigate to your dashboard and locate the Authtoken