Large Language Models Updates and Hands On

bigvisiondeveloper, 49 slides, Sep 16, 2025

About This Presentation

LLM Hands on


Slide Content

Tools: Jupyter Notebook / Google Colab

What will we learn:
- Updates on LLMs
- Prompt techniques
- Hands-on prompt techniques
- Serving an LLM to end users: Chainlit
- Supervised fine-tuning
- LangSmith

LLM

Large Language Model Definition
Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks. (IBM)

LLMs for Bahasa Indonesia (and SEA languages):
- Komodo-7B (Yellow.ai): EN, ID, 11 regional languages
- Wiz.ai 13B: Bahasa Indonesia
- SEA-LION 3B/7B (AI Singapore): SEA languages

Popular LLMs: GPT-4, Claude, Gemini, Llama 3.1

GPT-4
- Estimated parameters: 1.76 trillion
- Max. context window: 128k tokens
- GPT-4 launch date: March 14, 2023
- GPT-4 Turbo and GPT-4 Turbo with Vision: November 2023
- GPT-4o (omni) launch date: May 13, 2024
- GPT-4o mini launch date: July 18, 2024


GPT-4o has the same high intelligence as GPT-4 Turbo but is much more efficient: it generates text 2x faster and is 50% cheaper. It is multimodal (accepting text or image inputs and outputting text) and has the best vision and non-English-language performance of any OpenAI model. gpt-4o-2024-08-06 provides 16,384 max output tokens (up from only 4,096).

GPT-4o mini is OpenAI's most advanced model in the small-model category, and their cheapest model yet. It is multimodal (accepting text or image inputs and outputting text) and has higher intelligence than gpt-3.5-turbo while being just as fast. It is meant for smaller tasks, including vision tasks. OpenAI recommends choosing gpt-4o-mini where you would previously have used gpt-3.5-turbo, as it is more capable and cheaper.

Claude
- Features: advanced reasoning, vision analysis, code generation, multilingual processing
- Latest: Claude 3.5 Sonnet (Jun 21, 2024)
- Context window: 200k+ tokens (about 500 pages of text)
- Claude model family: Haiku (light and fast), Sonnet (optimal balance), Opus (most powerful)

Gemini comes in different sizes:
- Gemini Nano (1.0): on-device
- Gemini Flash (1.5): lightweight
- Gemini Pro (1.5): best model for general performance
- Gemini Ultra (1.0): largest model
Gemini Pro (1.5) has a 2 million (!) token context window.

Llama 3.1
- Llama (v1) was trained on 2,048 A100 GPUs; Llama 3.1 was trained on 16,000 H100 GPUs
- Three parameter sizes: 8B, 70B, 405B
- Context window: 128k tokens
- Capabilities: code, multilinguality, math and reasoning, long context, tool use, factuality, steerability

LLM Comparison
Pricing: https://huggingface.co/spaces/philschmid/llm-pricing
Leaderboard: https://tatsu-lab.github.io/alpaca_eval/

Pricing for “Large” Models

Pricing for “Medium” Models

Pricing for “Small” Models

AlpacaEval: GPT-4 still leads. Llama 3.1 (8B Instruct), ranked 29, is readily available at Telkom to try.

Cost of Training LLM 2024 AI Index Report by Stanford

"Open" AI: Llama 3.1 405B vs GPT-4, a comparison at a similar number of parameters

Telkom Internal Deployed LLM
Postman collection: https://api.postman.com/collections/15922710-c4fb5d7c-55bc-4eb4-9929-c0a2d26418cf?access_key=PMAT-01J4GY8WP86NFW4XRSQEAKCWYW
Endpoint: https://telkom-ai-dag.api.apilogy.id/Telkom-LLM/0.0.3/chat/completion/telkomai
The key-auth key becomes the x-api-key header. For this demo, SECRET = gRwiTy4K1RnHEdT89pTFLSy3WtuvpAW6
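The endpoint URL and the x-api-key header come from the slide above; the request body below is an assumption, modeled on the common chat-completions format, so the actual schema may differ (check the Postman collection). A minimal sketch of calling the internal endpoint:

```python
import json

API_URL = "https://telkom-ai-dag.api.apilogy.id/Telkom-LLM/0.0.3/chat/completion/telkomai"
API_KEY = "gRwiTy4K1RnHEdT89pTFLSy3WtuvpAW6"  # demo key from the slide

# Header name (x-api-key) is from the slide; the payload shape is assumed,
# patterned on a typical chat-completions request body.
headers = {"x-api-key": API_KEY, "Content-Type": "application/json"}
payload = {"messages": [{"role": "user", "content": "Hello, who are you?"}]}
body = json.dumps(payload)

# To actually send the request (requires the `requests` package):
# import requests
# resp = requests.post(API_URL, headers=headers, data=body)
# print(resp.json())
```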

LLM Optimization Flow: highlights considerations for LLM optimization

LLMOps LLMOps allows for the efficient deployment, monitoring and maintenance of large language models.

Prompt Techniques

Paper we're referring to: a survey of prompting techniques. This presentation will only cover English text-prompting techniques. Alternative: https://www.promptingguide.ai/

Definitions
Prompt: an input to a generative AI model that is used to guide its output.
Prompt template: prompts are often constructed via a prompt template, i.e., a function that contains one or more variables which will be replaced by some media (usually text) to create a prompt.
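As a minimal illustration of the prompt-template definition (the template text below is invented, not from the slides), a template can be a format string whose variable is filled in to produce the final prompt:

```python
# A prompt template: a format string with one variable ({sentence})
# that is replaced by user text to create the actual prompt.
TEMPLATE = "Translate the following sentence to French:\n\n{sentence}"

def render_prompt(sentence: str) -> str:
    """Replace the template variable with the user's text."""
    return TEMPLATE.format(sentence=sentence)

prompt = render_prompt("Good morning!")
```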

Definitions
Prompting: the process of providing a prompt to a GenAI, which then generates a response.
Prompt engineering: the iterative process of developing a prompt by modifying or changing the prompting technique you are using.
Prompting technique: a blueprint that describes how to structure a prompt, prompts, or a dynamic sequencing of multiple prompts.

Components of a Prompt
- Directive: the intent of the prompt. Example: "Tell me five good books to read."
- Examples: demonstrations that guide the GenAI to accomplish a task. Example: "Night: Noch / Morning:"
- Output formatting: instructions to output in a certain format. Example: "{PARAGRAPH} Summarize this into a CSV."
- Style instructions: modify the output stylistically rather than structurally. Example: "Write a clear and curt paragraph about llamas."
- Role ("persona"). Example: "Pretend you are a shepherd and write a limerick about llamas."
- Additional information ("context"): e.g., if the directive is to write an email, you might include your name and position so the GenAI can properly sign the email.
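A quick sketch of assembling several of these components into one prompt string; the component values are illustrative examples from the slide, and the join order is an arbitrary choice:

```python
# Each key below corresponds to one prompt component; the values are
# illustrative, taken from or modeled on the slide's examples.
components = {
    "role": "Pretend you are a shepherd.",
    "directive": "Write a limerick about llamas.",
    "style": "Keep it clear and curt.",
    "output_format": "Return the limerick as five separate lines.",
}

# Concatenate the components in a fixed order to form the final prompt.
prompt = "\n".join(components[k] for k in ("role", "directive", "style", "output_format"))
```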

In-Context Learning (ICL): ICL refers to the ability of GenAIs to learn skills and tasks by providing them with exemplars and/or relevant instructions within the prompt, without the need for weight updates/retraining. Skills can be learned from exemplars and/or instructions.

1. Zero-Shot
Zero-shot prompting uses zero exemplars. It can be combined with other concepts, e.g., Chain of Thought. A few examples include: Role Prompting, Style Prompting, Emotion Prompting, System 2 Attention, SimToM, Rephrase and Respond (RaR), Re-reading (RE2), Self-Ask.

Example: Role Prompting
- Role prompt: assigns a role to the LLM. "You are [role]."
- Audience prompt: specifies the audience of the conversation. "You are talking to a [role]."
- Interpersonal prompt: connotes the relationship between the speaker and listener. "You are talking to your [role]."
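The three role-prompt patterns above can be sketched as simple template functions (a minimal illustration, not from the slides):

```python
def role_prompt(role: str) -> str:
    """Assign a role to the LLM itself."""
    return f"You are {role}."

def audience_prompt(role: str) -> str:
    """Specify the audience of the conversation."""
    return f"You are talking to a {role}."

def interpersonal_prompt(role: str) -> str:
    """Connote the relationship between speaker and listener."""
    return f"You are talking to your {role}."
```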

2. Few-Shot Prompting: the GenAI learns to complete a task with only a few examples (exemplars).

Few-Shot Prompting Design Decisions
- Exemplar quantity: increasing the number of exemplars in the prompt generally improves model performance, particularly in larger models; however, in some cases the benefits diminish beyond 20 exemplars.
- Exemplar ordering: the order of exemplars affects model behavior.
- Exemplar label distribution: if 10 exemplars from one class and 2 exemplars from another class are included, the model may be biased toward the first class.
- Exemplar label quality: despite the general benefit of multiple exemplars, the necessity of strictly valid demonstrations is unclear; some research shows that exemplars with incorrect labels may diminish performance, but other research does not support this.
- Exemplar format: a common format is "Q: {input}, A: {label}".
- Exemplar similarity: selecting exemplars that are similar to the test sample is generally beneficial for performance.
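A small sketch of building a few-shot prompt in the common "Q/A" exemplar format mentioned above; the arithmetic exemplars are invented for illustration:

```python
def build_few_shot_prompt(exemplars, query):
    """Format exemplars as Q/A pairs, then append the unanswered query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    lines.append(f"Q: {query}\nA:")  # model completes the final answer
    return "\n\n".join(lines)

exemplars = [("2 + 2", "4"), ("3 + 5", "8")]
prompt = build_few_shot_prompt(exemplars, "7 + 6")
```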

3. Thought Generation
Zero-Shot CoT: contains 0 exemplars; uses a thought inducer such as "Let's think step by step." or "Let's work this out in a step by step way to be sure we have the right answer." Characteristics: no examples, task-agnostic.
Few-Shot CoT: contains multiple exemplars.
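Zero-shot CoT amounts to appending a thought inducer to the task with no exemplars; a minimal sketch (the example question is invented):

```python
# Thought inducers from the slide; appended to the task to elicit
# step-by-step reasoning without any exemplars.
THOUGHT_INDUCERS = [
    "Let's think step by step.",
    "Let's work this out in a step by step way to be sure we have the right answer.",
]

def zero_shot_cot(question: str, inducer: str = THOUGHT_INDUCERS[0]) -> str:
    """Build a zero-shot CoT prompt: the task plus a thought inducer."""
    return f"{question}\n\n{inducer}"

prompt = zero_shot_cot("If I have 3 apples and eat one, how many remain?")
```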


4. Ensembling: in GenAI, ensembling is the process of using multiple prompts to solve the same problem, then aggregating the responses into a final output.
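One common aggregation strategy is a majority vote over several sampled responses (self-consistency style); the sampled answers below are a hard-coded stand-in for real model calls:

```python
from collections import Counter

def majority_vote(answers):
    """Aggregate multiple model responses into a final output
    by picking the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]

# Stand-in for several sampled LLM responses to the same prompt.
sampled = ["42", "42", "41", "42", "40"]
final = majority_vote(sampled)
```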

5. Self-Criticism: when creating GenAI systems, it can be useful to have LLMs criticize their own outputs.
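A sketch of a draft-critique-revise loop; the `llm` function here is a hypothetical stub standing in for a real model API call, and the three-step structure is one common way to implement self-criticism, not a prescribed recipe:

```python
def llm(prompt: str) -> str:
    # Hypothetical stub; in practice this would call a real model API.
    return "stub response to: " + prompt

def criticize_and_revise(task: str) -> str:
    """Draft an answer, ask the model to critique it, then revise."""
    draft = llm(task)
    critique = llm(f"Critique the following answer to '{task}':\n{draft}")
    revised = llm(f"Task: {task}\nDraft: {draft}\nCritique: {critique}\nRevise the draft.")
    return revised
```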

6. Decomposition: decomposing complex problems into simpler sub-questions.
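A sketch of the decomposition pattern: answer sub-questions first, then synthesize. The `llm` function is a hypothetical stub, and here the sub-questions are generated mechanically; in practice the model itself would usually propose them:

```python
def llm(prompt: str) -> str:
    # Hypothetical stub for a model call.
    return f"[answer to: {prompt}]"

def decompose_and_solve(question: str) -> str:
    """Split a question into sub-questions, answer each, then synthesize."""
    # Stand-in for LLM-generated sub-questions.
    subs = [f"sub-question {i} of: {question}" for i in (1, 2)]
    sub_answers = [llm(s) for s in subs]
    context = "\n".join(sub_answers)
    return llm(f"Using these intermediate answers:\n{context}\nAnswer: {question}")
```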

Hands On

Hands-on : Langsmith

LangSmith Sign up for LangSmith

LangSmith User API Keys Settings > API Keys

LangSmith Projects

LangSmith Dataset

Hands-on : Supervised Fine Tuning

Supervised Fine-Tuning: use a GPU runtime in the Google Colab environment (e.g., T4). LLM used: Mistral-7B-Instruct-v0.2. With limited time, the model will be tuned for only a few epochs on a small amount of data.
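Before training, each example must be rendered into the model's chat template; a minimal sketch of that preprocessing step, assuming Mistral-7B-Instruct's [INST] ... [/INST] format. The field names (`instruction`, `response`) are assumptions about the dataset schema, and the actual fine-tuning loop (e.g., via a trainer library) is omitted here:

```python
def format_for_sft(example: dict) -> str:
    """Render one instruction/response pair into the Mistral instruct
    template, so the tokenizer sees the same chat format the base model
    was trained with. Field names are assumed, not from the slides."""
    return f"<s>[INST] {example['instruction']} [/INST] {example['response']}</s>"

sample = {
    "instruction": "What is supervised fine-tuning?",
    "response": "Training a pretrained model further on labeled prompt/response pairs.",
}
text = format_for_sft(sample)
```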

Hands-on : Chainlit

ngrok: sign up for ngrok at https://dashboard.ngrok.com/signup, then navigate to your dashboard and locate the Authtoken.

That’s it, Thank you for joining!