[GEMINI EXTERNAL DECK] Introduction to Gemini.pptx

Ahmed Elbaloug · 347 views · 146 slides · May 07, 2024

About This Presentation

Tripoli


Slide Content

Intro to Gemini The official Google AI deck! Ahmed Elbaloug GDG Tripoli @elbaloug

https://trygcp.dev/e/build-with-ai-tripoli Learning more https://ai.google.dev/docs/discord

Visit the cookbook: https://github.com/google-gemini/cookbook Start developing Go to Google AI Studio . Login with your Google account. Create an API key. Use a quickstart for Python, or call the REST API using curl . Read on to learn more about the Gemini API Get started! Newly added!
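The quickstart steps above can be sketched in Python. This is a hedged sketch, not official sample code: the request-body shape follows the public generateContent REST API, and the endpoint and model name in the comment are indicative (check ai.google.dev for current values).

```python
import json

def build_request(prompt: str) -> dict:
    """Minimal generateContent REST request body (v1beta shape)."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

body = build_request("Hello, Gemini!")
print(json.dumps(body, indent=2))

# With an API key from Google AI Studio, POST this body, e.g.:
#   curl -H 'Content-Type: application/json' -d @body.json \
#     'https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=YOUR_API_KEY'
```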

What is generative AI?

What is an LLM? LLMs Explained
[Diagram: given the prompt "It's raining cats and …", the model assigns probabilities (0.02, 0.03, 0.9, 0.01, 0.0, …) across candidate next tokens (Dogs, Rain, Drops, Fish, Wind, …).]

Roses are red, Violets are blue, Sugar is sweet, LLMs Explained

for(var i = 0; ; i < 10; i++) {

for(var i = 0; i < 10; i++) {

Modern LLMs are large. LLMs Explained

Classic Natural Language Problems LLMs Explained Entity extraction Classification Summarization Sentiment Analysis Translation …
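Each of these classic tasks reduces to a prompt. As an illustration (not from the deck), a hypothetical few-shot sentiment-classification prompt could be assembled like this and passed to model.generate_content:

```python
def sentiment_prompt(review: str) -> str:
    """Build a few-shot sentiment-classification prompt."""
    examples = [
        ("The food was amazing!", "positive"),
        ("Terrible service, never again.", "negative"),
    ]
    shots = "\n\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return (
        "Classify the sentiment of the final review.\n\n"
        f"{shots}\n\nReview: {review}\nSentiment:"
    )

prompt = sentiment_prompt("Pretty good overall.")
# e.g. model.generate_content(prompt).text would return a sentiment label
```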

LLMs let us prototype fast . LLMs Explained

User: Explain this joke: "I used to be addicted to soap, but I'm clean now."
Model: The joke is a pun on the word "clean", which has two meanings: to be free of dirt or other unwanted substances, and to be free of addiction. The joke is funny because it is unexpected: the listener expects the speaker to be talking about the first meaning of "clean", but they are actually talking about the second. This creates a surprise and a laugh.
LLMs Explained

Model: 1. **Build a volcano.** This is a classic science fair project that is both fun and educational. You can learn about the science of volcanoes, earthquakes, and plate tectonics. 2. **Make a solar oven.** This is a great project to learn about renewable energy and how to harness the power of the sun. You can use your solar oven to cook food, dry fruit, or even melt chocolate. 3. **Create a 3D printer.** This is a more challenging project, but it is also very rewarding. You can learn about 3D printing technology and design your own custom objects. 4. **Build a robot.** This is a great project for those who are interested in engineering and robotics. You can learn about basic electronics, mechanics, and programming. 5. **Design a water filtration system.** This is a project that can help you learn about the importance of clean water and how to filter it. You can use your water filtration system to clean water from a stream or river. LLMs Explained Give me five neat ideas for science fair projects. User:

Why are large language models different? LLMs are characterized by emergent abilities: the ability to perform tasks that were not present in smaller models. LLMs' contextual understanding of human language changes how we interact with data and intelligent systems. LLMs can find patterns and connections in massive, disparate data corpora. Search · Conversation · Content generation

Google is the industry pioneer in AI
Responsible AI: 3,000 researchers, 7,000 publications. Built & tested for safety; privacy in design; upholds high scientific standards; accountable to people; socially beneficial; avoids creating unfair bias.
2015: Google DeepMind's AlphaGo defeats the Go champion
2016: Google's DeepMind helps detect eye disease
2017: Google invents the Transformer, kickstarting the LLM revolution
2018: BERT, Google's groundbreaking large language model
2019: Text-to-Text Transfer Transformer, a 10B-parameter LLM, open sourced
2020: Google's LaMDA model, trained to converse
2022: AlphaFold predicts the structures of all known proteins
2023: A conversational AI service powered by PaLM 2
2024: A family of multimodal LLMs & products

Multimodality

(Android AICore)

ai.google.dev/gemma

The Gemini Ecosystem: the most advanced AI from Google
Models
For Developers: Gemini API (in Google AI Studio + ai.google.dev) (this deck is about the Gemini API)
For Business and Enterprise: Gemini for Google Workspace, Gemini for Google Cloud, Gemini in Vertex AI
For Consumers: Gemini app and web, Gemini in the Google App, Gemini in Gmail, Docs…

Getting started with the Gemini API Train the Trainer in AI

AI Studio aistudio.google.com Generate API Keys Create, test, and save prompts Customize models in minutes Generate starter code


ai.google.dev
REST API + client libraries (SDKs) for Python, Node, Java, and Swift

Gemini API and Vertex AI both give access to Gemini family models.
Vertex AI: enterprise-grade support and full MLOps (examples: model evaluation, monitoring, registry). Check it out when you're ready for production.
ai.google.dev/docs/migrate_to_cloud

Usage patterns Train the Trainer in AI

We’ll cover Basic Prompting Code generation Search, retrieval and information synthesis Function calling Image understanding Tuning Usage Patterns

Basic prompting Train the Trainer in AI

Write a poem about a magic backpack. In a world of wonder, behold, A backpack with tales yet untold. Its fabric shimmers, alive with might, A portal to realms, day and night. ... aistudio.google.com

Generate a short tip about JavaScript. Good tips are short and contain a code example showing the tip. **Tip: Use Array.prototype.flat() to flatten an array of arrays.** ```javascript const array = [[1, 2], [3, 4], [5, 6]]; console.log(array.flat()); // Output: [1, 2, 3, 4, 5, 6] ```

Generate a short tip about JavaScript. Good tips are short and contain a code example showing the tip.
topic: let and const
tip: Prefer using `let` and `const` to initialise variables, instead of `var`, as `var` has no block scope or protection against mutability. e.g. const myVar = "EXAMPLE";
topic: arrow syntax
tip:

AI Studio Endpoints Prompting

import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')
resp = model.generate_content(
    'Write the first paragraph of a story about a magic backpack')

>>> print(resp.text)
In a bustling city, amidst the vibrant tapestry of human existence, there existed a peculiar entity named Archie. Archie, however, was no ordinary backpack...

chat = model.start_chat()
response = chat.send_message("Hello, what should I have for dinner?")
print(response.text)  # 'Here are some suggestions...'
response = chat.send_message("How do I cook the first one?")

Generate Content Request
Contents: alternating Content entries, e.g. role "user" (containing multimodal Parts), role "model", role "user" (a text Part), …
Tools: tools specified by the caller
Safety Settings: safety settings configured by the caller
Generation Config: temperature, top-p, top-k, stop sequences, max output tokens, etc.

Generate Content Response
Candidate(s): the candidate "Content" (note: only one candidate is returned today), a finish reason (why the model stopped generating), a finish message (set if a finish reason is present), and safety ratings (how safe the response is).
Feedback: feedback on the prompt (block reason and safety ratings).

AI Studio Endpoints Prompting
https://ai.google.dev/docs/prompt_best_practices
Chained prompts - make a plan, then execute it
Context - few-shot prompts
Generation parameters - temperature, safety settings, top-p, top-k
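The generation parameters above can be set per request. A sketch, assuming the google-generativeai Python SDK, which accepts a plain dict for generation_config (the parameter names follow the Gemini API; the values here are only illustrative):

```python
# Illustrative generation parameters; names follow the Gemini API.
generation_config = {
    "temperature": 0.4,        # lower = more deterministic sampling
    "top_p": 0.95,             # nucleus-sampling probability mass cutoff
    "top_k": 40,               # sample only from the 40 most likely tokens
    "max_output_tokens": 256,  # hard cap on response length
    "stop_sequences": ["\n\n"],
}

# Usage (requires google-generativeai and an API key):
# model = genai.GenerativeModel('gemini-pro')
# resp = model.generate_content(prompt, generation_config=generation_config)
```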

Code generation Train the Trainer in AI

Code Generation Content generation Convert human language requests to machine requests Rapid tool building Generate code based on a user prompt

Examples Code generation Generate data

Examples Code generation Generate data Generate a SQL query

Examples Code generation Generate data Generate a SQL query Simulate Execution

Examples Code generation Generate data Generate a SQL query Simulate Execution Quick coding help - smart pair programmer.

Write a unit test for this Python function:

def is_diver_neutrally_buoyant(density_water, density_diver):
    """Check if a diver is neutrally buoyant."""
    if density_water == density_diver:
        return True

import unittest

class IsDiverNeutrallyBuoyantTest(unittest.TestCase):

    def test_diver_is_neutrally_buoyant(self):
        self.assertTrue(is_diver_neutrally_buoyant(1000, 1000))

    def test_diver_is_not_neutrally_buoyant(self):
        self.assertFalse(is_diver_neutrally_buoyant(1000, 1010))

    def test_invalid_input_types(self):
        with self.assertRaises(TypeError):
            is_diver_neutrally_buoyant("1000", 1000)
        with self.assertRaises(TypeError):
            is_diver_neutrally_buoyant(1000, "1000")

Search and Information Synthesis Train the Trainer in AI

Models have knowledge cut-offs LLMs are not fact engines No exposure to private data BYO Data Search & IR

Instructions + Context + Question all in the prompt Easy to implement No extra code, just ask. Use the prompt's context window Search & IR

model = genai.GenerativeModel('gemini-pro')
document = pathlib.Path('document.txt').read_text()
result = model.generate_content(f"""
Explain how deep-sea life survives.
Please answer based on the following document:
{document}
""")

Use the prompt's context window Learning more Limited by the model's context length gemini-1.0-pro: 30K tokens. Search & IR

Use the prompt's context window Learning more gemini-1.5-pro: 1M tokens. Remember: Everything is tokens. Using context may be more flexible than Retrieval Augmented Generation (RAG). More input context means higher latency Join the waitlist: https://aistudio.google.com/app/waitlist/97445851 https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024 Search & IR
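Since everything is tokens, it helps to budget context before sending it. The SDK's model.count_tokens() gives exact counts; offline, a rough rule of thumb of about four characters per token for English prose (an assumption, not an API guarantee) is enough for ballpark budgeting:

```python
def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (~4 chars/token for English prose).

    For exact counts, use model.count_tokens(text) from the SDK.
    """
    return max(1, round(len(text) / chars_per_token))

document = "Deep-sea life survives extreme pressure and darkness. " * 1000
estimate = rough_token_estimate(document)
fits_gemini_10 = estimate < 30_000      # gemini-1.0-pro context budget
fits_gemini_15 = estimate < 1_000_000   # gemini-1.5-pro context budget
```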

Use Attributed Question Answering (AQA) with inline chunks AQA
Instructions + Chunks + Question in one request. AQA is a new feature; feedback is welcome. The AQA model is specifically tuned for RAG.

import google.generativeai.answer
import google.generativeai as genai

chunks = [
    "Your Googlecar has a climate control system that allo...",
    "Your Googlecar has a large touchscreen display that p...",
    "Your Googlecar has an automatic transmission. To shif...",
]

answer = genai.answer.generate_answer(
    contents='How do you make it less windy?',
    answer_style=AnswerStyle.ABSTRACTIVE,
    inline_passages=chunks)

answer {
  content {
    parts {
      text: "To decrease the airflow inside the car, "
            "turn the airflow knob counterclockwise."
    }}
  ...
  grounding_attributions {
    source_id { grounding_passage { passage_id: "0" } }
    content {
      parts {
        text: "Your Googlecar has a climate control system that allows..."
      }}}
}
answerable_probability: 0.995222747

Use AQA with inline chunks Learning more Limited by the request size - Max 4MB. Inefficient when asking about the same data repeatedly. Simple to use. Example: handle chunks returned by a search tool. AQA

Use AQA with the retriever service Learning more Corpus < Document < Chunk hierarchy. Pass a reference to a document or corpus to the generate_answer function. Corpus Document Chunk AQA

from google.generativeai import retriever

my_corpus = retriever.create_corpus(display_name="My Documents")
for document in my_documents:
    doc = my_corpus.create_document(display_name=document.title)
    doc.batch_create_chunks(htmlchunker.chunk(document.html))

answer = genai.answer.generate_answer(
    semantic_retriever_config=my_corpus, ...)

Coming soon

Use AQA with the retriever service Learning more Supports filtering based on chunk metadata Limits 10k documents per corpus 1M chunks per corpus 5 corpora per user AQA

Function Calling Train the Trainer in AI

Function Calling https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/function-calling/sql-talk-app SELECT ROUND(( COUNT(DISTINCT IF(returned_at IS NOT NULL, order_id, NULL)) / COUNT(DISTINCT order_id)) * 100, 2) AS return_rate FROM thelook_ecommerce.orders goo.gle/gemini-fn-call-sql-github

Function Calling Function Calling Describe external functions to the model. The model may ask you to call the function to help it respond to your queries. Endless possibilities for integrating external tools.

Function calling - Basics Function Calling How? The google.generativeai SDK will inspect the function's type hints to determine the schema. Allowed types are limited: AllowedTypes = ( int | float | str | bool | list | dict ) https://ai.google.dev/tutorials/function_calling_python_quickstart

def multiply(a: float, b: float):
    """Returns a * b."""
    return a * b

model = genai.GenerativeModel(
    model_name='gemini-1.0-pro',
    tools=[multiply])

Function calling - Basics Function Calling Because function calling requires alternating turns, it's easiest to use through chat. Enable "automatic function calling" when you start a chat, and the ChatSession will call the function(s) for you. You don't have to use automatic function calling, it just makes simple cases easier. https://ai.google.dev/tutorials/function_calling_python_quickstart

chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message(
    'I have 57 cats, each owns 44 mittens, '
    'how many mittens is that in total?')

chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message(
    'I have 57 cats, each owns 44 mittens, '
    'how many mittens is that in total?')
print(response.text)
# The number of mittens in total is 2508.

Function calling - Basics Function Calling What happened? Use the chat history to find out. The chat history collects all the function calls and responses that took place. https://ai.google.dev/tutorials/function_calling_python_quickstart

for content in chat.history:
    part = content.parts[0]
    print(content.role, "->", type(part).to_dict(part))

for content in chat.history:
    part = content.parts[0]
    print(content.role, "->", type(part).to_dict(part))

# user -> {'text': 'I have 57 cats, each owns 44 mittens, '
#                  'how many mittens is that in total?'}
# model -> {'function_call': {'name': 'multiply',
#                             'args': {'a': 57.0, 'b': 44.0}}}
# user -> {'function_response': {'name': 'multiply',
#                                'response': {'result': 2508.0}}}
# model -> {'text': 'The number of mittens in total is 2508.'}

Function Calling interaction
Contents: Text → Function Call → Function Response → Text. The model may predict a function call based on user content; it can understand the function response and generate text OR another function call.
Tools: Text + Function Declarations. If one or more function declarations are provided, the function calling feature is turned on.
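When automatic function calling is off, the turn loop above has to be handled by hand: read the predicted function call, run the function, and send a function response back. The sketch below uses plain dicts mirroring the shapes printed from chat.history; dispatch_function_call and registry are illustrative helpers, not SDK API.

```python
def dispatch_function_call(call: dict, registry: dict) -> dict:
    """Run a predicted function call and package the response part.

    `call` mirrors the {'name': ..., 'args': {...}} shape seen in
    chat.history; the returned dict is the part to send back to the model.
    """
    fn = registry[call["name"]]
    result = fn(**call["args"])
    return {"function_response": {"name": call["name"],
                                  "response": {"result": result}}}

def multiply(a: float, b: float):
    """Returns a * b."""
    return a * b

part = dispatch_function_call(
    {"name": "multiply", "args": {"a": 57.0, "b": 44.0}},
    registry={"multiply": multiply})
# part["function_response"]["response"]["result"] == 2508.0
```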

Function calling - More Examples Function Calling Wikipedia research aid Integrates a search tool. Uses the Gemini API inside the function call to summarize pages. def wikipedia_search(queries:list[str]) -> list[str]: ... https://ai.google.dev/docs/search_reranking_using_embeddings/

model = genai.GenerativeModel('gemini-pro', tools=[wikipedia_search])
chat = model.start_chat(enable_automatic_function_calling=True)
query = "Explain how deep-sea life survives."
res = chat.send_message(instructions.format(query=query))

model = genai.GenerativeModel( 'gemini-pro' , tools=[wikipedia_search]) chat = model.start_chat(enable_automatic_function_calling=True) query = "Explain how deep-sea life survives." res = chat.send_message(instructions.format(query=query)) # Searching for "How do deep-sea creatures survive the extreme pressure?" # Related search terms: [ 'Deep sea' , 'Deep-sea community' , 'Deep-sea fish' ] # Fetching page: "Deep sea" # Information Source: https://en.wikipedia.org/wiki/Deep_sea # Fetching page: "Deep-sea community" # Information Source: https://en.wikipedia.org/wiki/Deep-sea_community # Fetching page: "Deep-sea fish" # Information Source: https://en.wikipedia.org/wiki/Deep-sea_fish # Searching for "How do deep-sea creatures survive the cold temperatures?" # Related search terms: [ 'Deep-sea community' , 'Deep sea' , 'Deep-water coral' ] # Fetching page: "Deep-water coral" # Information Source: https://en.wikipedia.org/wiki/Deep-water_coral # Searching for "What adaptations do deep-sea creatures have to survive?" # Related search terms: [ 'Deep sea' , 'Deep-sea community' , 'Deep-sea fish' ] # Searching for "What are the unique characteristics of deep-sea creatures?" # Related search terms: [ 'Deep-sea community' , 'Deep-sea fish' , 'Deep sea' ] # Searching for "How do deep-sea creatures find food?" # Related search terms: [ 'Deep-sea community' , 'Deep-sea fish' , 'The Deep (TV series)' ] # Fetching page: "The Deep (TV series)" # Information Source: https://en.wikipedia.org/wiki/The_Deep_(TV_series) # Searching for "How do deep-sea creatures communicate with each other?"

Function calling - More Examples Function Calling SQL Talk Use function calling to talk to a database. Live example: https://goo.gle/gemini-fn-call-sql https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/function-calling/sql-talk-app

sql_query_func = FunctionDeclaration( name= "sql_query" , description= "Get information from data in BigQuery using SQL queries" , parameters={ "type" : "object" , "properties" : { "query" : { "type" : "string" , "description" : " SQL query on a single line ...

Function Calling https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/function-calling/sql-talk-app SELECT ROUND(( COUNT(DISTINCT IF(returned_at IS NOT NULL, order_id, NULL)) / COUNT(DISTINCT order_id)) * 100, 2) AS return_rate FROM thelook_ecommerce.orders

Function calling - Schema Function Calling Automatically building the schema from the type hints doesn't [currently 02/2024] work for everything. The allowed types are actually: AllowedType= ( int | float | str | bool | list['AllowedType'] | dict[str, 'AllowedType'] ) https://ai.google.dev/tutorials/function_calling_python_quickstart

Function calling - Schema Function Calling Let's look at how the schema is constructed. https://ai.google.dev/tutorials/function_calling_python_quickstart

model = genai.GenerativeModel(
    model_name='gemini-1.0-pro',
    tools=[multiply])

model._tools.to_proto()

[function_declarations { name: "multiply" description: "returns a * b." parameters { type_: OBJECT properties { key: "b" value { type_: NUMBER }} properties { key: "a" value { type_: NUMBER }} required: "a" required: "b" }}]

Function calling - Schema Function Calling It's an OpenAPI schema, written as a protobuf . The protobuf-classes are available in the google.ai.generativelanguage client library. Reference docs: https://ai.google.dev/api/python/google/ai/generativelanguage/FunctionDeclaration https://ai.google.dev/tutorials/function_calling_python_quickstart

import google.ai.generativelanguage as glm

calculator = glm.Tool(
    function_declarations=[
        glm.FunctionDeclaration(
            name='multiply',
            description="Returns the product of two numbers.",
            parameters=glm.Schema(
                type=glm.Type.OBJECT,
                properties={
                    'a': glm.Schema(type=glm.Type.NUMBER),
                    'b': glm.Schema(type=glm.Type.NUMBER)},
                required=['a', 'b']))])

Function calling - Schema Function Calling They can be written out as JSON-compatible objects as well. https://ai.google.dev/tutorials/function_calling_python_quickstart

calculator = {
    'function_declarations': [{
        'name': 'multiply',
        'description': 'Returns the product of two numbers.',
        'parameters': {
            'type': 'OBJECT',
            'properties': {
                'a': {'type': 'NUMBER'},
                'b': {'type': 'NUMBER'}},
            'required': ['a', 'b']}}]}

model = genai.GenerativeModel(
    model_name='gemini-1.0-pro',
    tools=[calculator])

Function calling - Structured data Function Calling Structured data extraction. You can just ask the model to do it and return JSON. https://ai.google.dev/tutorials/structured_data_extraction

response = model.generate_content(textwrap.dedent("""\
    Please return JSON describing the people, places, things and relationships from this story using the following schema:

    {"people": list[PERSON], "places": list[PLACE], "things": list[THING], "relationships": list[RELATIONSHIP]}
    PERSON = {"name": str, "description": str, "start_place_name": str, "end_place_name": str}
    PLACE = {"name": str, "description": str}
    THING = {"name": str, "description": str, "start_place_name": str, "end_place_name": str}
    RELATIONSHIP = {"person_1_name": str, "person_2_name": str, "relationship": str}

    Here is the story:
    """) + story)
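When you ask for JSON in the prompt like this, models often wrap the reply in Markdown code fences. A small illustrative helper (not part of the SDK) to strip fences before parsing:

```python
import json

def parse_json_reply(text: str):
    """Parse a model reply that may wrap JSON in ```json ... ``` fences."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional language tag) and closing fence.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

reply = '```json\n{"people": [{"name": "Luna"}]}\n```'
data = parse_json_reply(reply)
# data["people"][0]["name"] == "Luna"
```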

Function calling - Structured data Function Calling Asking for JSON often works. Function calling lets you strictly describe the schema. With a strict description, we can strictly enforce that that's what gets returned. https://ai.google.dev/tutorials/structured_data_extraction

add_to_database = glm.FunctionDeclaration(
    name="add_to_database",
    description="Adds entities to the database.",
    parameters=glm.Schema(
        type=glm.Type.OBJECT,
        properties={
            'people': glm.Schema(
                type=glm.Type.ARRAY,
                items=glm.Schema(
                    type=glm.Type.OBJECT,
                    properties={
                        'name': glm.Schema(type=glm.Type.STRING),
                        'description': glm.Schema(type=glm.Type.STRING),
                        'start_place_name': glm.Schema(type=glm.Type.STRING),
                        'end_place_name': glm.Schema(type=glm.Type.STRING)})),
            'places': glm.Schema(
                ...

add_to_database = glm.FunctionDeclaration(
    name="add_to_database",
    description="Adds entities to the database.",
    parameters={
        "type": "OBJECT",
        "properties": {
            'people': {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": {
                        'name': {"type": "STRING"},
                        'description': {"type": "STRING"},
                        'start_place_name': {"type": "STRING"},
                        'end_place_name': {"type": "STRING"}}}},
            'places': {...},
            ...

model = genai.GenerativeModel(
    model_name='gemini-1.0-pro',
    tools=[add_to_database])

Function calling - Structured data Function Calling Why? Function calling doesn't return a string. Function calling returns data structures: you don't parse text. https://ai.google.dev/tutorials/structured_data_extraction

response = model.generate_content(f"""
Please add the people, places, things and relationships from this story to the database:
{story}
""")
fc = response.candidates[0].content.parts[0].function_call
pprint.pprint(type(fc).to_dict(fc))

{'name': 'add_to_database',
 'args': {'people': [{'description': 'A curious girl with wild hair of '
                                     'midnight blue and eyes that shone like...',
                      'name': 'Luna',
                      ...
          'places': [{'description': 'Nestled between rolling hills and '
                                     'sparkling rivers, where Luna and her '
                                     'family lived.',
                      'name': 'Small town'}],
          'relationships': [{'person_1_name': 'Luna',
                             'person_2_name': 'Elara',
                             'relationship': 'Mother-daughter'},
                            {'person_1_name': 'Luna',
                             'person_2_name': 'Orion',
                             'relationship': 'Father-daughter'}],
          'things': [{'description': 'A dusty, forgotten backpack that Luna '
                                     'found in the attic, which possessed a...',
                      'end_place_name': None,
                      'name': 'Backpack',
                      'start_place_name': 'Attic'}],
          ...

Train the Trainer in AI Image understanding

Image understanding Multimodality Images are just tokens in the input Can be used for instructions, context or query subject

import google.generativeai as genai

# Use the Gemini vision model.
PRO_VISION = 'models/gemini-pro-vision'
model = genai.GenerativeModel(PRO_VISION)

!wget -O instrument.jpg -q https://goo.gle/instrument-img

import PIL.Image
img = PIL.Image.open('instrument.jpg')

# Preview the image
(thumb := img.copy()).thumbnail((200, 200))
thumb

response = model.generate_content([
    'What instrument is this?',
    img,
    'What kinds of music would use it?'])
print(response.text)

This is a pipe organ. It is a musical instrument that produces sound by driving pressurized air through pipes. Organs are often used in churches, concert halls, and other large venues. They can be used to play a wide variety of music, from classical to contemporary.

Structured data extraction Image conditioning RAG Image understanding Multimodality

Image conditioning Multimodality Generate text or structured data from images

AI Studio Endpoints Images Structured Data

import enum
from pydantic import BaseModel

class Temperament(str, enum.Enum):
    FRIENDLY = 'friendly'
    SHY = 'shy'
    AGGRESSIVE = 'aggressive'

class Dog(BaseModel):
    name: str
    breed: str
    color: str
    age_years: float
    temperament: Temperament

class Customer(BaseModel):
    name: str
    phone: str

class Schedule(BaseModel):
    regular: bool
    days_and_times: list[str]

class DogApplication(BaseModel):
    """Represents an instance of a completed dog-walking application form."""
    dog: Dog
    owner: Customer
    schedule: Schedule

from llama_index.multi_modal_llms import GeminiMultiModal
from llama_index.multi_modal_llms.generic_utils import load_image_urls
from llama_index.program import MultiModalLLMCompletionProgram
from llama_index.output_parsers import PydanticOutputParser

images = load_image_urls(['https://goo.gle/dog-form-img'])
prompt = (
    "Please extract each of the fields in this "
    "form and provide a JSON representation: ")

# Load the Gemini Pro Vision model.
gemini_llm = GeminiMultiModal(api_key=GOOGLE_API_KEY, model_name=PRO_VISION)

# Build a callable that will map the images to the Pydantic schema.
image_to_application = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(DogApplication),
    image_documents=images,
    prompt_template_str=prompt,
    multi_modal_llm=gemini_llm,
)

# Image ORM!
form = image_to_application()
assert isinstance(form, DogApplication)
assert form.dog.name == 'Fido'
assert form.owner.name == 'Mark'
assert form.schedule.regular is True
assert 'Wednesday Morning' in form.schedule.days_and_times

pprint(form.dict())

{'dog': {'age_years': 3.0,
         'breed': 'Cavoodle',
         'color': 'Black & Tan',
         'name': 'Fido',
         'temperament': <Temperament.FRIENDLY: 'friendly'>},
 'owner': {'name': 'Mark', 'phone': '0491570006'},
 'schedule': {'days_and_times': ['Monday Morning',
                                 'Tuesday Morning',
                                 'Wednesday Morning'],
              'regular': True}}

Tuning Train the Trainer in AI

Tuning In AI Studio Tuning

Tuning Using the API: https://ai.google.dev/tutorials/tuning_quickstart_python Tuning
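A hedged sketch of the tuning flow from the quickstart linked above: training examples pair a text_input with the desired output. validate_examples is an illustrative helper, and the commented create_tuned_model call is indicative only (it needs the google-generativeai SDK and OAuth credentials).

```python
# Each tuning example pairs an input with the desired model output.
training_data = [
    {"text_input": "1", "output": "2"},
    {"text_input": "seven", "output": "eight"},
    {"text_input": "ninety nine", "output": "one hundred"},
]

def validate_examples(examples) -> bool:
    """Minimal sanity check before submitting a tuning job."""
    return all(
        isinstance(ex.get("text_input"), str) and isinstance(ex.get("output"), str)
        for ex in examples
    )

assert validate_examples(training_data)

# Submitting the job (indicative; see the tuning quickstart):
# operation = genai.create_tuned_model(
#     source_model='models/gemini-1.0-pro-001',
#     training_data=training_data,
#     epoch_count=5)
```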

Tuning Permissions: there is a permissions configuration in the API, but it is not yet integrated into the SDK. Tuning

Responsible AI Train the Trainer in AI

Responsible AI