Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
BaltimoreNISO
40 slides
Apr 29, 2024
About This Presentation
This presentation was provided by William Mattingly of the Smithsonian Institution during the fourth segment of the NISO training series "AI & Prompt Design." Session Four, "Structured Data and Assistants," was held on April 25, 2024.
Goals
1. Importance of Structured Data
2. How to Generate Structured Data from LLMs
3. Importance of Consistency in LLM Outputs
4. How to Generate Consistent Responses
5. Vector Databases and Semantic Search
6. Retrieval Augmented Generation
7. Assistants
Named Entity Recognition
John went to Paris on 1 August 2023.
●John => PERSON
●Paris => LOCATION
●1 August 2023 => DATE
Traditional Approaches
●Rules-Based
●Task-Specific Machine Learning Model
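A rules-based approach can be sketched with regular expressions plus a gazetteer (a hand-built lookup list of names). The name lists and the date pattern below are hypothetical and cover only the example sentence; a real rules-based system needs far larger lists and many more patterns, which is exactly its maintenance cost.

```python
import re

# Hypothetical gazetteers: hand-built lookup lists for each label.
GAZETTEER = {
    "PERSON": ["John"],
    "LOCATION": ["Paris"],
}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written as "<day> <Month> <year>", e.g. "1 August 2023".
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def rules_based_ner(text: str) -> list[tuple[str, str]]:
    """Return (entity text, label) pairs found by simple rules, in text order."""
    found = []  # (start offset, entity text, label)
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(rf"\b{re.escape(name)}\b", text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by start offset so entities come out in reading order.
    return [(entity, label) for _, entity, label in sorted(found)]


print(rules_based_ner("John went to Paris on 1 August 2023."))
# → [('John', 'PERSON'), ('Paris', 'LOCATION'), ('1 August 2023', 'DATE')]
```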
Zero-Shot NER with LLMs
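A zero-shot prompt describes the task and the desired output format without giving any worked examples. The sketch below only builds the prompt and validates a reply; the call to an LLM is deliberately left out, and `build_zero_shot_prompt`, `parse_entities`, the label set, and the sample reply are hypothetical.

```python
import json

LABELS = ["PERSON", "LOCATION", "DATE"]  # assumed label set, matching the example above


def build_zero_shot_prompt(text: str) -> str:
    """Describe the task and the output format, with no worked examples."""
    return (
        "Extract the named entities from the text below.\n"
        f"Use only these labels: {', '.join(LABELS)}.\n"
        "Respond with JSON only, as a list of objects with "
        '"text" and "label" keys.\n\n'
        f"Text: {text}"
    )


def parse_entities(reply: str) -> list[dict]:
    """Parse the model's JSON reply and reject labels outside LABELS."""
    entities = json.loads(reply)
    for entity in entities:
        if entity["label"] not in LABELS:
            raise ValueError(f"unexpected label: {entity['label']}")
    return entities


prompt = build_zero_shot_prompt("John went to Paris on 1 August 2023.")
print(prompt)

# Hand-written stand-in for a model reply (not real LLM output).
reply = (
    '[{"text": "John", "label": "PERSON"}, '
    '{"text": "Paris", "label": "LOCATION"}, '
    '{"text": "1 August 2023", "label": "DATE"}]'
)
print(parse_entities(reply))
```

Validating the reply matters because a model can drift from the requested format; failing loudly is easier to debug than silently storing malformed data.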
Structured Data
Types of Data
Important Structures
●CSV
●JSON
●HTML/XML
Important Questions:
1. Should the data be hierarchical (nested)?
2. Do I want to preserve the input data? If so, how?
3. What is the intended usage of the data?
4. How much data will I have (scalability)?
CSV
Comma-Separated Values
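As a sketch of CSV output, the entities from the example sentence can be written one per row with Python's standard `csv` module. The column names `text` and `label` are an assumption, not a required schema. Note that CSV is flat: each row has the same columns, with no nesting.

```python
import csv
import io

entities = [
    {"text": "John", "label": "PERSON"},
    {"text": "Paris", "label": "LOCATION"},
    {"text": "1 August 2023", "label": "DATE"},
]

# Write to an in-memory buffer; use open("entities.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "label"])
writer.writeheader()
writer.writerows(entities)
print(buffer.getvalue())

# Reading it back with DictReader yields the same rows.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
```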
JSON
JavaScript Object Notation
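JSON can hold nested structures, so the original sentence can be kept alongside its entities. The field names below (`text`, `entities`, `start`, `end`) are illustrative assumptions; `start` and `end` are character offsets, which is one way of preserving the link back to the input data.

```python
import json

sentence = "John went to Paris on 1 August 2023."
record = {
    "text": sentence,  # preserve the input data
    "entities": [
        {"text": "John", "label": "PERSON", "start": 0, "end": 4},
        {"text": "Paris", "label": "LOCATION", "start": 13, "end": 18},
        {"text": "1 August 2023", "label": "DATE", "start": 22, "end": 35},
    ],
}

serialized = json.dumps(record, indent=2)
restored = json.loads(serialized)

# The offsets point back into the preserved sentence.
for ent in restored["entities"]:
    assert sentence[ent["start"]:ent["end"]] == ent["text"]
print(serialized)
```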
HTML
HyperText Markup Language
<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
<span class="person">Bungo</span>, that was
<span class="person">Bilbo</span>’s father, built
the most luxurious hobbit-hole for her
(and partly with her money) that was to be found
either under <span class="place">The Hill</span>
or over <span class="place">The Hill</span>
or across <span class="place">The Water</span>,
and there they remained to the end of their days.
</p>
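Because the entities are marked with `class` attributes, they can be pulled back out with Python's built-in `html.parser`. This sketch assumes the `person`/`place` class convention from the example above, uses only its first sentence, and does not handle nested spans.

```python
from html.parser import HTMLParser


class EntitySpanParser(HTMLParser):
    """Collect (text, class) pairs from <span class="..."> elements."""

    def __init__(self):
        super().__init__()
        self.entities = []
        self._label = None  # class of the span we are currently inside, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._label = dict(attrs).get("class")
            self._chunks = []

    def handle_data(self, data):
        if self._label is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._label is not None:
            # Collapse the line breaks that wrap inside a span.
            text = " ".join("".join(self._chunks).split())
            self.entities.append((text, self._label))
            self._label = None


html = """<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
</p>"""

parser = EntitySpanParser()
parser.feed(html)
print(parser.entities)
# → [('Belladonna Took', 'person'), ('Bungo Baggins', 'person')]
```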
XML
eXtensible Markup Language
<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>
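With the XML version, the standard-library `xml.etree.ElementTree` module can list every tagged entity together with the sentence it belongs to, because the `<sentence>` elements preserve that structure. The sketch assumes entities are always direct children of `<sentence>`, as in the markup above.

```python
import xml.etree.ElementTree as ET

xml_text = """<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>"""

root = ET.fromstring(xml_text)
entities = [
    (number, element.tag, element.text)
    for number, sentence in enumerate(root.iter("sentence"), start=1)
    for element in sentence  # the tagged entities are the sentence's children
]
for entity in entities:
    print(*entity)
```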
Exercise 1 (10 min): Generate Structured Data Output for “John went to Paris on 1 August 2023.”
Importance of Structured Output
Exercise 2 (10 min): Create Your Own Texts and Try to Get the Same Output Each Time (First in the Same Chat, Then in Different Chats)
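One way to check Exercise 2 programmatically is to parse each reply as JSON and compare canonical forms, so that differences in whitespace or key order do not count as inconsistencies. `canonical` and `all_consistent` are hypothetical helper names, and the replies are hand-written stand-ins for responses collected from separate chats.

```python
import json


def canonical(reply: str) -> str:
    """Re-serialize a JSON reply with sorted keys and no extra whitespace."""
    return json.dumps(json.loads(reply), sort_keys=True, separators=(",", ":"))


def all_consistent(replies: list[str]) -> bool:
    """True if there is at least one reply and all parse to the same JSON."""
    return len({canonical(r) for r in replies}) == 1


same_chat = '{"PERSON": ["John"], "LOCATION": ["Paris"]}'
new_chat = '{ "LOCATION": ["Paris"],\n  "PERSON": ["John"] }'
drifted = '{"person": "John", "location": "Paris"}'

print(all_consistent([same_chat, new_chat]))  # → True: same data, different layout
print(all_consistent([same_chat, drifted]))   # → False: key names and value types changed
```

A reply that is not valid JSON raises `json.JSONDecodeError`, which is itself a useful consistency signal.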
Few-Shot NER
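A few-shot prompt places a handful of worked input/output pairs before the new text, which often helps the model reproduce the same output format. The example pair and `build_few_shot_prompt` below are hypothetical, and the LLM call itself is left out.

```python
import json

# Hypothetical worked examples: (input text, expected structured output).
EXAMPLES = [
    ("Mary flew to Rome on 3 May 2021.",
     [{"text": "Mary", "label": "PERSON"},
      {"text": "Rome", "label": "LOCATION"},
      {"text": "3 May 2021", "label": "DATE"}]),
]


def build_few_shot_prompt(text: str) -> str:
    """Show each example input with its JSON output, then the new input."""
    parts = ["Extract PERSON, LOCATION, and DATE entities as JSON.\n"]
    for example_text, entities in EXAMPLES:
        parts.append(f"Text: {example_text}\nOutput: {json.dumps(entities)}\n")
    parts.append(f"Text: {text}\nOutput:")
    return "\n".join(parts)


prompt = build_few_shot_prompt("John went to Paris on 1 August 2023.")
print(prompt)
```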
Practical Applications with Real-World Data
An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor.
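A record like this could be captured in a fixed schema so that every document yields the same fields. The `IncidentRecord` fields below are a hypothetical schema, not the one used in the presentation; the values are copied from the passage, and abbreviations are left unexpanded.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IncidentRecord:
    """Hypothetical schema for one incident description."""
    victim: str
    perpetrator: str
    place: str
    date: str  # ISO 8601 (YYYY-MM-DD) so dates sort and compare reliably
    act: str


record = IncidentRecord(
    victim="ANCYL member",
    perpetrator="SAP members",
    place="Lephoi, Bethulie, Orange Free State (OFS)",
    date="1991-04-17",
    act="shot and severely injured",
)
print(json.dumps(asdict(record), indent=2))
```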
Assistants
Vector Databases
Representing Texts Digitally
Embeddings
●The apple is in the tree.
○1 - [0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○2 - different vector
○3 - different vector
○4 - different vector
○1 - [0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○5 - different vector
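Real embeddings come from a trained model. As an offline stand-in, the sketch below uses feature hashing (each word is hashed into one of a fixed number of buckets) to turn text into a fixed-length vector. It captures word overlap, not meaning, but it shows two properties the slide illustrates: every text becomes a vector of the same length, and the same text always produces the same vector. `toy_embed` and the dimension of 16 are arbitrary choices.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Feature-hashing stand-in for an embedding model (word overlap, not meaning)."""
    vector = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        # md5 is used only as a stable hash; Python's built-in hash() varies between runs.
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]  # unit length, convenient for cosine similarity


a = toy_embed("The apple is in the tree.")
b = toy_embed("The apple is in the tree.")
print(len(a), a == b)  # → 16 True
```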
Vector Database
What is it?
●It stores vectors in a database.
●Vectors for similar items lie close together in the vector space, and the database is organized so that nearby vectors can be found quickly.
Vector Database
How do we use a vector database?
●We populate a vector database by using a machine learning model to vectorize the data and then sending the resulting vectors to the database.
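Populating and querying can be sketched with a small in-memory store. `ToyVectorStore` is hypothetical, and `embed` stands in for the machine learning model: it counts words from a tiny fixed vocabulary so the example runs offline and its results are predictable. Real vector databases typically replace the linear scan in `query` with an approximate nearest-neighbour index so that search stays fast as the collection grows.

```python
import math
import re

# Hypothetical tiny vocabulary; a real system would call an embedding model instead.
VOCAB = ["apple", "tree", "fruit", "john", "paris", "travel", "stock", "market"]


def embed(text: str) -> list[float]:
    """Stand-in 'model': unit-length word counts over VOCAB."""
    words = re.findall(r"[a-z]+", text.lower())
    vector = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


class ToyVectorStore:
    """In-memory store keeping each vector next to its original text."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, text: str) -> None:
        # Populate: vectorize with the model, then store the vector.
        self.vectors.append(embed(text))
        self.texts.append(text)

    def query(self, text: str, k: int = 1) -> list[tuple[float, str]]:
        # For unit vectors, the dot product equals cosine similarity.
        q = embed(text)
        scored = [(sum(a * b for a, b in zip(q, v)), t)
                  for v, t in zip(self.vectors, self.texts)]
        # Brute-force scan over every stored vector.
        return sorted(scored, reverse=True)[:k]


store = ToyVectorStore()
for doc in ["The apple is in the tree.", "John went to Paris.", "The stock market fell."]:
    store.add(doc)
print(store.query("Which tree has an apple?"))
```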
Vector Database
Why use a vector database?
●Vector databases let users store vector data and query it by vector-level similarity (how close two vectors are), rather than by explicit, human-defined similarity such as matching keywords or categories.
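To make vector-level similarity concrete: cosine similarity compares the direction of two vectors, cos(u, v) = (u · v) / (|u| |v|), where 1.0 means the vectors point the same way. The three-dimensional vectors below are invented for illustration (a real embedding model produces hundreds of dimensions). A keyword match finds nothing in common between "puppy" and "dog", while their hypothetical vectors point in nearly the same direction.

```python
import math

# Invented 3-dimensional "embeddings", for illustration only.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}


def cosine(u: list[float], v: list[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


query = "puppy"
ranked = sorted((w for w in vectors if w != query),
                key=lambda w: cosine(vectors[query], vectors[w]), reverse=True)
print(ranked)  # → ['dog', 'car']
print({w: round(cosine(vectors[query], vectors[w]), 3) for w in ranked})
# → {'dog': 0.996, 'car': 0.378}
```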
Vector Database
What is it?
●A vector database holds many vectors, or embeddings, of data. Sometimes the database also stores the original data alongside these vectors.
Vector Database Stacks
What is available to us?
●Python, Annoy, Streamlit
○Cheap, easy to deploy, and great for smaller datasets, but requires some knowledge to build from scratch
○Best for smaller databases (under 10,000 records)
●Python, txtAI
○Cheap and easy to use; more resource-intensive, but easy to deploy
○Allows for easy interpretability (via highlighting)
Multi-Modal
How does it work?
Retrieval-Augmented Generation
How tall is Wookie?
RAG
What is it?
●RAG lets you combine the strengths of large language models (LLMs) with vector databases.
●It reduces the chance that an LLM will hallucinate (generate fabricated information), because the answer can draw on retrieved material rather than on the model's memory alone.
●It uses a vector database to find material relevant to a query.
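The retrieve-then-generate flow can be sketched in a few lines: find the passages most relevant to the question, then place them in the prompt so the model answers from them. Everything here is a hypothetical stand-in: word overlap replaces vector search, the two passages are placeholders rather than real source material, and the actual LLM call is left out.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Stand-in for vector search: rank documents by shared words."""
    q = words(question)
    return sorted(documents, key=lambda d: len(q & words(d)), reverse=True)[:k]


def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Augment the question with retrieved context before generation."""
    context = "\n".join(f"- {p}" for p in retrieve(question, documents))
    return (
        "Answer using only the context below. If the answer is not in the "
        "context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


documents = [  # placeholder passages, not real source material
    "Placeholder passage describing how tall a Wookie is.",
    "Placeholder passage about hobbit-holes.",
]
prompt = build_rag_prompt("How tall is Wookie?", documents)
print(prompt)
# The prompt would then be sent to an LLM (call omitted).
```

Telling the model to answer only from the context, and to admit when the context is silent, is what limits hallucination in this setup.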