Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"

BaltimoreNISO 856 views 40 slides Apr 29, 2024

About This Presentation

This presentation was provided by William Mattingly of the Smithsonian Institution, during the fourth segment of the NISO training series "AI & Prompt Design." Session Four: Structured Data and Assistants, was held on April 25, 2024.


Slide Content

Prompt Design
04: Structured Data, Assistants, & RAG

Goals
1. Importance of Structured Data
2. How to Generate Structured Data from LLMs
3. Importance of Consistency in LLM Outputs
4. How to Generate Consistent Responses
5. Vector Databases and Semantic Search
6. Retrieval Augmented Generation
7. Assistants

John went to Paris on 1 August 2023.

Named Entity Recognition
John went to Paris on 1 August 2023.
●John => PERSON
●Paris => LOCATION
●1 August 2023 => DATE

Traditional Approaches
●Rules-Based
●Task-Specific Machine Learning Model
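
To make the rules-based approach concrete, here is a minimal sketch for the example sentence. The name lists (`PEOPLE`, `PLACES`) and the date pattern are illustrative assumptions, not part of the original slides; a real rules-based system would need far larger lists and many more patterns.

```python
import re

# Hypothetical gazetteers (assumed for illustration only).
PEOPLE = {"John"}
PLACES = {"Paris"}

# Matches dates written like "1 August 2023".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def rule_based_ner(text):
    """Return (entity_text, label) pairs in order of appearance."""
    found = []
    # Rule 1: anything matching the date pattern is a DATE.
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Rule 2: capitalized words are looked up in the gazetteers.
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        word = match.group()
        if word in PEOPLE:
            found.append((match.start(), word, "PERSON"))
        elif word in PLACES:
            found.append((match.start(), word, "LOCATION"))
    found.sort()  # order by position in the text
    return [(entity, label) for _, entity, label in found]


entities = rule_based_ner("John went to Paris on 1 August 2023.")
for entity, label in entities:
    print(f"{entity} => {label}")
```

The sketch also shows the weakness of this approach: any name or place missing from the lists is silently skipped.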

Zero Shot NER with LLMs

Structured Data

Structured Data
Types of Data
Important Structures
●CSV
●JSON
●HTML/XML

Important Questions:
1. Should the data be hierarchical (nested)?
2. Do I want to preserve the input data? If so, how?
3. What is the intended usage of the data?
4. How much data will I have (scalability)?

CSV
Comma Separated Value

CSV

JSON
JavaScript Object Notation

JSON
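
As a rough illustration of how the two formats differ, the sketch below writes the entities from the example sentence as flat CSV rows and as a nested JSON record using Python's standard library. The field names (`text`, `label`, `sentence`, `entities`) are assumptions chosen for this example.

```python
import csv
import io
import json

# Entities from the example sentence; the field names are illustrative.
entities = [
    {"text": "John", "label": "PERSON"},
    {"text": "Paris", "label": "LOCATION"},
    {"text": "1 August 2023", "label": "DATE"},
]

# CSV is flat: one row per entity, no nesting.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "label"], lineterminator="\n")
writer.writeheader()
writer.writerows(entities)
csv_output = buffer.getvalue()

# JSON can be hierarchical: the input sentence is preserved next to its entities.
record = {"sentence": "John went to Paris on 1 August 2023.", "entities": entities}
json_output = json.dumps(record, indent=2)

print(csv_output)
print(json_output)
```

The JSON version answers two of the questions above at once: it is nested, and it keeps the input text alongside the extracted data.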

HTML
HyperText Markup Language

HTML
<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
<span class="person">Bungo</span>, that was
<span class="person">Bilbo</span>’s father, built
the most luxurious hobbit-hole for her
(and partly with her money) that was to be found
either under <span class="place">The Hill</span>
or over <span class="place">The Hill</span>
or across <span class="place">The Water</span>,
and there they remained to the end of their days.
</p>
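
One reason to ask for HTML output is that the annotations can be read back with ordinary tools. The following is a minimal sketch using Python's built-in `html.parser` on the first sentence of the passage; the `EntitySpanParser` class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser


class EntitySpanParser(HTMLParser):
    """Collect the text of each <span class="..."> together with its class."""

    def __init__(self):
        super().__init__()
        self.current_label = None
        self.buffer = []
        self.entities = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current_label = dict(attrs).get("class")
            self.buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside an annotated span.
        if self.current_label is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self.current_label is not None:
            # Collapse line breaks inside the span into single spaces.
            text = " ".join("".join(self.buffer).split())
            self.entities.append((text, self.current_label))
            self.current_label = None


html_text = """<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
</p>"""

parser = EntitySpanParser()
parser.feed(html_text)
parser.close()
html_entities = parser.entities
print(html_entities)
```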

XML
eXtensible Markup Language

XML
<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>
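
Because the XML version uses the tag name itself as the label and nests entities inside sentences, a standard XML parser can recover both. A minimal sketch with Python's `xml.etree.ElementTree`, using only the first sentence of the passage:

```python
import xml.etree.ElementTree as ET

xml_text = """<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
</text>"""

root = ET.fromstring(xml_text)

# One list of (entity text, tag name) pairs per sentence.
entities_by_sentence = []
for sentence in root.iter("sentence"):
    entities_by_sentence.append([(child.text, child.tag) for child in sentence])

print(entities_by_sentence)
```

The hierarchy (text, then sentence, then entity) survives the round trip, which is what the "hierarchical (nested)" question is about.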

Exercise 1 (10 min): Generate Structured Data Output for “John went to Paris on 1 August 2023.”

Importance of Structured Output

Exercise 2 (10 min): Create your Own Texts and Try to get the Same Output each time, first in the same chat, then in different chats.

Few-Shot NER.
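
A few-shot prompt pairs an instruction with worked examples so the model sees the exact output format it should copy. The sketch below builds such a prompt and checks a reply against that format. It makes no call to any LLM service: `build_few_shot_prompt`, `parse_entities`, the label set, the second sentence, and the reply string are all hypothetical and written by hand for illustration.

```python
import json

ALLOWED_LABELS = {"PERSON", "LOCATION", "DATE"}

# One worked example, taken from the sentence used earlier in the deck.
FEW_SHOT_EXAMPLES = [
    (
        "John went to Paris on 1 August 2023.",
        [
            {"text": "John", "label": "PERSON"},
            {"text": "Paris", "label": "LOCATION"},
            {"text": "1 August 2023", "label": "DATE"},
        ],
    ),
]


def build_few_shot_prompt(new_text):
    """Assemble instruction + worked examples + the new text to label."""
    lines = [
        "Extract named entities from the text.",
        'Respond with only a JSON list of objects with the keys "text" and "label". '
        "Allowed labels: " + ", ".join(sorted(ALLOWED_LABELS)) + ".",
        "",
    ]
    for example_text, example_entities in FEW_SHOT_EXAMPLES:
        lines.append("Text: " + example_text)
        lines.append("Entities: " + json.dumps(example_entities))
        lines.append("")
    lines.append("Text: " + new_text)
    lines.append("Entities:")
    return "\n".join(lines)


def parse_entities(response_text):
    """Validate a reply; raise ValueError if it does not follow the format."""
    entities = json.loads(response_text)  # invalid JSON raises a ValueError subclass
    if not isinstance(entities, list):
        raise ValueError("expected a JSON list")
    for entity in entities:
        if not isinstance(entity, dict) or set(entity) != {"text", "label"}:
            raise ValueError("each entity needs exactly 'text' and 'label'")
        if entity["label"] not in ALLOWED_LABELS:
            raise ValueError("unknown label: " + str(entity["label"]))
    return entities


prompt = build_few_shot_prompt("Mary flew to Cairo on 3 May 2021.")

# Hand-written stand-in for a model reply (NOT real model output).
stand_in_reply = (
    '[{"text": "Mary", "label": "PERSON"}, '
    '{"text": "Cairo", "label": "LOCATION"}, '
    '{"text": "3 May 2021", "label": "DATE"}]'
)
parsed = parse_entities(stand_in_reply)
print(prompt)
print(parsed)
```

Validating every reply this way is one practical check on consistency: a reply that drifts from the agreed format fails loudly instead of corrupting downstream data.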

Practical Applications with Real World Data
An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor.

Assistants

Vector Databases

Representing Texts Digitally
Embeddings
●The apple is in the tree.
○1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○2-different vector
○3-different vector
○4-different vector
○1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○5-different vector
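
Once text is represented as vectors, "similar" becomes something that can be computed. A common measure is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors (assumptions, not output of any real model).
apple_tree = [0.9, 0.1, 0.0]
fruit_branch = [0.8, 0.2, 0.1]
stock_market = [0.0, 0.1, 0.9]

close_score = cosine_similarity(apple_tree, fruit_branch)
far_score = cosine_similarity(apple_tree, stock_market)
print(round(close_score, 3), round(far_score, 3))
```

Vectors pointing in nearly the same direction score close to 1, while unrelated ones score close to 0.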

Vector Database
What is it?
●It is a database that stores vectors.
●Similar vectors are stored closer together.

Vector Database
How do we use a vector database?
●We populate a vector database by using a machine learning model to vectorize data and then sending the resulting vectors to the database.

Vector Database
Why use a vector database?
●Vector databases store vector data so that users can query it and find similar items based on vector-level similarity, rather than explicit human-defined similarity.
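
The populate-then-query cycle can be sketched in a few lines. This toy version keeps everything in memory and uses word counts over a fixed vocabulary as a stand-in for a real embedding model, so it matches shared words rather than meaning. `ToyVectorStore`, `toy_embed`, and the vocabulary are hypothetical names invented for this example.

```python
import math
import re
from collections import Counter


def toy_embed(text, vocabulary):
    """Stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # all-zero vectors match nothing


class ToyVectorStore:
    """Keeps each vector alongside the original text it came from."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.rows = []  # list of (vector, original text)

    def add(self, text):
        self.rows.append((toy_embed(text, self.vocabulary), text))

    def query(self, text, k=1):
        query_vector = toy_embed(text, self.vocabulary)
        scored = [(cosine_similarity(query_vector, v), t) for v, t in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:k]]


vocabulary = ["apple", "tree", "fruit", "police", "gathering", "paris"]
store = ToyVectorStore(vocabulary)
store.add("The apple is in the tree.")
store.add("John went to Paris on 1 August 2023.")
store.add("Police opened fire on a gathering.")

best = store.query("Which fruit grows on an apple tree?", k=1)
print(best)
```

A real vector database replaces the linear scan in `query` with an index built for fast nearest-neighbour search, which is what makes large collections practical.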

Vector Database
What is it?
●A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.

Vector Database Stacks

Vector Database Stacks
What is available to us?
●Python, Annoy, Streamlit
○Cheap, easy to deploy, great for smaller datasets, but requires a little bit of knowledge to build from scratch
○Best for smaller databases (under 10,000 items)
●Python, txtai
○Cheap and easy to use; more resource intensive, but easy to deploy
○Allows for easy interpretability (via highlighting)

Multi-Modal
How does it work?

Retrieval-Augmented Generation

How tall is Wookie?

RAG
What is it?
●RAG lets you combine the strengths of large language models (LLMs) with vector databases
●It reduces the chance that an LLM will hallucinate (generate fake information)
●It uses a vector database to find material relevant to a query
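
The flow described above can be sketched as two steps: retrieve the most relevant passages, then place them in the prompt so the model answers from them. In this minimal sketch, ranking by shared words stands in for a real vector database query, and no LLM is called; `retrieve`, `build_rag_prompt`, and the prompt wording are assumptions made for illustration.

```python
import re

# A tiny document collection, reusing sentences from earlier in the deck.
DOCUMENTS = [
    "The apple is in the tree.",
    "John went to Paris on 1 August 2023.",
]


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=1):
    """Stand-in for a vector database query: rank documents by shared words."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, passages):
    """Put the retrieved passages in front of the question."""
    context = "\n".join("- " + passage for passage in passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + question + "\nAnswer:"
    )


question = "Where did John go in August 2023?"
passages = retrieve(question, DOCUMENTS, k=1)
prompt = build_rag_prompt(question, passages)
print(prompt)  # this string is what would be sent to the LLM
```

Telling the model to answer only from the supplied context, and to admit when the context is insufficient, is how this pattern tries to reduce hallucination.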
