Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"

BaltimoreNISO 677 views 33 slides May 16, 2024

About This Presentation

This presentation was provided by William Mattingly of the Smithsonian Institution during the seventh segment of the NISO training series "AI & Prompt Design." Session 7, "Open Source Language Models," was held on May 16, 2024.

Slide Content

Prompt Design: LLMs with Text Classification and Open Source

Goals: GPT-4o; Multimodal LLMs; Vector Databases and Semantic Search; What is Text Classification? How is it useful?; Traditional Approaches; LLMs and Text Classification; Open Source LLMs

GPT-4o

GPT-4o: A New Model
Pricing: GPT-4o is 50% cheaper than GPT-4 Turbo, at $5/M input tokens and $15/M output tokens.
Rate limits: GPT-4o's rate limits are 5x higher than GPT-4 Turbo's, up to 10 million tokens per minute.
Speed: GPT-4o is 2x as fast as GPT-4 Turbo.
Vision: GPT-4o outperforms GPT-4 Turbo in evals of vision capabilities.
Multilingual: GPT-4o has improved support for non-English languages over GPT-4 Turbo.
GPT-4o currently has a context window of 128k tokens and a knowledge cut-off date of October 2023.
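The per-token prices above translate into a simple cost estimate. The sketch below uses only the two prices quoted on the slide; the function name and the example workload (10,000 short texts) are assumptions made for illustration.

```python
# Prices quoted on the slide, in US dollars per one million tokens.
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a request or a batch."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 10,000 texts, about 200 input and 10 output tokens each.
total = estimate_cost(input_tokens=10_000 * 200, output_tokens=10_000 * 10)
print(f"Estimated cost: ${total:.2f}")
```

Because output tokens cost three times as much as input tokens at these prices, a classification task that returns only a short label keeps the output share of the bill small.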

GPT-4o: A New Model
Released this week
Purely multimodal
Exceptionally fast (low latency)
Cheaper
Available via the API and Chat

GPT-4o: Multimodal. “GPT-4o is OpenAI's new flagship model that can reason across audio, vision, and text in real time.” - OpenAI’s Docs

GPT-4o Multimodal Text, Audio, and Video are all vectorized by the same model and treated the same way. In other words, a text that describes a beach would be very similar in vector space to an image of a beach.

Vector Databases and Semantic Search

Representing Texts Digitally: Embeddings
Example sentence: The apple is in the tree.
1: [0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
2: different vector
3: different vector
4: different vector
1: [0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
5: different vector
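One reading of the numbering above is that each word of the sentence is mapped to a vector, and the repeated word "the" reuses vector 1. The sketch below illustrates that reading with a toy lookup table. The random vectors, the four-dimensional size, and the function name are assumptions; real models learn their vectors from data, usually work on subword tokens, and many produce context-dependent vectors for the same word.

```python
import random

EMBEDDING_DIM = 4  # toy size; real models use hundreds or thousands of dimensions


def build_embedding_table(words, dim=EMBEDDING_DIM, seed=0):
    """Assign one random toy vector to each distinct lowercase word."""
    rng = random.Random(seed)
    table = {}
    for word in words:
        key = word.lower()
        if key not in table:
            table[key] = [round(rng.uniform(-1, 1), 5) for _ in range(dim)]
    return table


sentence = "The apple is in the tree"
words = sentence.split()
table = build_embedding_table(words)

# Look up every word; both occurrences of "the" share a single table entry.
vectors = [table[word.lower()] for word in words]

for word, vector in zip(words, vectors):
    print(f"{word:>5} -> {vector}")
```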

Vector Database: What is it? A vector database stores vectors and indexes them so that similar vectors, those close together in vector space, can be found quickly.

Vector Database: How do we use a vector database? We populate it by using a machine learning model to vectorize the data and then sending the resulting vectors to the database.
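The populate step can be sketched in a few lines. Everything named here is hypothetical: `embed` is a hashed bag-of-words stand-in for a real embedding model, and `ToyVectorStore` is an in-memory list, not the API of any particular vector database.

```python
import hashlib

DIM = 8  # toy dimensionality


def embed(text: str, dim: int = DIM) -> list:
    """Hypothetical stand-in for an embedding model: a hashed bag of words."""
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0  # each word increments one bucket
    return vector


class ToyVectorStore:
    """Keeps each vector next to the original text it was computed from."""

    def __init__(self):
        self.records = []

    def add(self, text: str) -> int:
        self.records.append({"text": text, "vector": embed(text)})
        return len(self.records) - 1  # list position doubles as the record id


store = ToyVectorStore()
for document in ["The apple is in the tree.", "A beach at sunset.", "Waves on the beach."]:
    store.add(document)

print(len(store.records), "records stored")
```

Storing the original text alongside each vector means a later query can return readable results rather than bare numbers.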

Vector Database: Why use a vector database? A vector database stores vector data so that users can query it and retrieve results ranked by vector-level similarity rather than by explicit, human-defined similarity.
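A similarity query can be sketched as follows, assuming cosine similarity as the measure (a common choice, though the slides do not name one). The three-dimensional vectors are hand-made for illustration and do not come from a real model; the item names echo the beach example from the multimodal slide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors; a real system would get these from an embedding model.
database = {
    "photo of a beach": [0.9, 0.1, 0.0],
    "text describing a beach": [0.8, 0.2, 0.1],
    "recipe for apple pie": [0.0, 0.1, 0.9],
}


def search(query_vector, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = [(cosine_similarity(query_vector, vector), name)
              for name, vector in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


print(search([1.0, 0.0, 0.0]))
```

This brute-force scan compares the query against every stored vector. Libraries such as Annoy build an approximate index so that large collections can be searched without a full scan.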

Vector Database What is it? A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.

Vector Database Stacks

Vector Database Stacks: What is available to us?
Python, Annoy, Streamlit: cheap and easy to deploy, and great for smaller datasets, but requires a little knowledge to build from scratch. Best for smaller databases (under 10,000 items).
Python, txtAI: cheap and easy to use; more resource intensive but easy to deploy. Allows for easy interpretability (via highlighting).

Multi-Modal How does it work?

Text Classification

Text Classification: Overview. Assign a text to a specific category or categories. Categories == labels.
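With an LLM, assigning a label usually comes down to a prompt that lists the allowed labels, plus a check that the reply is one of them. The sketch below shows one possible shape. The prompt wording and both function names are assumptions, not taken from the slides, and no model is actually called.

```python
from typing import List, Optional


def build_classification_prompt(text: str, labels: List[str]) -> str:
    """Build a single-label classification prompt (wording is illustrative)."""
    label_list = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these labels: {label_list}.\n"
        "Respond with the label only.\n\n"
        f"Text: {text}\n"
        "Label:"
    )


def parse_label(response: str, labels: List[str]) -> Optional[str]:
    """Map a raw model reply onto an allowed label, or None if nothing matches."""
    cleaned = response.strip().strip(".\"'").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


labels = ["spam", "not spam"]
prompt = build_classification_prompt("Limited time offer: buy one get one free.", labels)
print(prompt)

# A reply string would normally come from the model; this one is hard-coded.
print(parse_label(" Spam.\n", labels))
```

Validating the reply against the label list guards against replies that drift outside the allowed categories.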

Text Classification: Emails
"Congratulations! You've won a $1,000 Walmart gift card. Click here to claim your prize."
"Limited time offer: Buy one get one free on all items in our store."
"Dear customer, your account has been temporarily suspended. Please update your information to restore access."

Text Classification: Sentiment
"I love this product! It works exactly as described."
"The product arrived late and was damaged. Very disappointed."
"It's okay, not great but not terrible either."
"Excellent service and quick delivery. Highly recommend!"

Text Classification: Types. Binary Classification, Multiclass Classification, Multilabel Classification, Hierarchical Classification.

Text Classification: Binary Classification. Classifies text into one of two categories. Example: spam detection in emails, where each email is classified as either "spam" or "not spam."
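A binary classifier always returns one of exactly two labels. The sketch below is a deliberately simple keyword baseline, not a method from the slides; the cue phrases and the threshold are assumptions chosen to fit the example email shown earlier.

```python
# Hypothetical cue phrases; a trained model would learn its own features.
SPAM_CUES = ("congratulations", "you've won", "gift card", "click here", "claim your prize")


def classify_spam(email: str, threshold: int = 2) -> str:
    """Return "spam" when at least `threshold` cue phrases occur, else "not spam"."""
    text = email.lower()
    hits = sum(1 for cue in SPAM_CUES if cue in text)
    return "spam" if hits >= threshold else "not spam"


example = ("Congratulations! You've won a $1,000 Walmart gift card. "
           "Click here to claim your prize.")
print(classify_spam(example))
print(classify_spam("See you at lunch tomorrow."))
```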

Text Classification: Multiclass Classification. Classifies text into exactly one of three or more categories. Example: sentiment analysis with categories such as "positive," "negative," and "neutral."
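In the multiclass case the classifier still returns a single label, but chooses among three or more. The sketch below scores text against two small word lists and falls back to "neutral" on a tie. The word lists are assumptions picked to match the sample reviews shown earlier; a lexicon this small would not generalize.

```python
import re

# Hypothetical sentiment lexicons, chosen to fit the sample reviews.
POSITIVE_WORDS = {"love", "excellent", "recommend", "great"}
NEGATIVE_WORDS = {"late", "damaged", "disappointed", "terrible"}


def classify_sentiment(text: str) -> str:
    """Return exactly one of "positive", "negative", or "neutral"."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "I love this product! It works exactly as described.",
    "The product arrived late and was damaged. Very disappointed.",
    "It's okay, not great but not terrible either.",
]
for review in reviews:
    print(classify_sentiment(review), "<-", review)
```

The third review shows the limit of counting words: it comes out neutral only because "great" and "terrible" cancel, not because the code understands "not".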

Text Classification: Multilabel Classification. Assigns one or more labels to a single text instance, where each label represents a different category. Example: news categorization, where an article can belong to several categories at once, such as "politics," "economy," and "health."
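The difference from the multiclass case is that each label is decided independently, so the result is a list that may hold several labels or none. A minimal sketch follows, using the three news categories named above; the keyword sets are assumptions for illustration.

```python
import re

# Hypothetical keywords per label; each label is checked independently.
LABEL_KEYWORDS = {
    "politics": {"election", "senate", "policy"},
    "economy": {"budget", "inflation", "market"},
    "health": {"hospital", "vaccine", "clinic"},
}


def classify_topics(text: str) -> list:
    """Return every label whose keyword set overlaps the words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [label for label, keywords in LABEL_KEYWORDS.items() if words & keywords]


print(classify_topics("The senate approved a budget for new hospital funding."))
print(classify_topics("Inflation cooled last month."))
```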

Text Classification: Hierarchical Classification. Classifies text into categories that are organized as a tree. Example: document classification in a library, where documents are classified into categories like "science," "arts," and "technology," with subcategories under each (e.g., "science" can have "physics," "chemistry," and "biology").
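One common way to classify into a tree is top-down: choose the parent category first, then choose only among that parent's children. The sketch below follows that approach. The "science" subcategories come from the example above, while the "arts" subcategories and all keywords are hypothetical.

```python
import re

TAXONOMY = {
    "science": {
        "physics": {"quantum", "particle", "gravity"},
        "chemistry": {"molecule", "reaction", "acid"},
        "biology": {"cell", "species", "gene"},
    },
    # Hypothetical subcategories; the slide lists none for "arts".
    "arts": {
        "music": {"symphony", "melody"},
        "painting": {"canvas", "portrait"},
    },
}


def classify_path(text: str) -> list:
    """Return [parent, child] for the best match, or [] when nothing matches."""
    words = set(re.findall(r"[a-z]+", text.lower()))

    def parent_score(parent: str) -> int:
        # A parent's score is the total keyword overlap across all its children.
        return sum(len(words & keywords) for keywords in TAXONOMY[parent].values())

    # Level 1: pick the top-level category with the most keyword hits.
    parent = max(TAXONOMY, key=parent_score)
    if parent_score(parent) == 0:
        return []

    # Level 2: pick the best child within the chosen parent only.
    children = TAXONOMY[parent]
    child = max(children, key=lambda name: len(words & children[name]))
    return [parent, child]


print(classify_path("The reaction produced a new molecule."))
print(classify_path("A portrait painted on canvas."))
```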

Open Source Machine Learning

Open Source ML: Overview. Open source machine learning, like open source software (OSS), is driven by the public. It has several components: open source datasets, open source machine learning models, and open source applications. The best resource: HuggingFace.

Open Source ML: Datasets
Datasets for training task-specific models: NER, Text Classification, Image Classification, Object Detection
Datasets for training language models: unannotated collections of texts
Dataset Cards: Task, Language, Biases

Open Source ML: Models
Trained machine learning models for specific tasks: NER, Text Classification, Image Classification, Object Detection, ASR, HTR, OCR
Trained machine learning language models (including LLMs)
Dataset Cards: Task, Language, Biases

Open Source ML: Benefits and Limitations
Benefits: open, meaning they are freely available to use (though sometimes with commercial limitations); publicly critiqued; understanding of the data.
Limitations: closed models are better in many cases (but that gap is closing).
