Oracle Database 23ai: AI Vector Search: The What, The How, & The Possiblities
Size: 52.85 MB
Language: en
Added: May 10, 2024
Slides: 55 pages
Slide Content
Oracle 23ai: AI Vector Search: The What, The How, & The Possibilities Will start about 3 minutes after the hour.
Webinar Keeping Use Q&A dialog to ask questions. Will answer as pauses occur or at end of webinar.
Speaker Bobby Curtis, MBA Atlanta, GA Author Speaker @ dbasolved | @ rheodatallc https:// dbasolved.com | https:// www.rheodata.com /blog [email protected] https:// www.slideshare.net / BobbyCurtisMBA
Cloud Managed Services Provider (CMSP) 24/7 On-Call Support Data and Cloud Migrations Near-Zero Downtime Data Engineering Analytics, Machine Learning, Artificial Intelligence (Oracle Digital Assistants) License Management On-Premises & Cloud Cost Management Data Modernization Data Visualization, Cyber Security, Data Warehouse, etc. Service Offerings: www.rheodata.com | www.rheodata.ai Co-Managed Fully-Managed Professional Services
Infrastructure Data Integration Data Engineering & App development Applications Database (Oracle /MySQL/PostgreSQL) (ADW, ATP, ExaCC , etc.) Query Performance Data Warehouse (Autonomous Database/MySQL Heatwave) Cloud IaaS/PaaS Automation (Terraform) GoldenGate (on-premises & oci ) Oracle Integration Cloud (OIC) Oracle Data Integrator (ODI) Machine Learning (ML) & Artificial Intelligence (AI) Data Catalog Low Code (Apex) Oracle Analytics Cloud Oracle Technology Stacks: Security Data Lakehouse HCM/PeopleSoft EBS Primavera Engineered Systems (Exadata/ODA)
Prerequisites Oracle Database 23c 23ai oracle-database-free-23c-1.0-1.el8.x86_64 (23.4 ((LA1)) oracle-database-ee-23c-1.0-1.el8.x86_64 (23.4(LA2)) OCI Oracle Database Service On-premises ( here ) - Free Python 3.11/3.12 python- oracledb oracledb-2.2.0 Langchain https:// python.langchain.com /docs/ get_started /introduction LLM API Key Cohere, ChatGPT, etc.
The What?
Vector Database are a new classification of specialized databases designed for AI workloads that allows you to query based on semantics, rather than keywords.
Vector Search enhances information retrieval by mapping queries to relevant data in your database based on semantics, instead of precise matches, using vectors to measure similarity.
45 68 16 50 42 Vector Vectors are a sequence of numbers, called dimensions, used to capture the important “features” of the data. Vectors in AI represent semantics of unstructured data such as images, documents, videos, etc.
45 68 16 50 42 Vector The mathematical distance between two vectors indicate how similar they are Vectors represent the semantic content of data, not the underlying words or pixels
Input Data Object Embedding Model 4 8 42 72 Vector Search Relevant Content Data Object Retrieval Vector ID Matches Vector Embedding
The How? -> The Math
The Math? Euclidean/Squared Euclidean Distance Cosine Similarity Reflects the distance between each of the vector coordinates being compared -> straight line distance between vectors Measures the cosine of the angle between two vectors.
The Math? DOT Product Similarity Manhattan Distance Similarity of two vectors are viewed by multiplying the size of each vector by the cosine of their angle. Calculated by summing the distance between the coordinates of two vectors.
The Math? Hamming Distance The distance between two vectors represent the number of coordinates where they differ.
The How? -> Oracle
Vector data types can be declared in the following forms: Vector Datatype Vector Arbitrary number of dimensions and format Vector (*,*) Vector = Vector(*,*) Vector ( number_of_dimensions , *) Vectors must all have the specified number of dimensions, or an error is thrown. No format modification. Vector(*, dimension_element_format ) Arbitrary number of dimensions, but format will be up-converted or down-converted to the specific dimension (INT8, FLOAT32, FLOAT64). Vector( number_of_dimensions , dimension_element_format ) Must have all the specified number of dimensions, or error thrown. Each will be up-converted or down-converted to the specified format (INT8, FLOAT32, FLOAT64). Vectors can be NULL but dimensions cannot: [1.1, NULL, 2.2] Same Thing
Vector Base Operations TO_VECTOR || VECTOR Constructors for vectors FROM_VECTOR || VECTOR_SERIALIZE Functions take a vector and return as string in VARCHAR2 or CLOB as output VECTOR_NORM Function returns the Euclidean norm of the vector in BINARY_DOUBLE (SQRT(SUM((xi- yi )2)) VECTOR_DIMENSION_COUNT Function returns the number of dimensions of a vector as an Oracle number VECTOR_DIMENSION_FORMAT Returns the storage format of the vector in VARCHAR2 format (INT8, FLOAT32, FLOAT64) VECTOR_DISTANCE * Function returns Squared Euclidean distance. Default for Oracle AI Vector Search There are a few base functions that are used with Vectors .
Declare vector datatypes during table creation. Example 1: CREATE TABLE text_vec ( id NUMBER, embeddings VECTOR ); Example 2: CREATE TABLE text_vec ( id NUMBER, embeddings VECTOR(384, INT8) );
Vector datatypes can be added to tables later as well. Example 1: CREATE TABLE text_vec ( id NUMBER, col1 varchar2(50), ); ALTER TABLE text_vec add ( vec vector );
Python Approach – 1 of 4 import oracledb import cohere import os from langchain . document_loaders import TextLoader from langchain . text_splitter import CharacterTextSplitter
Python Approach – 2 of 4 with open ( "./Scripts/oracle/vector/ cities.txt " ) as f : cities = f . read () #print(cities) text_splitter = CharacterTextSplitter ( separator = " \n " , chunk_size = 35 , chunk_overlap = 5 , length_function = len , is_separator_regex = False ) texts = text_splitter . split_text ( cities )
Python Approach – 3 of 4 api_key = ”API Key" os . environ [ "COHERE_API_KEY" ] = api_key print ( os . environ [ "COHERE_API_KEY" ]) co = cohere . Client ( os . getenv ( "COHERE_API_KEY" )) Cohere offers two types of API Keys: Development (Limit 10 calls per minute) Production (unlimited calls per minute)
Python Approach – 3.5 of 4 def database_connection (): try : connection = oracledb . connect ( user = "vector" , password = "vector" , dsn = "100.130.230.100:1521/freepdb1" ) print ( 'connected’ ) return connection except : print ( 'Could not make a connection' )
Create Vectors select text from vector . text_vec order by vector_distance ( txt_vec ,( select txt_vec from vector . text_vec_lookup where id = :1 ), DOT ) fetch first :2 rows only ;
Similarly Searches (Demo)
Similarly Searches (Demo) Explain Plan No indexes Selected 43 rows in milliseconds – all text New webinar: Vector Indexes, coming soon.
How do I embed a LLM within the database?
ONNX Files ONNX = Open Neural Network Exchange A n open-source format designed for machine-learning models. Ensuring cross-platform compatibility and supports major languages and frameworks, facilitating easy and efficient model exchanges.
Python Approach – 1 of 5 $ export ORACLE_HOME_23ai=/opt/oracle/product/23ai/dbhome_1 $ cd $ORACLE_HOME_23ai/python/bin $ python -V $ export PATH=$ORACLE_HOME_23ai/python/bin:$PATH $ python -v Verify that the correct Python is installed. Python 3.12 (min)
Python Approach – 2 of 5 $ cd ~ $ mkdir onnx Make directory for all ONNX files
Python Approach – 3 of 5 $ cd ~/ onnx $ unzip ./ omlutils.zip -d Change directory to ONNX directory Unzip OMLUTILS.zip file
Python Approach – 4 of 5 $ cd ~/ onnx $ python -m pip install -r requirements.txt $ python -m pip install omlutils-0.13.0-cp312-cp312-linux_x86_64.whl Install all the necessary requirments Install OMLUTILS wheel file
Python Approach – 5 of 5 $ cd ~/ onnx $ python >>> from omlutils import EmbeddingModel , EmbeddingModelConfig >>> em = EmbeddingModel ( model_name ="sentence-transformers/all-MiniLM-L6-v2”) >>> em.export2file("all-MiniLM-L6-v2",output_dir=".”) >>> exit() $ Install requested LLM into database using Python Interactive Prompt View model in Oracle Data Dictonary
New Database Views *_DATA_MODELS SELECT * FROM *_DATA_MODELS;
Using the Embedded Model (TO_VECTOR & VECTOR_EMBEDDING) SELECT to_vector ( vector_embedding ( VECTOR . DOC_MODEL using url as data )) as embedding from ggai.base ;
Updating existing table with Vectors update ggai . base_url set urlvec = ( SELECT to_vector ( vector_embedding ( VECTOR.DOC_MODEL using url as data )) from ggai . base_url where id = 1 ) where client = upper ( 'test' );
The Possibilities?
The possibilities of Oracle Vector Search are going to drive growth in everything!
Retrieval Augmented Generation (RAG) A technique for enhancing the accuracy and reliability of generative AI models with facts from external sources
Input Data Object Embedding Model 4 8 42 72 Vector Search Relevant Content Data Object Retrieval Vector ID Matches Vector Embedding
Use Cases: Search (known use case) Natural-Language Processing/Search (semantic search) Recommendation Systems (retail, customer service) Biometric and Anomaly Detection (healthcare) Drug discovery and genomics (healthcare) Patient similarity Analysis (healthcare) Personalized Digital Assistants
Private data shared via Vector databases/data type need to be secure. Organizations should evaluate what data is vectorized (shared) before using with public LLMs. Consider building private LLMs to keep data secure.
Questions
Next webinars! June 13, 2024 @ 1 pm EST: Oracle GoldenGate 23ai: What is new and how to upgrade June 27, 2024 @ 1 pm EST (tentative): Oracle Database 23ai: Vector Indexes – What and how to use