David Michels: DevOps My AI at AWS Community Day Midwest 2024

awschicago · Jun 26, 2024

About This Presentation

AWS Community Day Midwest 2024
David Michels
DevOps My AI


Slide Content

Always questioning.
Always solving.
DevOps My AI

AI is software…
…it should be treated as such

DevOps in AI/ML
•AIOps – Analytics and AI Operations
•MLOps – Machine Learning Operations
•LLMOps – Large Language Model Operations
•Similar fundamental premise; they typically differ in:
•Data Sources and Types (structured/unstructured, text/numeric)
•Model Development/Training/Deployment
•All share the same foundational ideals of DevOps

Two Types of AI/ML Models
•Deterministic
•Regression - Numeric output
•Classification - Assigning inputs to a predefined set of categories (supervised)
•Clustering - Discovering natural groupings in the data (unsupervised)
•Non-deterministic
•Unstructured output
•Free-form text, Images, Audio
•Natural Language Processing – NLP
•Repeated execution with the same input does NOT yield the exact same output (illustrated in the sketch below)
This is Generative AI
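
The non-determinism called out above comes from sampling during generation. Below is a minimal sketch (not from the slides; names and values are illustrative) of temperature-based token sampling in Python, showing how the same input yields different outputs on repeated runs:

    # Same logits in, different tokens out: sampling makes generation non-deterministic.
    import numpy as np

    def sample_token(logits, temperature=1.0, rng=None):
        """Sample one token index from logits using temperature-scaled softmax."""
        rng = rng or np.random.default_rng()
        scaled = np.asarray(logits) / temperature
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    logits = [2.0, 1.5, 0.3, 0.1]  # hypothetical next-token scores
    print([sample_token(logits, temperature=0.8) for _ in range(5)])
    # e.g. [0, 1, 0, 0, 1] -- same input, different outputs across runs

As temperature approaches zero the distribution collapses toward the argmax, which is why pinning a low temperature is one common tactic for more repeatable tests.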

“Normal” CI/CD Process

Gen AI Model CI/CD Process
[Diagram: the “normal” CI/CD flow with stages crossed out and replaced by “Choose Model/Framework” and “Train/Fine Tune” steps]

The “Build”/Train Phase…
Data is critical. A model’s accuracy and behavior are a reflection of the data it is trained on.

CI/CD - Build
•Normal CI/CD “builds” compile/package and test/deploy
•Usually measured in minutes or 10s of minutes
•“Building” a LLM is about training
•LLM training is measured in days
•LLM fine tuning typically requires hours
•NOT conducive to rapid CI/CD cycles

CI/CD Data
•CI/CD is about optimizing for pipeline execution
•Lightweight/smaller datasets
•Generated/Non-production
•Just cover the data you need to test against
•Ideally ephemeral (can be easily created/destroyed)
•That is contradictory to AI/ML model training
•Typically, VERY large and diverse datasets
•NEEDS to be production (or “real”) data for accurate responses
•Not feasible to be ephemeral

The “Test” Phase…
You must test something that doesn’t yield the same response across executions with the same input, and the space of possible input/output combinations is exceptionally large.

AWS Services for Gen AI
•SageMaker – End-to-end offering for AI/ML
•Bedrock – Fully managed service for “foundation models”

Bedrock/SageMaker - Pipelines
•Can be coordinated via external CI/CD platforms
•GitHub Actions
•GitLab
•AWS CodePipeline
•etc.
•CloudFormation and CDK, along with external IaC provider support
•Testing requires deployment
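
As a concrete example of that coordination, here is a hedged sketch of the kind of step an external CI/CD job (GitHub Actions, GitLab, CodePipeline, etc.) might run to trigger a SageMaker pipeline via boto3; the pipeline name and display name are hypothetical:

    # Illustrative CI/CD step: start a SageMaker pipeline execution from boto3.
    import boto3

    sm = boto3.client("sagemaker")
    response = sm.start_pipeline_execution(
        PipelineName="my-model-pipeline",             # hypothetical pipeline name
        PipelineExecutionDisplayName="ci-build-123",  # tie the run back to the CI job
    )
    print("Started:", response["PipelineExecutionArn"])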

SageMaker
•Development, Training and Inference with SageMaker Studio
•Data prep and management tooling – Wrangler, Spark/EMR, Ground Truth, etc.
•Built-in algorithms and frameworks
•Foundational and custom models (supports “BYOM”)
•“Pipelining”
•Comprehensive machine learning lifecycle support
•(i.e., use it when you need to customize some facet of the LLM lifecycle)

SageMaker - Pipelines
•Facilitate automation of complex steps
•Training (or Pretraining) models
•Tuning existing models
•Running checks (model/data bias, quality)
•Packaging/versioning/deploying model
•Fairly involved
•Requires an understanding of “data science-y stuff” for building/testing models (see the sketch below)
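
For a concrete flavor of that involvement, a minimal sketch of a one-step pipeline defined with the SageMaker Python SDK; the role ARN, image URI, and S3 paths are placeholders you would supply:

    # Minimal SageMaker Pipelines sketch: a single training step.
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import TrainingStep

    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

    estimator = Estimator(
        image_uri="<training-image-uri>",              # your training container
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/model-artifacts",  # placeholder bucket
        sagemaker_session=sagemaker.Session(),
    )

    step = TrainingStep(
        name="TrainModel",
        estimator=estimator,
        inputs={"train": TrainingInput(s3_data="s3://my-bucket/train-data")},
    )

    pipeline = Pipeline(name="my-model-pipeline", steps=[step])
    pipeline.upsert(role_arn=role)  # create or update the pipeline definition
    pipeline.start()                # kick off an execution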

SageMaker Pipelines

Bedrock
•Fully managed service for deployment of foundation models
•Supports Serverless
•Facilitates fine-tuning (via S3 bucket)
•Model Evaluation
•Guardrails – Denied topics, content filters, etc…
•No “BYOM”
•Knowledge Base – managed service for RAG applications
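
To illustrate the “fully managed” part, a hedged sketch of invoking a Bedrock foundation model through the boto3 Converse API; the model ID and prompt are examples only:

    # Illustrative Bedrock call via the boto3 Converse API.
    import boto3

    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        messages=[
            {"role": "user", "content": [{"text": "Summarize DevOps in one sentence."}]}
        ],
    )
    print(response["output"]["message"]["content"][0]["text"])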

RAG Applications
•Retrieval Augmented Generation architecture
•Specific data to be searched and added to model context
•“Cue Cards” with critical points for the model
•Augments context in a meaningful way with *your* data
•Product Catalog
•User Manual
•Employee Handbook
•Vector store maintained separately as part of the application
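
That flow fits in a few lines; in the sketch below, embed, vector_store, and generate are hypothetical stand-ins for your embedding model, vector database client, and LLM call:

    # Illustrative RAG flow: retrieve relevant chunks, augment the prompt, generate.
    def answer_with_rag(question, vector_store, embed, generate, k=4):
        query_vector = embed(question)                       # 1. embed the question
        chunks = vector_store.search(query_vector, top_k=k)  # 2. retrieve "cue cards"
        context = "\n\n".join(chunk.text for chunk in chunks)
        prompt = (                                           # 3. augment the prompt
            f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        )
        return generate(prompt)                              # 4. grounded generation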

Knowledge Base
•Managed service for RAG applications
•Managed Vector stores
•Facilities for “hydrating” vector stores – chunking, embeddings, loading into the store
•More efficient way to include custom data than fine-tuning
•Larger and larger model context windows make this approach more appealing
•Can include/handle larger datasets from the vector store
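
A hedged sketch of what querying a Knowledge Base looks like with the managed RetrieveAndGenerate API; the knowledge base ID and model ARN are placeholders:

    # Illustrative Knowledge Base query via boto3 RetrieveAndGenerate.
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        input={"text": "What is our return policy?"},  # example question
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB1234567890",  # placeholder ID
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                            "anthropic.claude-3-haiku-20240307-v1:0",
            },
        },
    )
    print(response["output"]["text"])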

Typical RAG Application Flow

Benefits
•Little or often NO fine-tuning or training required
•Large number of available foundational models
•Many specifically trained for particular domains or tasks
•Much faster to maintain data in vector store
•Testing/verification can be set up to be “portable”
•More cost effective – avoids:
•Time-consuming and expensive training
•Maintaining model versions
•The specialized skills needed for training

RAG Application Pipeline

Data Approach
•CI/CD with production data
•Access management across environments
•Algorithm development for Data Scientists
•Auditing

Data Approach

Other Areas to Consider
•“Click-Ops” can be pervasive in this space
•Difficult to unwind if much is built on top of it
•Data and strategy is critical
•Models’ need for “real” data conflicts with data security
•Pipeline execution environments are needed for training
•Testing approach is very different
•Minimal interfaces to test, a gazillion possible inputs/outputs (see the sketch below)
•Training vs Fine-Tuning vs RAG
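
One way to approach that testing difference, sketched below: assert properties of a response rather than an exact string. ask_model is a hypothetical wrapper around whatever endpoint you deploy:

    # Property-based checks instead of exact-match assertions for a
    # non-deterministic model. ask_model() is a hypothetical helper.
    def test_return_question_stays_on_topic():
        answer = ask_model("How do I return a damaged item?")
        assert len(answer) > 0                                           # got a response
        assert "return" in answer.lower() or "refund" in answer.lower()  # on topic
        assert "social security" not in answer.lower()                   # guardrail holds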

Questions?