Fueling AI with Great Data with Airbyte Webinar

chloewilliams62 131 views 44 slides Jun 13, 2024

About This Presentation

This talk focuses on how to collect data from a variety of sources, how to leverage that data for RAG and other GenAI use cases, and finally how to chart your course to production.

Size: 5.09 MB

Language: en

Added: Jun 13, 2024

Slides: 44 pages

Slide Content

Fueling AI
with Great Data
Staff Software Engineer
AJ Steers
Jun 13, 2024

2
Our journey today…
•Introduction to GenAI Data Sourcing
•Building Solid Foundations with ELTP
•Data from Anywhere with just enough code
•PyAirbyte and Milvus Lite: Better Together
•Getting from Prototype to Production

Introductions

3

4
Intro to Myself (“AJ”)
“I build data products.”

5
INTRO TO AIRBYTE
Covers the long tail of connectors

Airbyte has 330+ high-quality connectors, thanks to its participative model.

Extensible and non-opinionated to address your exact needs

You can build new connectors or edit any pre-built connector to fit your specific needs. Airbyte also integrates with your data stack.

Fair usage-based pricing

Volume-based pricing is well understood across data warehouses. It is predictable and scales well with database replication. Airbyte can be up to 10x cheaper than alternatives.

6
INTRO TO AIRBYTE
Support for unstructured document sources

Vector store destinations

Run Anywhere

Largest data connectors catalog
135+ No-Code Sources (YAML)
285+ Total Sources
250+ Python-Based Sources
10+ Vector Destinations
40+ Destinations

Largest data movement community
6,000+ Daily active users
150K+ Deployments
2+ PB Synced / month
1,000+ PRs merged
800+ Code contributors

9
INTRO TO AIRBYTE
Airbyte OSS
Open Source, Deploy Anywhere via K8s
Airbyte Enterprise
Self-Managed, with Airbyte Support
Airbyte Cloud
Data Movement as a Service

10
INTRO TO AIRBYTE
Airbyte OSS
Open Source, Deploy Anywhere via K8s
Airbyte Enterprise
Self-Managed, with Airbyte Support
Airbyte Cloud
Data Movement as a Service

PyAirbyte
Run Anywhere

11
INTRO TO PYAIRBYTE
Break down silos between data teams

GenAI
✓All the data for your AI app.
✓Interop with GenAI Python libraries.
✓Generate LLM documents.

Machine Learning
✓Get source data in minutes.
✓Integrate with Pandas.
✓Runs in a Notebook.

Data Warehousing and Analytics
✓Supports SQL natively.
✓Streamlines path to production.

Data Engineering
✓Scale to large data volumes.
✓Incremental sync built-in.
✓Built on ELT best practices.

12
INTRO TO MILVUS
●Powerful open source vector store
●Scalable and Elastic Architecture
●Diverse Index Support
●Versatile Search Capabilities
●Built-in Staleness Handling
●Hardware-Accelerated Compute

13
INTRO TO
Developer-Friendly - Get Started in Seconds

14
Data Pipeline
Design Principles

15
No code or low-code?
❔❔❔

16
No code or low-code?
> “Everything should be as simple as possible, but no simpler.”
- Albert Einstein

17
Simple beats complex
The goal is to design resilient, composable pipelines in which each step is simple and obvious. Things should break in predictable ways, with similarly obvious and easy remediations.

18
Future-Proofing Your Data Pipeline
ETL: Extract → Transform → Load

19
Future-Proofing Your Data Pipeline
ETL: Extract → Transform → Load
ELT: Extract → Load → Transform

20
Future-Proofing Your Data Pipeline
ETL: Extract → Transform → Load
ELT: Extract → Load → Transform
ELTP: Extract → Load → Transform → Publish
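
The Extract, Load, Transform, Publish ordering can be sketched in plain Python. This is a toy illustration rather than Airbyte code: the `Warehouse` class, the sample issue records, and all function names are hypothetical. The point it shows is that raw records are loaded unmodified, so the transform and publish steps can be re-run or changed later without extracting again.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for a warehouse or cache."""
    raw: dict = field(default_factory=dict)     # untouched source records
    models: dict = field(default_factory=dict)  # transformed tables


def extract() -> list:
    # Extract: pull records from a source (hard-coded sample data here).
    return [
        {"id": 1, "title": " Broken widget ", "state": "open"},
        {"id": 2, "title": "Feature request", "state": "closed"},
    ]


def load(warehouse: Warehouse, stream: str, records: list) -> None:
    # Load: store the records exactly as extracted.
    warehouse.raw[stream] = list(records)


def transform(warehouse: Warehouse, stream: str) -> None:
    # Transform: clean and filter inside the warehouse, reading only raw data.
    warehouse.models[stream] = [
        {**record, "title": record["title"].strip()}
        for record in warehouse.raw[stream]
        if record["state"] == "open"
    ]


def publish(warehouse: Warehouse, stream: str) -> list:
    # Publish: hand the modeled data to a downstream consumer.
    return [f'{record["id"]}: {record["title"]}' for record in warehouse.models[stream]]


wh = Warehouse()
load(wh, "issues", extract())
transform(wh, "issues")
published = publish(wh, "issues")
print(published)
```

Because `wh.raw` still holds both original records, a changed transform rule only needs `transform()` and `publish()` to run again.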

21
A scalable and extensible framework

22
NOW... LET’S GET STARTED!

Introducing
PyAirbyte

✓Hundreds of Airbyte source connectors
✓The ability to create your own source connectors with no coding
✓A library you can pip install anywhere, including notebooks!
✓Your choice of production deployment paths:
Airbyte Cloud, OSS, or Self-Hosted Enterprise
23
DATA FROM ANYWHERE,
IN MINUTES, NOT DAYS

Data from Anywhere in 3 Steps
Step 1: Create a Source using get_source()

Data from Anywhere in 3 Steps
Step 2: Configure with set_config()

Data from Anywhere in 3 Steps
Step 3: Read the data using read()
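
The code shown on these three slides did not survive in this transcript. A minimal sketch of the same flow, assuming PyAirbyte is installed (`pip install airbyte`) and using the `source-faker` sample connector with an illustrative `count` setting. The connector choice, the config values, and the helper names `build_faker_config` and `read_sample_data` are assumptions, not taken from the slides.

```python
def build_faker_config(count: int = 1_000) -> dict:
    """Return a config dict for the sample `source-faker` connector (assumed here)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return {"count": count}


def read_sample_data(count: int = 1_000):
    """Run the three steps from the slides and return the read result."""
    # Imported lazily so the config helper above works without PyAirbyte installed.
    import airbyte as ab

    # Step 1: create a source with get_source().
    source = ab.get_source("source-faker", install_if_missing=True)

    # Step 2: configure it with set_config().
    source.set_config(build_faker_config(count))
    source.check()               # validate the config before reading
    source.select_all_streams()  # read every stream the source offers

    # Step 3: read the data with read(); results land in a local cache by default.
    return source.read()
```

Calling `read_sample_data()` returns a read result; `result["users"].to_pandas()` should then give a DataFrame, assuming the connector exposes a `users` stream.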

Data from Anywhere in 3 Steps
The “Speedrun” Version
1 Step
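
The one-step version on this slide is also missing from the transcript. One plausible shape, assuming `get_source()` accepts the config and a `streams` selection directly so the whole flow collapses into a single chained call. The connector name, config values, and the `speedrun` helper are illustrative assumptions.

```python
SOURCE_NAME = "source-faker"      # assumed sample connector
SOURCE_CONFIG = {"count": 1_000}  # illustrative setting, not from the slides


def speedrun():
    """Create, configure, select all streams, and read in one expression."""
    import airbyte as ab  # lazy import: requires `pip install airbyte`

    # streams="*" is assumed to select every available stream.
    return ab.get_source(SOURCE_NAME, config=SOURCE_CONFIG, streams="*").read()
```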

Full Control in (Not a Lot of) Code

29

Choose Airbyte Cloud, Airbyte OSS, or
Airbyte Enterprise for:

✓Ease-of-Use
✓Friendly UI
✓Redundancy
✓Peace of Mind

29
Migrate to Airbyte Cloud for
Zero-Code Load to Vector Stores

DEMO

Data extraction and exploration…
30

31
DEMO Script
Simple data extraction demo

Get data from anywhere: show the list of sources for YAML, Docker, and Python.

32
CONVERTING RECORDS TO DOCUMENTS

Record Format
| Issue | Title | Description | Created | Updated |
|---|---|---|---|---|
| 123 | Broken Widget on Product Page | … | 2024-01-04 | 2024-02-23 |
| 124 | Feature Request: New Interactive UI Model | … | 2024-01-15 | 2024-02-12 |
| … | … | … | … | … |

Document Format
# Broken Widget on Product Page

```
Issue: 123
Created: 2024-01-04
```

{description}
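
The record-to-document mapping above can be sketched as a small pure-Python function. This is not PyAirbyte's implementation: `record_to_document` is a hypothetical helper, and its parameter names (`title_property`, `content_properties`, `metadata_properties`) are chosen to resemble what PyAirbyte's document export is understood to use, which should be treated as an assumption.

```python
FENCE = "`" * 3  # built programmatically so this example nests safely inside a fence


def record_to_document(record, title_property, content_properties, metadata_properties):
    """Render one record as a Markdown document: title, metadata block, then content."""
    lines = [f"# {record[title_property]}", "", FENCE]
    for name in metadata_properties:
        # Capitalize the field name for display, e.g. "created" -> "Created".
        lines.append(f"{name.capitalize()}: {record[name]}")
    lines += [FENCE, ""]
    for name in content_properties:
        lines.append(str(record[name]))
    return "\n".join(lines)


issue = {
    "issue": 123,
    "title": "Broken Widget on Product Page",
    "description": "…",  # elided on the slide, kept elided here
    "created": "2024-01-04",
    "updated": "2024-02-23",
}
doc = record_to_document(
    issue,
    title_property="title",
    content_properties=["description"],
    metadata_properties=["issue", "created"],
)
print(doc)
```

Keeping the metadata in its own block separates the fields an LLM should treat as context from the free-text body that gets chunked and embedded.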

DEMO

Converting Records to Documents
33

34
DEMO Script
GitHub Records to RAG Demo:
-Show records from last read() operation.
-Export to Documents.
-Show LangChain code to ingest those documents for chunking, embedding, etc.
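
The chunking step mentioned in this demo script can be illustrated without any library. The sketch below is a plain-Python stand-in for a LangChain text splitter, not the code shown in the demo: `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical, and real splitters usually also try to break on paragraph or sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(sample_chunks)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk, which tends to help retrieval quality.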

DEMO

Building Python prototypes that scale
35

36
DEMO Script

-Return to the notebook
-Continue to load in RAG and run sample query.
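
The notebook itself is not part of this transcript. A minimal sketch of the load-and-query step, assuming Milvus Lite via `pymilvus` (`pip install pymilvus`) as the vector store and assuming the embeddings were already computed elsewhere. The collection name, database path, and the helpers `build_rows` and `load_and_search` are hypothetical.

```python
def build_rows(texts, vectors):
    """Pair each text chunk with its embedding in the row shape inserted below."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": i, "vector": list(vector), "text": text}
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]


def load_and_search(texts, vectors, query_vector, db_path="./rag_demo.db", limit=3):
    """Insert chunks into a local Milvus Lite file and return the nearest matches."""
    from pymilvus import MilvusClient  # lazy import: requires `pip install pymilvus`

    client = MilvusClient(db_path)  # a local file path selects Milvus Lite
    # Assumes the collection does not exist yet in this database file.
    client.create_collection(collection_name="documents", dimension=len(query_vector))
    client.insert(collection_name="documents", data=build_rows(texts, vectors))
    return client.search(
        collection_name="documents",
        data=[query_vector],
        limit=limit,
        output_fields=["text"],  # return the stored chunk text with each hit
    )
```

The retrieved `text` fields would then be placed into the LLM prompt as context for the sample query.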

37
How to get to production?
What happens now depends upon your earlier choices…

38
How to get to production?
What happens next depends upon your choices…
Deploy now! And move on to new adventures.
Go back to the beginning. Start again on the “production” solution.

39
Benefits of Building with ELTP and Airbyte

The Old Way:
Rebuild from Scratch, Cross your Fingers
○Recreate Notebook-based transformations.
○Migrate to a new tool or a new language after the prototype.
○Pass the prototype to your IT team, which will (probably) rebuild it differently.
○Test everything over from scratch and fix the new bugs.

The Airbyte Way:
Promote what Works, Drop the Rest
✓Seamless migration from PyAirbyte to Airbyte Cloud or Self-Managed K8s.
✓Same Schema regardless of how you deploy.
✓Handoff to your IT team without changing the pipelines or switching tools.
✓Your transformations and tests carry over after deployment.

DEMO

From Prototype to Production
40

41
Recap & Wrap Up

42
RECAPPING LESSONS LEARNT

To get to production faster:

Start with a tool and set of design principles that will see you all the way to the finish.
