Presented at CDOIQ 2024: How to Unlock Data for AI by Breaking Through the Data Transformation Bottleneck

barleyfish 87 views 21 slides Jul 18, 2024

About This Presentation

Data is the competitive advantage to power the next generation of AI and analytics. Yet access to clean, trusted, and timely data remains a challenge. Traditional data transformation doesn’t work well with data cloud platforms and coding of pipelines puts strain on scarce data engineering resource...


Slide Content

How to Unlock Data for AI by Breaking Through the Data Transformation Bottleneck
Maciej Szpakowski, Co-founder & CTO
Matt Turner, Director, Product Marketing

Bad data is like manure … it gets everywhere!
Susan Lauda, Director, Global Advanced Technology, AGCO Corp (2019)
2018: To properly train a predictive model, historical data must meet exceptionally broad and high-quality standards
2020: If AI processes are given data that is not unique, accurate, consistent, and timely, these processes will not produce reliable results and therefore will lead to unwanted business outcomes


But then … GenAI!
"It’s a design revolution" (Cassie Kozyrkov, CEO, Data Scientific; Data Innovation Summit 2023)
"First fundamental computing platform change in 60 years" (Jensen Huang, CEO, NVIDIA; Snowflake Summit 2023)
"genAI strategy IS data strategy" (Adam Selipsky, then CEO, AWS; AWS re:Invent 2023)

Gartner D&A Summit March 2024 June 2024

MORE is needed by enterprises
Enable users: Data engineers, Data analysts, Data scientists
Process data: Structured, Semi-structured, Unstructured
Product analysis: Business intelligence, Generative AI, Precision ML, Reports
Oversubscribed, Blocked

Existing solutions lack performance or usability
Development, Observability, Orchestration, Execution, Data
Legacy ETL: Metadata; Informatica, DataStage, AbInitio, Alteryx
Pros: Enables all users, Higher productivity
Cons: Locked-in, Low performance
Cloud Data Platform: git, airflow, observability, sql dwh, spark, data
Pros: Code power, High performance
Cons: Fewer users, Low productivity

The data transformation iron triangle: Powerful, Intuitive, Intelligent

Copilots with AI change the game
55% reduction in the time it took to complete a task
"What would you like to do today?"

Copilots can be the productivity layer for users
Data engineers, Data analysts, Data scientists
Data transformation copilot
Business logic: code on git
Spark, SQL

The Copilot Recipe: Artificial intelligence, Compiler, Visual interface

Enable every user
Ease of use, Remove barriers, High productivity
Low code, Drag & drop, Spark & SQL
Data analyst, Data engineer

Modernize stack to deliver championship-winning data
Create pipelines 7x faster
Deliver 10x more data
Collaboration and knowledge sharing scale data transformation

Makes recommendations
Converts natural language to business logic
Completes pipelines
Generates tests
Writes documentation
Suggests fixes for errors
More productivity for every team member
Higher productivity per user
Focus on analytics

Reinvent the pipeline creation process to include business SMEs, unlock supply chain data, improve quality, and reduce risk
Results:
Reduce pipeline creation steps from 26 to 9
Deliver 10x more data objects to the business

Compilers bring the full power & freedom of code
Visual pipelines become code
Visual, Code, AI
Code for Standards & Framework Plugins
Prompts for AI generate code that becomes visual pipelines
Code plugins become gems in the visual interface, and they get integrated with AI
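To make "visual pipelines become code" concrete, here is a minimal sketch of the general idea: a list of visual steps is compiled into a SQL string. It is not the product's actual compiler. The `Step` class, the `compile_to_sql` function, the three step kinds, and the `orders` table are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One node in a hypothetical visual pipeline (illustrative only)."""
    kind: str  # "source", "filter", or "select"
    arg: str   # table name, predicate, or comma-separated column list


def compile_to_sql(steps: list[Step]) -> str:
    """Compile a linear list of visual steps into one SQL statement."""
    table = None
    predicates = []
    columns = "*"
    for step in steps:
        if step.kind == "source":
            table = step.arg
        elif step.kind == "filter":
            predicates.append(step.arg)
        elif step.kind == "select":
            columns = step.arg
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    if table is None:
        raise ValueError("pipeline needs a source step")
    sql = f"SELECT {columns} FROM {table}"
    if predicates:
        # Multiple filter steps are combined with AND.
        sql += " WHERE " + " AND ".join(f"({p})" for p in predicates)
    return sql


# Assumed example pipeline: read a table, filter rows, pick columns.
pipeline = [
    Step("source", "orders"),
    Step("filter", "amount > 100"),
    Step("select", "order_id, amount"),
]
print(compile_to_sql(pipeline))
```

Because the output is plain SQL text, it could in principle be committed to git and reviewed like any other code, which is the property the slide emphasizes.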

Speed migration to data cloud architecture to improve efficiency and scale resources
Results:
Improve data transformation processing speeds by 65%
50% savings in first year, including data modernization project costs
Ownership of code and retained knowledge

Integrated and comprehensive
Single pane of glass with existing systems of record, without adding another system of record
Data Transformation Copilot: Development, Metadata, Deployment, Governance, Observability
Business logic: Spark code, SQL code, Airflow code
Git, CI, CD system
Cloud Data Platform: Storage, Compute, Spark, SQL

Build a SQL data pipeline on Databricks with Copilot using Text + Visual or SQL
Build a new model: visual, text prompts, recommendations; modify code with copilot; test, documentation, deploy
Completing lifecycle: Alteryx import & Fix-it; Orchestration with Airflow
Demo
Powerful, Intuitive, Intelligent

Prophecy delivers productivity without compromise
Compiler, Artificial intelligence, Visual interface

The Data Transformation Copilot
Visit our table
Schedule a demo