Creating datasets with dbt is faster and more scalable than using drag-and-drop tools, stored procedures, scripts, or Spark. How does it work, and why is it important to my business? This talk presents dbt's collaborative approach to managing data transformations, the central role of dbt in a typical modern analytical architecture, and the benefits of cloud-based analytical systems, commonly called the modern data stack.
Size: 6.37 MB
Language: en
Added: Sep 21, 2024
Slides: 22 pages
Slide Content
Ship data faster with dbt Sean McIntyre – Senior Solutions Architect @ dbt Labs DSC DACH 2024
The world is not enabled for AI (at scale)
2023: “Delivering new data projects has ground to a halt”; 2024: “We need to ship an LLM on top of our data!”
Many organizations are struggling with the basics of data management
On a journey of making analytics accessible to everyone… [Diagram: On-prem, Big Data, Cloud DBMS, Multi-Platform, AI/ML; trend: Analytics accessibility]
… putting greater pressure on trust in data… [Diagram: On-prem, Big Data, Cloud DBMS, Multi-Platform, AI/ML; trends: Analytics accessibility, Governance and trust]
… while navigating increasing infrastructure complexity [Diagram: On-prem, Big Data, Cloud DBMS, Multi-Platform, AI/ML; trends: Analytics accessibility, Governance and trust, Complexity of multi-cloud infrastructure and data sources]
Emergence of cloud data platforms created an ecosystem [Diagram: The Modern Data Stack: Ingestion, Data and AI Platform, Transformation, BI/Analytics, Orchestration, Observability, Catalog, Semantics; Metadata Silos]
The impact of this is profound
  - 96% rate scaling AI + ML a top priority for enterprise data strategies [2]
  - Every organization loses $12.9m each year due to bad data quality [1]
  - <0.5% of data is ever analyzed and used [3]
  [1] Gartner: https://www.gartner.com/smarterwithgartner/how-to-improve-your-data-quality
  [2] MIT Technology Review, Databricks: Building a high-performance data and AI organization
  [3] Forrester Research: ROI of Data Quality
A data control plane centralizes metadata to make it actionable [Diagram: Control Plane for Data and AI; Active Metadata Layer; Data and AI Platform; Ingestion; Transformation; Orchestration; Observability; Catalog; Semantics; BI/Analytics]
dbt connects the metadata across your business
  - Integrity: Built-in assertions to ensure data remains consistent and trusted
  - Documentation: Describe the data of your business
  - Development: Fully integrated in the development workflow
  - Governance: End-to-end lineage across workflows, projects and teams
  - Semantics: Define and manage the logic for critical business metrics
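The "built-in assertions" idea on this slide can be illustrated with a small sketch. It assumes the common pattern in which a data test is a query that selects failing rows, so an empty result means the check passes. The `customers` table, its columns, and both helper functions are hypothetical and use Python's built-in `sqlite3`; this is not dbt's API or implementation.

```python
import sqlite3

def failing_rows_not_null(conn, table, column):
    """Return rows where `column` is NULL; an empty list means the check passes."""
    query = f"SELECT * FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchall()

def failing_rows_unique(conn, table, column):
    """Return (value, count) pairs for values of `column` that appear more than once."""
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()

# Hypothetical sample data: one missing email and one duplicated customer_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "c@example.com")],
)

null_failures = failing_rows_not_null(conn, "customers", "email")
unique_failures = failing_rows_unique(conn, "customers", "customer_id")

print("not_null failures:", null_failures)
print("unique failures:", unique_failures)
```

Framing each check as "select the bad rows" means a failure report already contains the offending records, which is useful when tracing a problem back to its source.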
dbt makes metadata actionable to power data and AI [Diagram: Data & AI platform; Data Ingest; Analytics & AI; Catalog; Orchestration; Observability; Visual Editor; Semantic Layer; dbt Cloud: The control plane for data and AI; AI Services; Security; Metadata API; Active Metadata Framework]
dbt is collaborative data management
dbt is where data engineers and data analysts come together to manage data complexity at scale.
  [Diagram]
  - Data sources: Functions, Applications, Databases, Events, Files
  - Data types: JSON Event Data, JSON Response Data, CSV + JSON Files, Database Tables, Application Data
  - Sources: webhook_events_source, salesforce_source, google_analytics_source, sql_server_source, postgres_source, csv_source, json_source, kafka_events_source, weather_data_source, census_data_source
  - Layers: staging, Staging & Core, Marts, Fit for Purpose Analytics, Analytics
  - Roles: Data engineers; Data engineers & data analysts
  - Data platform
dbt enables multiple roles to collaborate in one shared space
Why data engineers ❤️ dbt / Why data analysts ❤️ dbt / Why data consumers ❤️ dbt
  - Share auto-updating lineage and documentation: Easily trace dependencies from source to dashboard to align teams and build trust
  - Speed development: Modular code eliminates boilerplate DDL & DML
  - Test and version control before production: Connect to git to eliminate silos, automate deploys, and improve reliability
  - Build your DAG as you code: Stop manually managing dependencies
  - Collaborate across multiple dev environments: Build and test from the command line, accessible web-based IDE, or visual editor
  - Trust data freshness: Get real-time signals on data quality and health, right where you need them
  - Self-serve metrics: Tap into governed, accurate metrics from your BI tool or AI chatbot
  - Divide projects into domains: Embrace mesh to speed production without compromising governance
  - Spot and fix errors fast: Alerts, logs, and column-level lineage make it easy to identify root cause
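"Build your DAG as you code" can be made concrete with a toy sketch: models refer to each other with `{{ ref('...') }}`, and the dependency graph and build order are derived from those references instead of being maintained by hand. The model names and SQL below are hypothetical, and the regex-plus-`graphlib` approach is only an illustration of the idea, not how dbt itself parses projects.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical models: name -> SQL text containing ref() calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": """
        select c.customer_id, count(o.order_id) as order_count
        from {{ ref('stg_customers') }} c
        left join {{ ref('stg_orders') }} o using (customer_id)
        group by 1
    """,
    "top_customers": "select * from {{ ref('customer_orders') }} where order_count > 10",
}

def build_dag(models):
    """Map each model to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

dag = build_dag(models)

# static_order() yields every model after all of its dependencies.
build_order = list(TopologicalSorter(dag).static_order())

print(dag)
print(build_order)
```

Because the graph is inferred from the code, adding a new reference to a model is enough to change the build order; there is no separate dependency list to keep in sync.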
Use dbt to ship data faster
dbt is the new standard for data transformation
  - #1 in G2 Crowd’s DataOps Report
  - 4,600+ dbt Cloud customers
  - 40,000+ companies using dbt
  - 100,000 dbt Community members
  - Partner of the Year 2024; Data Integration Customer Impact Partner of the Year
Ship higher quality data faster and cheaper with dbt
  - Build trust in data & data teams: 30-80+% reduction in data quality issues. DISH reduced bugs and data quality issues by 30%; Plentific experienced a 99% decline in data pipeline breaks; Pepperstone reduced data quality issues and inconsistent reports by 80%.
  - Ship data products faster: ~50% improvement in velocity & productivity. Pepperstone increased speed to delivery by 30%; Siemens improved data delivery time by 93%; AXS improved data deployment time by 50%.
  - Optimize costs of generating insights: 20-40% reduction in data platform costs. Forrester TEI Report (engineers experience 30% productivity improvement; analysts recoup 20% of time); Condé Nast increased self-service by 30%; AXS saved 40% of maintenance hours.
How to get started with dbt
dbt Core (“the engine”): pip install dbt-core dbt-ADAPTER_NAME
  - SQL-based transformations
  - Testing framework
  - Documentation
dbt Cloud (“the racecar”): getdbt.com/signup
  - Manage chaos and complexity: Explorer, Mesh, Semantic Layer, column-level lineage
  - Onboard more contributors faster: CLI and IDE, Visual Editor, AI copilot, easy git
  - Optimize cloud platform costs: SlimCI, Defer to Production, Model Timing View
  - Enhance security & governance: RBAC/SSO, Advanced Networking, Training, SLAs
  - Reduce maintenance overhead: SaaS, managed dbt version upgrades
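To give a feel for what "SQL-based transformations" means in practice, here is a minimal sketch of the underlying idea: the author writes only a SELECT statement, and tooling wraps it in the DDL needed to create a view or table. The `materialize` helper, the table names, and the data are hypothetical, and the example runs on Python's built-in `sqlite3`; real dbt materializations are adapter-specific and considerably more involved.

```python
import sqlite3

def materialize(conn, name, select_sql, materialized="view"):
    """Wrap a plain SELECT in the DDL that creates it as a view or a table.

    Simplification: assumes an existing object called `name` is of the same
    kind as the one being created.
    """
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    kind = materialized.upper()
    conn.execute(f"DROP {kind} IF EXISTS {name}")
    conn.execute(f"CREATE {kind} {name} AS {select_sql}")

# Hypothetical raw data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10.0, "completed"), (2, 5.0, "cancelled"), (3, 7.5, "completed")],
)

# Each "model" is only a SELECT; the helper supplies the boilerplate DDL.
materialize(
    conn,
    "stg_orders",
    "SELECT order_id, amount FROM raw_orders WHERE status = 'completed'",
    materialized="view",
)
materialize(
    conn,
    "order_totals",
    "SELECT COUNT(*) AS n_orders, SUM(amount) AS revenue FROM stg_orders",
    materialized="table",
)

totals = conn.execute("SELECT n_orders, revenue FROM order_totals").fetchone()
print(totals)
```

Switching a model between a view and a table is then a one-word configuration change, while the SELECT that carries the business logic stays untouched.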