4070949. 89-Test-12-File.pdf

raypoll198 32 views 18 slides Jul 15, 2024

About This Presentation

Data engineering

Size: 1.3 MB

Language: en

Added: Jul 15, 2024

Slides: 18 pages

Slide Content

Data Engineering - Best Practices
Suraj Acharya, Director, Engineering
Singh Garewal, Director, Marketing

Data Engineering Drivers
Advanced analytics / ML coming of age
Industry-spanning adoption
Technology innovation: hardware, cloud and storage
Increased financial scrutiny
Role evolution: CDO, Data Curator
...

VISION
Accelerate innovation by unifying data science, engineering and business
WHO WE ARE
•Original creators of Apache Spark, Databricks Delta & MLflow
•2000+ global companies use our platform across the big data & machine learning lifecycle
SOLUTION
Unified Analytics Platform

Apache Spark: The 1st Unified Analytics Engine

Runtime
Delta
Spark Core Engine

Big Data Processing
ETL + SQL + Streaming
Machine Learning
MLlib + SparkR
Uniquely combined Data & AI technologies

Databricks Delta
Adds data reliability and performance to data lakes

●Co-designed compute & storage

●Compatible with Spark APIs

●Built on open standards (Parquet)

Databricks Delta
Indexes & Stats
Transactional Log
Versioned Parquet Files
Leverages your cloud blob storage
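
As a rough illustration of how a transactional log over versioned files can give atomic commits and reads of older versions, here is a minimal sketch. The `ToyLog` class, its `add`/`remove` actions and the file names are hypothetical and are not Databricks Delta's actual log format or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only commit log; each commit adds and/or removes data files."""
    commits: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        # A commit is recorded as one entry, so readers see all of it or none of it.
        self.commits.append({"add": tuple(add), "remove": tuple(remove)})
        return len(self.commits) - 1  # version number of this commit

    def snapshot(self, version=None):
        """Replay the log up to `version` to find the files that make up the table."""
        if version is None:
            version = len(self.commits) - 1
        files = set()
        for entry in self.commits[: version + 1]:
            files -= set(entry["remove"])
            files |= set(entry["add"])
        return sorted(files)


log = ToyLog()
v0 = log.commit(add=["part-000.parquet", "part-001.parquet"])
# A rewrite replaces two small files with one; the old files are only logically removed.
v1 = log.commit(add=["part-002.parquet"], remove=["part-000.parquet", "part-001.parquet"])

print(log.snapshot())    # files in the current version
print(log.snapshot(v0))  # files in the earlier version, still readable
```

Because old files are never edited in place, a reader pinned to an earlier version keeps a consistent view while newer commits land.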

Data Engineering Playing Field
Message Log
Dashboarding/ Reporting/ BI
Storage
Data Model
Data Catalog/ Lineage
Compute: ETL, analytics, ML
Sandbox
Orchestration and Workflow
CI/CD
Data Quality

Data Model
What
Data organization and the relation of the different top-level data sets to each other.

How
•Audience segmentation
•Table categorization
•Data types
•Modeling discipline

Data Catalog + Lineage
What
Easy discovery of data sets
Policy enforcement

How
•Explore data model
•Search + suggestions
•Column and table annotations and grouping
•Lineage tracking
•Automatic flagging of PII + sensitive columns
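
Automatic flagging of sensitive columns can start from something as simple as matching column names. The sketch below assumes a hypothetical `{table: [columns]}` schema and an illustrative pattern list; a name-only heuristic will miss some columns and over-flag others, so a real catalog would likely also inspect values.

```python
import re

# Hypothetical name patterns, chosen only for illustration.
PII_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"e?mail", r"phone", r"ssn", r"(first|last|full)_?name", r"birth|dob", r"address")
]


def flag_pii_columns(schema):
    """Return {table: [columns whose names look sensitive]} for a {table: [columns]} schema."""
    flagged = {}
    for table, columns in schema.items():
        hits = [c for c in columns if any(p.search(c) for p in PII_PATTERNS)]
        if hits:
            flagged[table] = hits
    return flagged


schema = {
    "users": ["user_id", "email", "signup_ts", "date_of_birth"],
    "orders": ["order_id", "user_id", "amount", "shipping_address"],
    "page_views": ["url", "ts"],
}
flagged = flag_pii_columns(schema)
print(flagged)
```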

Storage Architecture
What
Where data is stored and using what
formats.

How
•Columnar formats
•Minimize metadata lookups
•Compaction
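
Compaction merges many small files into fewer large ones so that readers do fewer metadata lookups. One way to plan it is sketched below, using a simple first-fit grouping; the function name, the 128 MB target and the file sizes are assumptions made for the example.

```python
def plan_compaction(file_sizes, target_bytes):
    """Group small files into batches of at most `target_bytes` each.

    `file_sizes` maps file name -> size in bytes. Files already at or above the
    target are left alone. Uses a first-fit pass over files sorted by size.
    """
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < target_bytes),
        key=lambda item: item[1],
        reverse=True,
    )
    batches = []  # each batch is [total_size, [file names]]
    for name, size in small:
        for batch in batches:
            if batch[0] + size <= target_bytes:
                batch[0] += size
                batch[1].append(name)
                break
        else:
            batches.append([size, [name]])
    # A batch with a single file gains nothing from being rewritten.
    return [names for _, names in batches if len(names) > 1]


MB = 1024 * 1024
files = {"a": 120 * MB, "b": 10 * MB, "c": 40 * MB, "d": 70 * MB, "e": 200 * MB, "f": 5 * MB}
plan = plan_compaction(files, target_bytes=128 * MB)
print(plan)
```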

Message Log
What
Source of streaming and batch data.

How
•Read logs into “raw” tables with minimal preprocessing
•Firehose
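
A minimal sketch of landing log lines in a "raw" table with minimal preprocessing: the payload is stored verbatim and only ingestion metadata is added. The row layout (`source`, `offset`, `ingested_at`, `is_valid_json`, `payload`) is a hypothetical choice for this example.

```python
import json
from datetime import datetime, timezone


def to_raw_rows(lines, source, ingested_at=None):
    """Turn log lines into "raw" rows without interpreting the payload.

    Malformed lines are kept and flagged rather than dropped, so downstream
    jobs can decide how to handle them.
    """
    ingested_at = ingested_at or datetime.now(timezone.utc).isoformat()
    rows = []
    for offset, line in enumerate(lines):
        try:
            json.loads(line)
            is_valid = True
        except json.JSONDecodeError:
            is_valid = False
        rows.append(
            {
                "source": source,
                "offset": offset,
                "ingested_at": ingested_at,
                "is_valid_json": is_valid,
                "payload": line,  # stored verbatim
            }
        )
    return rows


raw = to_raw_rows(
    ['{"event": "click", "user": 7}', "not json"],
    source="web-events",
    ingested_at="2024-01-01T00:00:00+00:00",
)
```

Keeping the original payload means a parsing bug found later can be fixed by re-running the downstream job, without going back to the message log.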

Sandbox
What
Isolated environment for
experimentation and exploration.

How
•Notebook collaboration
•Tracking
•Management
•Source control

Compute / Data Processing

What
Execution engine used to process
data.
Layer where “jobs” run.

How
•Multiple frameworks and languages
•SQL compatibility
•Connectors for your data-sources
•Less data scanned => faster job execution
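
To make "less data scanned" concrete, the sketch below counts the rows a job would read from a table partitioned by day, with and without a date filter. The partition layout and row counts are invented for the example.

```python
from datetime import date

# Hypothetical table layout: one partition per day, with a known row count each.
PARTITIONS = {
    date(2024, 1, 1): 1_000_000,
    date(2024, 1, 2): 1_200_000,
    date(2024, 1, 3): 900_000,
    date(2024, 1, 4): 1_100_000,
}


def rows_scanned(partitions, start=None, end=None):
    """Count rows a job must read; a date filter lets it skip whole partitions."""
    return sum(
        rows
        for day, rows in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


full_scan = rows_scanned(PARTITIONS)
pruned_scan = rows_scanned(PARTITIONS, start=date(2024, 1, 3), end=date(2024, 1, 3))
print(f"full scan: {full_scan} rows, pruned scan: {pruned_scan} rows")
```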

Orchestration and Workflow
What
Scheduling and triggering jobs
Job Dependencies

How
•“DAG”: graphical view of job dependencies and status
•Describe dependencies in code
•Retry policies
•Backfill policies
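
The points above can be sketched with the Python standard library alone (`graphlib` needs Python 3.9 or later). The job names, the retry limit and the daily backfill granularity are assumptions for illustration; a production orchestrator would add scheduling, status tracking and a graphical view.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter

# Dependencies described in code: each job maps to the set of jobs it waits for.
DAG = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_with_retries(job, max_attempts=3):
    """Retry policy: call `job` until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error


def backfill_dates(start, end):
    """Backfill policy: every daily run date from `start` to `end`, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


# Jobs in an order that respects every dependency.
order = list(TopologicalSorter(DAG).static_order())

attempts = {"n": 0}


def flaky_job():
    # Fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky_job)
missed_runs = backfill_dates(date(2024, 1, 1), date(2024, 1, 3))
```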

Dashboarding/ Reporting/ BI
What
Static reports and auto-updating
dashboards
Business facing

How
•Static graphs + emailed reports
•Rollups + aggregations
•Data modelling + Data Analyst
•Real-time dashboards
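
Rollups precompute the aggregates a dashboard reads, so the dashboard does not scan detail rows on every view. A minimal sketch, assuming hypothetical event fields `day`, `country` and `revenue`:

```python
from collections import defaultdict


def rollup(events, keys, measure):
    """Sum `measure` grouped by the fields named in `keys`."""
    totals = defaultdict(float)
    for event in events:
        totals[tuple(event[k] for k in keys)] += event[measure]
    return dict(totals)


events = [
    {"day": "2024-01-01", "country": "US", "revenue": 10.0},
    {"day": "2024-01-01", "country": "DE", "revenue": 5.0},
    {"day": "2024-01-01", "country": "US", "revenue": 2.5},
    {"day": "2024-01-02", "country": "US", "revenue": 4.0},
]
by_day_country = rollup(events, ["day", "country"], "revenue")
by_day = rollup(events, ["day"], "revenue")
```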

Quality : Monitoring and Alerting
What
Mechanisms for detecting and fixing
incorrect and stale data-sets
Anomaly detection

How
•Monitor job failures
•Prioritization and coalescing
•Emit metrics during and after jobs
•Metrics database + Graphing
•Monitoring dashboards
•Define KPIs and create alerts
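
A small sketch of turning job metrics into alerts for stale or suspiciously empty data sets. The metric fields (`name`, `row_count`, `last_updated`) and the thresholds are hypothetical; real KPIs depend on the data set.

```python
from datetime import datetime, timedelta


def check_dataset(metrics, now, max_age=timedelta(hours=24), min_rows=1):
    """Return a list of alert strings for one data set's job metrics.

    `metrics` is a hypothetical dict emitted by a job:
    {"name": str, "row_count": int, "last_updated": datetime}.
    """
    alerts = []
    if now - metrics["last_updated"] > max_age:
        alerts.append(
            f"{metrics['name']}: stale, last updated {metrics['last_updated']:%Y-%m-%d %H:%M}"
        )
    if metrics["row_count"] < min_rows:
        alerts.append(f"{metrics['name']}: row count {metrics['row_count']} below {min_rows}")
    return alerts


now = datetime(2024, 1, 3, 12, 0)
fresh = {"name": "orders_daily", "row_count": 5000, "last_updated": datetime(2024, 1, 3, 6, 0)}
stale = {"name": "users_daily", "row_count": 0, "last_updated": datetime(2024, 1, 1, 6, 0)}

fresh_alerts = check_dataset(fresh, now)
stale_alerts = check_dataset(stale, now)
```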

CI/CD
What
Development tools and processes

How
•Sandbox queries, job code and workflows in source control
•Deployment process: life of a PR
•Multiple environment support
•Test data sets: sampled, obfuscated, randomized
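
One possible way to derive a test data set by sampling and obfuscation is sketched below. Hashing keeps both steps deterministic, so reruns pick the same rows and obfuscated keys still join across tables. The salts, field names and the 10% rate are assumptions for the example, and salted hashing alone is not a complete anonymization scheme.

```python
import hashlib


def stable_hash(value, salt):
    """Deterministic hex digest; the same input always maps to the same output."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()


def in_sample(key, percent, salt="sample-v1"):
    """Keep roughly `percent`% of keys, chosen deterministically so reruns agree."""
    return int(stable_hash(key, salt), 16) % 100 < percent


def obfuscate(row, sensitive, salt="obfuscate-v1"):
    """Replace sensitive fields with a short hash; other fields pass through unchanged."""
    return {k: (stable_hash(v, salt)[:12] if k in sensitive else v) for k, v in row.items()}


rows = [{"user_id": i, "email": f"user{i}@example.com", "amount": i * 10} for i in range(1000)]
test_rows = [obfuscate(r, {"email"}) for r in rows if in_sample(r["user_id"], percent=10)]
```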
