Fundamentals of Data Engineering | IABAC

seenivasanv5 48 views 10 slides Mar 06, 2025

About This Presentation

Fundamentals of Data Engineering covers the fundamental concepts of creating and managing data infrastructure. It comprises data intake, storage, processing, and pipeline automation with SQL, Python, Hadoop, and cloud platforms. It focuses on creating scalable systems for efficient data handling w...


Slide Content

Fundamentals of
Data Engineering
iabac.org

What is Data Engineering?
The process of creating, developing,
and managing systems for data
collection, storage, and processing.
Ensures that data is accurate,
readily available, and ready for
analysis.
Connects raw data to useful
discoveries.

Key Components of Data
Engineering
Data Collection – Gathering information
from multiple sources.
Data Storage – Utilizing databases, data
lakes, and warehouses for storage.
Data Processing – Converting raw data into
formats that can be used effectively.
Data Workflow Orchestration – Streamlining
the movement of data through automation.
Data Governance & Security – Maintaining
compliance and safeguarding data integrity.

Data Engineering vs Data Science

Aspect      Data Engineering               Data Science
Focus       (not given)                    Analysis & modeling
Key Tools   SQL, Spark, Airflow            Python, ML libraries
Goal        Reliable data for analytics    Insights & predictions

Data Storage Technologies
Relational databases (SQL-based, such as
PostgreSQL and MySQL).
NoSQL databases (MongoDB and
Cassandra).
Data warehouses (BigQuery, Snowflake, and
Redshift).
Data lakes (S3 and Delta Lake).

Data Processing Frameworks
Batch Processing (ETL) – for example,
Apache Spark, Hadoop.
Stream Processing – for example,
Apache Kafka, Flink.
Hybrid Approaches – Combining batch &
real-time.
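The difference between batch and stream processing can be illustrated without any framework. The following is a minimal sketch in plain Python, not Spark, Kafka, or Flink code: the `EVENTS` data, the function names, and the 60-second window are all assumptions made for illustration. The batch function sees the whole dataset at once, while the stream function consumes events one at a time and emits a result per fixed time window. It assumes events arrive in timestamp order.

```python
from collections import defaultdict

# Hypothetical click events: (timestamp_in_seconds, page).
EVENTS = [(1, "home"), (2, "docs"), (61, "home"), (62, "home"), (125, "docs")]


def batch_counts(events):
    """Batch style: the full dataset is available, so count everything at once."""
    counts = defaultdict(int)
    for _, page in events:
        counts[page] += 1
    return dict(counts)


def stream_counts(events, window_seconds=60):
    """Stream style: read events one by one and emit counts per time window.

    Assumes events arrive ordered by timestamp.
    """
    current_window = None
    counts = defaultdict(int)
    for ts, page in events:
        window = ts // window_seconds
        # A new window has started: emit the finished one and reset.
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[page] += 1
    # Emit the last, still-open window once the input ends.
    if current_window is not None:
        yield current_window, dict(counts)


batch_result = batch_counts(EVENTS)
stream_result = list(stream_counts(EVENTS))
```

A hybrid approach would typically serve recent windows from the streaming path and periodically recompute totals with the batch path.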

Data Pipeline Orchestration
Workflow automation tools: Apache Airflow,
Prefect, Dagster.
Steps in a pipeline:
1. Data ingestion
2. Cleaning & transformation
3. Storage & indexing
4. Delivery to consumers
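The four pipeline steps above can be sketched as plain Python functions chained together. This is only an illustration using the standard library, not an Airflow, Prefect, or Dagster workflow: the sample CSV, the `payments` table, and every function name are hypothetical. An orchestrator would usually run each step as a separate task and handle scheduling and retries.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, API, or queue.
RAW_CSV = """user_id,amount
1, 10.50
2,
1, 4.25
3, 7.00
"""


def ingest(text):
    """Step 1: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue
        cleaned.append((int(row["user_id"]), float(amount)))
    return cleaned


def store(conn, rows):
    """Step 3: persist the rows and index the lookup column."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_payments_user ON payments (user_id)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def deliver(conn):
    """Step 4: expose an aggregate that a downstream consumer could read."""
    cur = conn.execute(
        "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()


def run_pipeline(text):
    """Run the four steps in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        store(conn, clean(ingest(text)))
        return deliver(conn)
    finally:
        conn.close()


totals = run_pipeline(RAW_CSV)
```

Keeping each step as its own function makes it straightforward to test steps in isolation and to map them onto tasks in a workflow tool later.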

Data Governance & Security
Data Quality – Validation, deduplication,
and anomaly detection.
Security – Encryption and access control (IAM).
Compliance – GDPR, HIPAA, SOC 2.
Metadata Management – Classifying data
so that it is discoverable.
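The three data quality checks named on this slide (validation, deduplication, and anomaly detection) can be sketched in a few lines of standard-library Python. The `RECORDS` sample, the field names, and the threshold of 5 median absolute deviations are assumptions chosen for illustration; production systems would normally apply rules agreed with data owners and might use a dedicated data quality tool instead.

```python
from statistics import median

# Hypothetical sensor records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "temp_c": 21.0},
    {"id": 2, "temp_c": 21.5},
    {"id": 2, "temp_c": 21.5},  # duplicate id
    {"id": 3, "temp_c": None},  # fails validation
    {"id": 4, "temp_c": 20.5},
    {"id": 5, "temp_c": 95.0},  # suspicious value
    {"id": 6, "temp_c": 22.0},
]


def validate(records):
    """Keep records whose required fields are present and of the expected type."""
    return [
        r for r in records
        if isinstance(r.get("id"), int) and isinstance(r.get("temp_c"), (int, float))
    ]


def deduplicate(records):
    """Keep only the first record seen for each id."""
    seen = set()
    unique = []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


def find_anomalies(records, threshold=5.0):
    """Flag values far from the median, measured in median absolute deviations."""
    values = [r["temp_c"] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be flagged this way
    return [r for r in records if abs(r["temp_c"] - med) / mad > threshold]


clean_records = deduplicate(validate(RECORDS))
anomalies = find_anomalies(clean_records)
```

The median-based check is used here because a single extreme value distorts the mean and standard deviation more than it distorts the median.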

Tools & Technologies in Data
Engineering
Data Storage: PostgreSQL, MongoDB, and
Snowflake.
Processing: Spark, Flink, and dbt.
Orchestration: Airflow and Prefect.
Cloud Platforms: AWS, GCP, and Azure.

Thank You
Visit: iabac.org