Data Engineering, Lesson 3: F1 silver and gold layer processing, Databricks visualization.pptx

GravenGuan 10 views 13 slides May 28, 2024

About This Presentation

Azure Synapse Studio: Unleashing Insights from NYC Taxi Data
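This project aims to explore the advanced features of Azure Synapse Analytics in depth, using New York City taxi data as the basis for hands-on data engineering. We will learn how to use the various services of Azure Synapse Analytics, including Serverless SQL Pool...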


Slide Content

Data Engineering: F1 Project. Instructor: Heren

F1 Project: Databricks Workflow
Databricks Workflows orchestrates data processing, machine learning, and analytics pipelines on the Databricks Data Intelligence Platform.
Workflows has fully managed orchestration services integrated with the Databricks platform, including Databricks Jobs to run non-interactive code in your Databricks workspace and Delta Live Tables to build reliable and maintainable ETL pipelines.

F1 Project: Include Notebook (%run)

F1 Project: Create Databricks Jobs

F1 Project: Data Objects
Catalog: a grouping of databases.
Database or schema: a grouping of objects in a catalog. Databases contain tables, views, and functions.
Table: a collection of rows and columns stored as data files in object storage.
View: a saved query, typically against one or more tables or data sources.
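The difference between a table and a view can be sketched in a few lines. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; it is not Databricks code, and the table name, view name, and rows are hypothetical placeholders.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real metastore.
conn = sqlite3.connect(":memory:")

# A table stores rows (hypothetical placeholder data).
conn.execute("CREATE TABLE drivers (driver_name TEXT, nationality TEXT)")
conn.execute("INSERT INTO drivers VALUES ('Driver A', 'Country P')")

# A view stores only the query text, not a copy of the rows.
conn.execute(
    "CREATE VIEW drivers_p AS "
    "SELECT driver_name FROM drivers WHERE nationality = 'Country P'"
)

before = conn.execute("SELECT driver_name FROM drivers_p").fetchall()

# New rows in the underlying table show up in the view automatically,
# because the saved query is re-evaluated on every read.
conn.execute("INSERT INTO drivers VALUES ('Driver B', 'Country P')")
after = conn.execute(
    "SELECT driver_name FROM drivers_p ORDER BY driver_name"
).fetchall()
```

Reading the view before and after the second insert returns different rows, which is the practical meaning of "a saved query".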

F1 Project: Hive Metastore
Diagram labels: Spark SQL; Azure Data Lake; Hive Meta Store, either the Databricks default or an external metastore (Azure SQL, MySQL, etc.).

F1 Project: Managed Tables
Create a managed table using Python
Create a managed table using SQL
Effect of dropping a managed table
Describe table

F1 Project: External Tables
Create an external table using Python
Create an external table using SQL
Effect of dropping an external table
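The slides list the effect of dropping each table type without spelling it out. With a Hive metastore, dropping a managed table removes both the metadata and the underlying data files, while dropping an external table removes only the metadata and leaves the files in place. The toy model below mimics that behavior on the local file system. `ToyMetastore` and all table names and paths are hypothetical and are not part of any Databricks or Hive API.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Hypothetical stand-in for a metastore; not a Databricks or Hive API."""

    def __init__(self, warehouse_dir: Path):
        self.warehouse_dir = warehouse_dir  # where managed table data lives
        self.tables = {}  # table name -> (data directory, is_managed)

    def create_managed(self, name: str) -> Path:
        # The metastore picks the location and owns the data files.
        location = self.warehouse_dir / name
        location.mkdir(parents=True)
        self.tables[name] = (location, True)
        return location

    def create_external(self, name: str, location: Path) -> None:
        # The caller supplies the location; only metadata is registered.
        self.tables[name] = (location, False)

    def drop(self, name: str) -> None:
        location, is_managed = self.tables.pop(name)  # metadata always goes
        if is_managed:
            shutil.rmtree(location)  # managed: data files are deleted too


root = Path(tempfile.mkdtemp())
store = ToyMetastore(root / "warehouse")

# Managed table: data lives under the metastore's warehouse directory.
managed_dir = store.create_managed("race_results_managed")
(managed_dir / "part-0000.parquet").write_text("placeholder")

# External table: data lives at a path chosen by the user.
external_dir = root / "presentation" / "race_results"
external_dir.mkdir(parents=True)
(external_dir / "part-0000.parquet").write_text("placeholder")
store.create_external("race_results_ext", external_dir)

store.drop("race_results_managed")
store.drop("race_results_ext")

managed_files_survive = managed_dir.exists()
external_files_survive = external_dir.exists()
```

After both drops the metastore holds no tables, the managed table's directory is gone, and the external table's files are still on disk.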

F1 Project: Managed Tables and External Tables
Diagram labels: Ingest, Transform, Analyze, Report; Ergast API; ADLS Raw Layer; ADLS Ingested Layer; ADLS Presentation Layer; ADF Pipelines; External Tables; Managed Tables.

F1 Project: Create tables from block storage files in raw, silver, gold layers

F1 Project: Spark Transformations
Filter
Join
Create the presentation layer Race Result table
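The filter and join steps can be sketched as a single SQL query. To keep the example runnable without a cluster, it uses Python's built-in sqlite3 module instead of Spark; the table names, column names, and rows are hypothetical placeholders, not the course's actual schema. The SELECT statement is plain SQL and should carry over to Spark SQL largely unchanged, assuming tables with these names exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE races (race_id INTEGER, race_year INTEGER, race_name TEXT);
    CREATE TABLE results (
        race_id INTEGER, driver_name TEXT, team TEXT,
        position INTEGER, points REAL
    );
    """
)

# Hypothetical placeholder rows, not real race data.
conn.executemany(
    "INSERT INTO races VALUES (?, ?, ?)",
    [(1, 2019, "Race One"), (2, 2020, "Race Two"), (3, 2020, "Race Three")],
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Driver A", "Team X", 1, 25.0),
        (2, "Driver B", "Team Y", 1, 25.0),
        (2, "Driver A", "Team X", 2, 18.0),
        (3, "Driver A", "Team X", 1, 25.0),
    ],
)

# Join: attach race details to each result. Filter: keep one season.
query = """
    SELECT r.race_year, r.race_name, s.driver_name, s.team, s.position, s.points
    FROM results AS s
    JOIN races AS r ON r.race_id = s.race_id
    WHERE r.race_year = 2020
    ORDER BY r.race_name, s.position
"""
race_results = conn.execute(query).fetchall()
```

Each row of `race_results` combines race and result columns, which is the shape a presentation layer race results table would need.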

F1 Project: Dominant Drivers / Teams Analysis
Create a table with the data required
Granularity of the data: race_year, driver, team
Rank the dominant drivers of all time, of the last decade, etc.
Rank the dominant teams of all time, of the last decade, etc.
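The ranking steps above can be sketched as follows. The example assumes that total points is the measure of dominance, which the slides do not specify, and it uses Python's built-in sqlite3 module (with window function support, SQLite 3.25 or later) as a stand-in for Spark SQL. The table name, column names, year ranges, and rows are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per race_year, driver, team: the granularity named on the slide.
conn.execute(
    "CREATE TABLE driver_season_points "
    "(race_year INTEGER, driver_name TEXT, team TEXT, total_points REAL)"
)
conn.executemany(
    "INSERT INTO driver_season_points VALUES (?, ?, ?, ?)",
    [
        (2008, "Driver A", "Team X", 300.0),
        (2015, "Driver A", "Team X", 100.0),
        (2015, "Driver B", "Team Y", 150.0),
        (2016, "Driver B", "Team Y", 120.0),
        (2016, "Driver C", "Team X", 200.0),
    ],
)


def rank_drivers(conn, first_year, last_year):
    """Rank drivers by summed points within an inclusive year range."""
    query = """
        SELECT driver_name, points,
               RANK() OVER (ORDER BY points DESC) AS driver_rank
        FROM (
            SELECT driver_name, SUM(total_points) AS points
            FROM driver_season_points
            WHERE race_year BETWEEN ? AND ?
            GROUP BY driver_name
        ) AS totals
        ORDER BY driver_rank, driver_name
    """
    return conn.execute(query, (first_year, last_year)).fetchall()


all_time = rank_drivers(conn, 1950, 2024)     # wide range covers every row
one_decade = rank_drivers(conn, 2010, 2019)   # narrower window, same query
```

Changing only the year range changes the ranking, so the same aggregated table can serve both the all-time and the per-decade views. Grouping by team instead of driver_name gives the team ranking.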

F1 Project: Create Presentation Layer (Gold Layer)
Create a table with the data required
Granularity of the data: race_year, driver, team
Rank the dominant drivers of all time, of the last decade, etc.
Rank the dominant teams of all time, of the last decade, etc.
Visualization of dominant drivers
Visualization of dominant teams