Azure Synapse Studio: Unleashing Insights from NYC Taxi Data
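This project explores the advanced features of Azure Synapse Analytics in depth, using New York City taxi data as the basis for hands-on data engineering. We learn how to use the various services of Azure Synapse Analytics, including Serverless SQL Pool, Spark Pool, and Dedicated SQL Pool, to ingest, transform, and analyze data, and so understand the business logic and value behind the data. The project also covers Azure Cosmos DB Synapse Link, examining the key role of hybrid transactional and analytical processing (HTAP) in real-time data scenarios. Through hands-on practice, we learn to use Synapse Pipelines and Triggers to automate data workflows, optimize data processing, and ensure data accuracy and reliability. By building, debugging, and optimizing data processing pipelines, the project gives the enterprise deep data insights that support data-driven decision making, and presents the analysis results in Power BI so that the insights are shown intuitively and support strategic planning and execution.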
1. Introduction to the Modern Data Warehouse
Size: 777.86 KB
Language: en
Added: May 28, 2024
Slides: 13 pages
Slide Content
Data Engineering: F1 Project (Instructor: Heren)
F1 Project: Databricks Workflow
Databricks Workflows orchestrates data processing, machine learning, and analytics pipelines on the Databricks Data Intelligence Platform. Workflows has fully managed orchestration services integrated with the Databricks platform, including Databricks Jobs to run non-interactive code in your Databricks workspace and Delta Live Tables to build reliable and maintainable ETL pipelines.
F1 Project: Include Notebook (%run)
F1 Project: Create Databricks Jobs
F1 Project: Data Objects
Catalog: a grouping of databases.
Database or schema: a grouping of objects in a catalog. Databases contain tables, views, and functions.
Table: a collection of rows and columns stored as data files in object storage.
View: a saved query, typically against one or more tables or data sources.
F1 Project: Hive Metastore
[Diagram labels: Spark SQL; Azure Data Lake; Hive Metastore, either the Databricks default or an external metastore (Azure SQL, MySQL, etc.)]
F1 Project: Managed tables
Create managed table using Python
Create managed table using SQL
Effect of dropping a managed table
Describe table
F1 Project: External tables
Create external table using Python
Create external table using SQL
Effect of dropping an external table
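The difference between the two table types listed above comes down to whether a storage location is supplied when the table is created. The following is a minimal sketch, not the course's own code: it only builds Spark SQL statements as strings, which on Databricks would then be passed to `spark.sql(...)`. The helper names, the `f1_demo` database, the column list, and the `/mnt/f1demo/...` path are all hypothetical.

```python
def create_table_sql(table, columns, data_format="parquet", location=None):
    """Build a Spark SQL CREATE TABLE statement.

    Without a location the table is managed; with a LOCATION clause it is external.
    """
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    sql = f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) USING {data_format}"
    if location is not None:
        # Supplying a path is what makes the table external.
        sql += f" LOCATION '{location}'"
    return sql


def removed_by_drop(is_external):
    """Describe what DROP TABLE removes for each table type."""
    if is_external:
        # External table: only the metastore entry goes; the files stay in storage.
        return {"metastore entry"}
    # Managed table: the metastore entry and the underlying data files are both removed.
    return {"metastore entry", "data files"}


# Hypothetical columns, table names, and mount path, for illustration only.
columns = [("race_year", "INT"), ("driver", "STRING"), ("team", "STRING"), ("points", "FLOAT")]
managed_sql = create_table_sql("f1_demo.race_results_managed", columns)
external_sql = create_table_sql(
    "f1_demo.race_results_external",
    columns,
    location="/mnt/f1demo/presentation/race_results_external",
)

print(managed_sql)
print(external_sql)
print(sorted(removed_by_drop(is_external=False)))
print(sorted(removed_by_drop(is_external=True)))
```

Running `DESCRIBE EXTENDED` on each table is one way to confirm which type was created, since the output reports the table type and its location.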
F1 Project: Managed Tables and External Tables
[Architecture diagram. Stages: Ingest, Transform, Analyze, Report. Components: Ergast API, ADLS Raw Layer, ADLS Ingested Layer, ADLS Presentation Layer, ADF Pipelines. Table types: External Tables, Managed Tables.]
F1 Project: Create tables from block storage files in the raw, silver, and gold layers
F1 Project: Spark transformations
Filter
Join
Create the presentation layer Race Result table
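To make the filter and join steps concrete, here is a small sketch in plain Python rather than Spark, so it runs anywhere. It only mirrors the logic; in a notebook the same steps would normally be written with the DataFrame `filter` and `join` methods. The rows, column names, and helper functions are made up for illustration and are not taken from the course data.

```python
# Hypothetical sample rows standing in for the ingested races and results data.
races = [
    {"race_id": 1, "race_year": 2019, "race_name": "Race A"},
    {"race_id": 2, "race_year": 2020, "race_name": "Race B"},
]
results = [
    {"race_id": 1, "driver": "Driver A", "team": "Team X", "position": 1, "points": 25},
    {"race_id": 2, "driver": "Driver A", "team": "Team X", "position": 2, "points": 18},
    {"race_id": 2, "driver": "Driver B", "team": "Team Y", "position": 1, "points": 25},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        # Left rows without a match on the right are dropped (inner join).
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# Filter the races to one season, then join the results onto them.
races_2020 = filter_rows(races, lambda row: row["race_year"] == 2020)
race_results = inner_join(results, races_2020, "race_id")

for row in race_results:
    print(row["race_year"], row["race_name"], row["driver"], row["team"], row["points"])
```

Filtering before joining keeps the joined set small, which is the same reason to apply filters early in a Spark job.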
F1 Project: Dominant Drivers/Teams Analysis
Create a table with the data required
Granularity of the data: race_year, driver, team
Rank the dominant drivers of all time, the last decade, etc.
Rank the dominant teams of all time, the last decade, etc.
F1 Project: Create presentation layer (gold layer)
Create a table with the data required
Granularity of the data: race_year, driver, team
Rank the dominant drivers of all time, the last decade, etc.
Rank the dominant teams of all time, the last decade, etc.
Visualization of dominant drivers
Visualization of dominant teams
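The ranking steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's actual implementation: the slides do not say which metric defines dominance, so total points is assumed here, and the sample rows and function names are hypothetical. Ties share a rank, as with SQL `RANK()`.

```python
from collections import defaultdict

# Hypothetical race results; the values are made up for illustration.
race_results = [
    {"race_year": 2019, "driver": "Driver A", "team": "Team X", "points": 25},
    {"race_year": 2019, "driver": "Driver B", "team": "Team Y", "points": 18},
    {"race_year": 2020, "driver": "Driver A", "team": "Team X", "points": 18},
    {"race_year": 2020, "driver": "Driver B", "team": "Team Y", "points": 25},
    {"race_year": 2020, "driver": "Driver C", "team": "Team X", "points": 15},
]


def aggregate(rows):
    """Summarise results at race_year, driver, team granularity."""
    totals = defaultdict(lambda: {"total_races": 0, "total_points": 0})
    for row in rows:
        key = (row["race_year"], row["driver"], row["team"])
        totals[key]["total_races"] += 1
        totals[key]["total_points"] += row["points"]
    return [
        {"race_year": year, "driver": driver, "team": team, **values}
        for (year, driver, team), values in totals.items()
    ]


def rank_by(rows, field, first_year=None, last_year=None):
    """Rank drivers or teams by total points (assumed metric) over a year range."""
    totals = defaultdict(int)
    for row in rows:
        if first_year is not None and row["race_year"] < first_year:
            continue
        if last_year is not None and row["race_year"] > last_year:
            continue
        totals[row[field]] += row["total_points"]
    # Highest points first; names break ties only for a stable display order.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    for position, (name, points) in enumerate(ordered, start=1):
        if ranked and ranked[-1]["total_points"] == points:
            rank = ranked[-1]["rank"]  # tie: reuse the previous rank
        else:
            rank = position
        ranked.append({"rank": rank, field: name, "total_points": points})
    return ranked


summary = aggregate(race_results)
all_time_drivers = rank_by(summary, "driver")
all_time_teams = rank_by(summary, "team")
recent_drivers = rank_by(summary, "driver", first_year=2020)

for row in all_time_drivers:
    print(row)
```

Keeping the summary table at race_year, driver, team granularity means the same table can answer both the all-time and the last-decade questions by changing only the year range.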