Explore the transformative capabilities of Direct Lake within Microsoft Fabric through this detailed SlideShare presentation. Aimed at Power BI professionals, data engineers, and analytics enthusiasts, this comprehensive guide provides an in-depth look at how Direct Lake integrates with the broader Microsoft ecosystem to offer a robust, efficient, and scalable solution for real-time data analysis and business intelligence.
The presentation begins by introducing the key players in the Microsoft Fabric landscape, setting the stage for understanding how Direct Lake fits into the bigger picture of data integration and analytics. You’ll gain a thorough understanding of the various components involved, including Data Factory, Synapse, and Power BI, and how they work together within the Fabric framework to create a seamless end-to-end analytics experience.
The slides take you through the architecture of Power BI both before and after the introduction of Fabric, focusing on the significant improvements brought by Direct Lake. Learn how Direct Lake removes the SQL endpoint from the query path, how Fabric optimizes Parquet files with its V-Ordering method, and how Direct Lake supports near-real-time analysis without the data duplication or latency drawbacks typically associated with traditional Import-based solutions.
One of the standout features of Direct Lake is its ability to perform almost on par with Import mode, but without the need for frequent data refreshes. The presentation details the prerequisites for leveraging Direct Lake, such as a Fabric F capacity or Power BI Premium capacity, and explains the importance of Delta tables in facilitating efficient data storage and retrieval.
Dive into the concept of “framing” within Direct Lake, which refreshes metadata only, significantly speeding up operations compared to full data refreshes. The presentation also explores the differences between Import mode and Direct Lake, discussing scenarios where fallback to Direct Query may be necessary and how to manage these situations effectively.
Moreover, the slides provide a practical demonstration of how to set up and use Direct Lake, including a demo on integrating a sample Lakehouse with Direct Lake to showcase its performance advantages. You’ll also learn about the limitations of Direct Lake, such as the current restrictions on calculated tables and columns, and receive best practices on how to decide when to implement Direct Lake in your data workflows.
The presentation wraps up by emphasizing the strategic importance of Direct Lake for “Greenfield” projects centered around lake-centric solutions while advising caution for those who are content with their existing Import models. With its ability to drastically reduce the time and resources needed for data processing while maintaining high performance, Direct Lake represents a significant advancement in the realm of data analytics and business intelligence.
Size: 17.07 MB
Language: en
Added: Aug 13, 2024
Slides: 30
Slide Content
Nikola Ilic | data-mozart.com | @DataMozart | I'm making music from the data! Power BI and SQL addict, blogger, speaker... Father of 2, Barça & Leo Messi fan... Consultant & Trainer | learn.data-mozart.com
TODAY() Everything is subject to change!
“Players” in Microsoft Fabric (Explore end-to-end analytics with Microsoft Fabric - Training | Microsoft Learn)
The workloads sit on top of OneLake: Data Integration (Data Factory), Data Engineering (Synapse), Data Warehouse (Synapse), Data Science (Synapse), Real-Time Intelligence (Synapse), Business Intelligence (Power BI), Observability (Data Activator).
- Data Factory: data integration combining Power Query with the scale of Azure Data Factory to move and transform data
- Synapse Data Engineering: data engineering with a Spark platform for data transformation at scale
- Synapse Data Warehouse: data warehousing with SQL performance and scale to support data use
- Synapse Data Science: data science with Azure Machine Learning and Spark for model training and execution tracking
- Real-Time Intelligence: real-time analytics to query and analyze large volumes of data in real time
- Power BI: business intelligence for translating data into decisions
- Data Activator: real-time detection and monitoring of data that can trigger notifications and actions when it finds specified patterns in data
I’m a Power BI Professional… What should I do now?!
Power BI Architecture – Pre-Fabric
- Import mode: DAX queries run against imported data. Fast performance, but data duplication and data latency.
- DirectQuery: DAX queries are translated into SQL queries against the source. Real-time and no data duplication, but slow performance.
Power BI Architecture – Fabric
- Import mode: DAX queries run against imported data. Fast performance, but data duplication and data latency.
- DirectQuery: DAX queries are translated into SQL queries against the source. Real-time and no data duplication, but slow performance.
- Direct Lake: DAX queries scan the Delta files in OneLake directly (“see-through”).
Direct Lake Prerequisites
- Fabric F capacity or Power BI Premium
- Lakehouse + SQL analytics endpoint (for DirectQuery fallback), or Warehouse
- Delta tables
- V-Ordering* (*a Fabric-specific way of additionally optimizing Parquet files when writing data)
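As a small illustration of these prerequisites, the sketch below (assuming a Fabric notebook with a Lakehouse attached, which is not part of the presentation itself) writes a V-Ordered Delta table that a Direct Lake semantic model could scan; the file path, table name, and the V-Order session flag are illustrative assumptions.

```python
# Minimal sketch: producing a V-Ordered Delta table for Direct Lake to read.
# Assumes a Fabric notebook with a Lakehouse attached; path and table name
# are illustrative, not taken from the presentation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided in a Fabric notebook

# Fabric Spark applies V-Ordering by default when writing Parquet/Delta; the
# flag is set here only to make that explicit (the property name may vary
# between Fabric runtime versions).
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/dim_customer.csv")  # hypothetical source file in the Lakehouse
)

# A managed Delta table lands in the Lakehouse "Tables" section, which is what
# the default and custom Direct Lake semantic models read.
df.write.format("delta").mode("overwrite").saveAsTable("DimCustomer")
```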
Back to the future with Microsoft Fabric!
Lakehouse: the Tables section feeds the default Power BI semantic model (Direct Lake), the SQL analytics endpoint, and custom semantic models (Direct Lake).
Warehouse: as with a Lakehouse, it comes with a default Power BI semantic model (Direct Lake) and can feed custom semantic models (Direct Lake).
Default vs. Custom Semantic Model (DEMO)
How does this architecture magic work?
Adding new tables to the model: if the sync setting is off (the default), you need to run a sync to add new tables.
Refresh option for semantic models
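To make the refresh story concrete, here is a hedged sketch of triggering a semantic model refresh (which, for Direct Lake, is the framing operation described on the next slides) from a Fabric notebook using the semantic-link (sempy) library; the dataset and workspace names are placeholders, and the exact refresh_dataset parameters should be checked against the current semantic-link documentation.

```python
# Sketch: triggering a refresh of a Direct Lake semantic model from a Fabric
# notebook with semantic-link (sempy). For Direct Lake, a refresh re-frames
# the model (it picks up the latest version of the underlying Delta tables)
# rather than re-importing data. Names below are placeholders.
import sempy.fabric as fabric

fabric.refresh_dataset(
    dataset="Sales Direct Lake Model",   # hypothetical semantic model name
    workspace="Fabric Demo Workspace",   # hypothetical workspace name
)
```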
Direct Lake Refresh (AKA “Framing”): the semantic model holds a “frame” that points to a version of the Delta table in the Lakehouse; a refresh updates that frame to the latest Delta table version.
Direct Lake Refresh (AKA “Framing”), illustrated with the DimCustomer Delta table in the Lakehouse.
Framing = Refreshes METADATA ONLY!
Direct Lake terminology:
- Syncing: adding new tables to a semantic model
- Framing: adding the info about the latest “version” of the data to a semantic model
- Paging: loading the columns needed by a query into cache memory
- Temperature: keeping frequently used columns in cache memory
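For the paging and temperature concepts above, the Analysis Services DMV DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS exposes per-segment temperature information; the sketch below assumes that semantic-link's evaluate_dax forwards DMV queries to the XMLA endpoint, which should be verified in your environment (the same query can always be run from DAX Studio or SSMS).

```python
# Sketch: inspecting which columns are paged into memory and how "warm" they
# are, via the DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV.
# Assumes semantic-link (sempy) in a Fabric notebook and that evaluate_dax
# accepts DMV queries; the model name is a placeholder.
import sempy.fabric as fabric

dmv_query = """
SELECT TABLE_ID, COLUMN_ID, SEGMENT_NUMBER, TEMPERATURE, LAST_ACCESSED
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
"""

segments = fabric.evaluate_dax(
    dataset="Sales Direct Lake Model",  # hypothetical semantic model name
    dax_string=dmv_query,
)
print(segments.head(20))
```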
Import vs. Direct Lake: Hot ‘n’ cold (DEMO)
Fallback to Direct Query
Direct Lake Guardrails (Learn about Direct Lake in Power BI and Microsoft Fabric - Power BI | Microsoft Learn)
- Max memory: memory resource limit for how much data can be paged in for each query
- Max model size on disk/OneLake: limit beyond which all queries fall back to DirectQuery
DMV for the DirectQuery fallback reason: $SYSTEM.TMSCHEMA_DELTA_TABLE_METADATA_STORAGES. A value of 0 is fine; anything else means a fallback to DirectQuery.
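As a companion, a hedged sketch of running that fallback-reason DMV from a notebook, under the same assumption that semantic-link's evaluate_dax forwards DMV queries; the model name is a placeholder.

```python
# Sketch: checking whether (and why) a Direct Lake model falls back to
# DirectQuery. Per the slide, a fallback-reason value of 0 is fine; anything
# else indicates a fallback.
import sempy.fabric as fabric

fallback = fabric.evaluate_dax(
    dataset="Sales Direct Lake Model",  # hypothetical semantic model name
    dax_string="SELECT * FROM $SYSTEM.TMSCHEMA_DELTA_TABLE_METADATA_STORAGES",
)
print(fallback)
```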
DirectLakeBehavior property: set from the web UI or Tabular Editor
Final thoughts…
Limitations of Direct Lake (as of today!)
- Querying one single Lakehouse or Warehouse
- No DAX calculated columns/calculated tables (because they are created/persisted at refresh time)
- No composite models
- No DateTime relationships
- Web modeling only, or Tabular Editor (from Power BI Desktop you can only build the report)
- Always check the list of current limitations!
Benefits of Direct Lake
- Performance comparable to Import mode
- Eliminating the serving layer (Azure SQL DB, Azure Synapse…) saves costs
- Refreshes in Import mode may use a lot of CUs
- Multiple models can use the same Parquet files with Shortcuts (more savings from avoided refreshes)
- SQL/PySpark can query the same data: one single copy for all use cases! (see the sketch after this list)
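To illustrate that final point, a short sketch querying the same Lakehouse Delta table that a Direct Lake semantic model reads, with no second copy of the data; it assumes a Fabric notebook with a Lakehouse attached, and the table and column names are illustrative.

```python
# Sketch: the same Delta table a Direct Lake semantic model scans can also be
# queried with Spark SQL (or through the SQL analytics endpoint) without
# copying the data. Names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided in a Fabric notebook

top_rows = spark.sql("""
    SELECT CustomerKey, COUNT(*) AS row_count
    FROM DimCustomer              -- the same Delta table the semantic model reads
    GROUP BY CustomerKey
    ORDER BY row_count DESC
    LIMIT 10
""")
top_rows.show()
```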
To Wrap Up…
- We are ALL still learning when to use Direct Lake
- Direct Lake IS a fantastic feature!
- Direct Lake IS NOT a “one solution to rule them all”!
- Primary choice for “Greenfield” lake-centric solutions?
- If you’re happy with your existing Import models, don’t switch them to Direct Lake (yet)!