Details on the Azure Synapse offering and associated solution details
Added: Sep 09, 2024
Slides: 44 pages
Slide Content
Azure Synapse Analytics
Hemant Das
Cloud Solution Architect
Azure Data Services Landscape
Data Sources: LOB, CRM, Social, IoT & Edge Devices, Media, Logs, Enterprise Data, On-Prem Data Stores, Public Datasets, 3rd Party Endpoints, More…
Stages: Data Ingestion, Data Storage (Raw & Curated), Data Transformation, ML / AI, Data Consumption
Row labels: Data Tools, Data Governance, Azure Tools
Services shown: Data Factory, Event Hub, IoT Hub, Event Grid, Service Bus, Storage Account, Data Lake, Cosmos DB, Redis Cache, Data Warehouse / Synapse, SQL / VM, Azure SQL DB, SAP HANA, Stream Analytics, HDInsight, Databricks and Synapse (each shown twice), Data Explorer, Analysis Services, Machine Learning, Cognitive Services, DS-VM, Kubernetes Services, Power BI, Power Platform, Bot Service, Serverless Functions, Apps, IoT Central, Data Share, On-prem Data Gateway, Elastic Pools, Elastic Jobs, Data Box, Data Migration Service, Purview, Monitor, Log Analytics, Key Vault, Service Endpoint, Private Link, Firewall, Metrics, Blueprint, Lighthouse, Cost Management, Active Directory, DevOps, Security Center
Batch / Query
Also supports MySQL, PostgreSQL, MongoDB, and other open source databases, some on hyperscale
Data lake: Big data, Experimentation, Fast exploration, Semi-structured, Data science
Data warehouse: Relational data, Proven security & privacy, Dependable performance, Structured, Business analytics
Section 3
BI & DW come together
Azure Synapse Analytics
Azure meets these challenges,
with a single service to provide limitless analytics
Azure Synapse is the only unified platform for analytics, blending
big data, data warehousing, and data integration into a single cloud
native service for end-to-end analytics at cloud scale.
The first unified, cloud-native platform for converged analytics
Unified experience
Azure Synapse Studio
Integration Management Monitoring Security
Analytics runtimes
SQL
Azure Data Lake Storage
Azure Machine
Learning
On-premises data
Cloud data
SaaS data
Streaming data
Power BI
Azure Synapse lies at the heart of business, AI, and BI
Azure Synapse Analytics
Limitless analytics service with unmatched time to insight
Platform
Azure Data Lake Storage
Common Data Model
Enterprise Security
Optimized for Analytics
METASTORE
SECURITY
MANAGEMENT
MONITORING
DATA INTEGRATION
Analytics Runtimes
DEDICATED SERVERLESS
Form Factors
SQL
Languages
Python .NET Java Scala
Experience
Synapse Analytics Studio
Artificial Intelligence / Machine Learning / Internet of Things
Intelligent Apps / Business Intelligence
Azure Synapse Analytics
Synapse SQL: Query and analyze data with T-SQL using both provisioned and serverless models
Apache Spark for Synapse: Quickly create notebooks with your choice of Python, Scala, SparkSQL, and .NET for Apache Spark
Synapse Pipelines: Build end-to-end workflows for your data movement and data processing scenarios
Synapse Studio: Execute all data tasks with a simple UI and unified environment
Data Warehouse
Push rules for enforcement
to SDKs in data sources
Scalable and secure SQL analytics platform
Flexible consumption models
Serverless pay-per-query, ideal for ad-hoc data lake exploration and transformation
Dedicated clusters optimized for mission-critical data warehouse workloads
Serverless Dedicated
Fully-managed elastic platform
Elastic compute that can be easily optimized to
different classes of workload
All features available in a single tier
Infinite cost effective PAYG storage
SQL Editor
Automatic code completion (IntelliSense)
Script collaboration within the Workspace
Built-in visualizations
Easily switch between clusters
TPC-H 1 petabyte execution times (bar chart, one value per query)
Q1: 11.5, Q2: 7, Q3: 62, Q4: 9.5, Q5: 28, Q6: 2.5, Q7: 30.5, Q8: 48.5, Q9: 99, Q10: 22.5, Q11: 9
Q12: 5, Q13: 11.5, Q14: 6.5, Q15: 2, Q16: 5, Q17: 27, Q18: 99.5, Q19: 18.5, Q20: 21, Q21: 95.5, Q22: 8
TPC-H and TPC-DS Leader
Price/performance leadership relative to other cloud
data warehouses
“Polaris” is the only query engine to successfully
complete TPC-H at 1PB scale
https://aka.ms/synapse-dqp
Test-H 30TB: Price/performance @30TB ($ per query per hour), lower is better
Platforms: DW30000c, 4X-Large, BigQuery, dc2.8xlarge 60N
Values: $47, $152, $564, $51
Test-DS 30TB: Price/performance comparison (lower is better)
Platforms: Azure Synapse, Redshift, Snowflake Enterprise, BigQuery Flat Rate
Values: $153, $286, $309, $570
* GigaOm TPC-H benchmark report, January 2019, “GigaOm report: Data Warehouse in the Cloud Benchmark”
Only platform to complete TPC-H benchmark at 1 Petabyte
Massive Concurrency
Global Workload Graph
Workload aware query scheduling
https://aka.ms/synapse-dqp
Workload Management
Scale in Scale-out
Azure Synapse supports a more diverse set of workload management tools through workload importance, intra-cluster isolation, and elastic clusters.
Workload Importance Workload Isolation
Workload Group B
40%
Elastic Cluster (Scale Up)
2000 cDWU
Workload Management
• It manages resources, ensures highly efficient resource utilization, and maximizes return on investment (ROI).
• The three pillars of workload management are:
1. Workload Classification – To assign a request to a workload group and set importance levels.
2. Workload Importance – To influence the order in which a request gets access to resources.
3. Workload Isolation – To reserve resources for a workload group.
Pillars of Workload
Management
Classification
Importance
Isolation
Workload classification
• Overview
  • Map queries to allocations of resources via pre-determined rules.
  • Use with workload importance to effectively share resources across different workload types.
  • If a query request is not matched to a classifier, it is assigned to the default workload group.
• Benefits
  • Map queries to both Resource Management and Workload Isolation concepts.
• Monitoring DMVs
  • sys.workload_management_workload_classifiers
  • sys.workload_management_workload_classifier_details
  • Query DMVs to view details about all active workload classifiers.
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
    WORKLOAD_GROUP = 'name'
  , MEMBERNAME = 'security_account'
  [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL | ABOVE_NORMAL | HIGH } ]
  [ [ , ] WLM_LABEL = 'label' ]
  [ [ , ] WLM_CONTEXT = 'name' ]
  [ [ , ] START_TIME = 'start_time' ]
  [ [ , ] END_TIME = 'end_time' ]
)[ ; ]
WORKLOAD_GROUP: maps to an existing workload group or resource class
IMPORTANCE: specifies relative importance of request
MEMBERNAME: database user, role, AAD login or AAD group
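The classification rule described above (match a request to a classifier, otherwise fall back to the default workload group) can be sketched in Python. This is an illustrative model only, not Synapse code. The names `Classifier`, `classify`, and `DEFAULT_GROUP` are hypothetical, and first-match-wins is a simplifying assumption about how competing classifiers are resolved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Classifier:
    """Hypothetical stand-in for one workload classifier definition."""
    name: str
    workload_group: str
    membername: str
    importance: str = "NORMAL"
    wlm_label: Optional[str] = None  # optional extra match condition


# Placeholder name for the default workload group (assumption).
DEFAULT_GROUP = "default"


def classify(member: str, label: Optional[str],
             classifiers: List[Classifier]) -> Tuple[str, str]:
    """Return (workload_group, importance) for a request.

    Simplification: the first classifier whose conditions all match wins.
    """
    for c in classifiers:
        if c.membername != member:
            continue
        if c.wlm_label is not None and c.wlm_label != label:
            continue
        return c.workload_group, c.importance
    # No classifier matched: fall back to the default workload group.
    return DEFAULT_GROUP, "NORMAL"


classifiers = [
    Classifier("etl_loads", "dataloads", "etl_user", "HIGH", wlm_label="nightly"),
    Classifier("analysts", "analyst", "analyst_role"),
]

print(classify("etl_user", "nightly", classifiers))
print(classify("unknown_user", None, classifiers))
```

A request from `etl_user` without the `nightly` label does not satisfy the first classifier, so in this sketch it lands in the default group.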
Using Resource Classes to improve Data Ingestion
Workload importance
• Overview
  • Queries past the concurrency limit enter a FIFO queue.
  • By default, queries are released from the queue on a first-in, first-out basis as resources become available.
  • Workload importance lets higher-importance queries receive resources ahead of lower-importance queries already waiting in the queue.
• Example
  • State analysts have normal importance.
  • National analyst is assigned high importance.
  • State analyst queries execute in order of arrival.
  • When the national analyst’s query arrives, it jumps to the top of the queue.
CREATE WORKLOAD CLASSIFIER National_Analyst
WITH
(
    WORKLOAD_GROUP = 'analyst'
  , IMPORTANCE = HIGH
  , MEMBERNAME = 'National_Analyst_Login'
);
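The analyst example can be modelled with a small priority queue. The following is a minimal sketch of the scheduling idea, not the Synapse scheduler itself. `RequestQueue`, `release_order`, and the numeric ranks in `IMPORTANCE_RANK` are hypothetical names and values chosen for illustration.

```python
import heapq
from itertools import count

# Hypothetical numeric ranks for the five importance levels.
IMPORTANCE_RANK = {
    "LOW": 0, "BELOW_NORMAL": 1, "NORMAL": 2, "ABOVE_NORMAL": 3, "HIGH": 4,
}


class RequestQueue:
    """Queue of waiting requests, optionally ordered by importance."""

    def __init__(self, use_importance: bool):
        self.use_importance = use_importance
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, who: str, importance: str = "NORMAL") -> None:
        # Without importance every request gets the same rank, so the
        # arrival counter alone decides the order (plain FIFO).
        rank = IMPORTANCE_RANK[importance] if self.use_importance else 0
        heapq.heappush(self._heap, (-rank, next(self._arrival), who))

    def release(self) -> str:
        """Release the next request once resources become available."""
        return heapq.heappop(self._heap)[2]


def release_order(use_importance: bool) -> list:
    q = RequestQueue(use_importance)
    q.submit("state_analyst_1")
    q.submit("state_analyst_2")
    q.submit("national_analyst", "HIGH")
    q.submit("state_analyst_3")
    return [q.release() for _ in range(4)]


print(release_order(use_importance=False))
print(release_order(use_importance=True))
```

With importance off, the national analyst waits behind the two earlier state analysts. With importance on, the same request is released first, while state analysts keep their arrival order among themselves.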
Scheduler without importance
What if you want to prioritize the workloads that get access to resources?
[Diagram: numbered requests 1–12 marked Running or Queued, with CEO requests among those queued]
By default, workloads are run on a first-in, first-out basis.
Workload importance
Scheduler With Importance Turned On
With workload importance, prioritized workloads take precedence
[Diagram: numbered requests marked Running or Queued, with CEO requests; importance levels shown: Low, Normal, High]
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
    WORKLOAD_GROUP = 'name'
  , MEMBERNAME = 'security_account'
  [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL (default) | ABOVE_NORMAL | HIGH } ]
)
Workload importance
Intra-cluster workload isolation (Scale in)
CREATE WORKLOAD GROUP Sales
WITH
(
[ MIN_PERCENTAGE_RESOURCE = 60 ]
[ CAP_PERCENTAGE_RESOURCE = 100 ]
[ MAX_CONCURRENCY = 6 ] )
[Diagram: a 1000 cDWU data warehouse (Compute, Local In-Memory + SSD Cache) shared by two workload groups, Sales at 60% and Marketing at 40%; Sales is annotated with 60% and 100%]
Workload aware
query execution
Workload isolation
Multiple workloads share
deployed resources
Reservation or shared resource
configuration
Online changes to workload policies
Workload Isolation
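The arithmetic behind the Sales workload group on the isolation slide can be sketched as follows. This is a rough illustration assuming a 1000 cDWU warehouse in which Sales is the only group with a reservation. The function `reservation` and its result keys are hypothetical, not part of any Synapse API.

```python
def reservation(total_units: int, min_pct: int, cap_pct: int) -> dict:
    """Translate a group's MIN/CAP percentages into resource units.

    Assumes this is the only workload group holding a reservation.
    """
    if not 0 <= min_pct <= cap_pct <= 100:
        raise ValueError("expected 0 <= min_pct <= cap_pct <= 100")
    return {
        # Units guaranteed to the group even when the system is busy.
        "reserved_units": total_units * min_pct // 100,
        # Upper bound the group may grow to when other capacity is idle.
        "max_units": total_units * cap_pct // 100,
        # Share left over for every other workload.
        "shared_pool_pct": 100 - min_pct,
    }


sales = reservation(total_units=1000, min_pct=60, cap_pct=100)
print(sales)
```

Under these assumptions Sales is guaranteed 600 of the 1000 units and can expand to all 1000 when they are idle, while the remaining 40% is the shared pool that a group such as Marketing draws from.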
Category: Feature
Data Protection: Data in transit, Data encryption at rest, Data discovery and classification
Access Control: Object level security (tables/views), Row level security, Column level security, Dynamic data masking, Column level encryption
Authentication: SQL login, Azure Active Directory, Multi-factor authentication
Network Security: Managed virtual network, Custom virtual network, Firewall, Azure ExpressRoute, Azure Private Link
Threat protection: Threat detection, Auditing, Vulnerability assessment
Isolation: Dedicated metadata store, Hosted in customer tenant
Best-in-class Security
Customer & System Managed Keys
All data encrypted by default
Up to 3x levels of data encryption at rest
Democratize data at scale with fine-grained ACL
Proactive protection
Comprehensive Compliance
HIPAA / HITECH, IRS 1075, Section 508 VPAT, ISO 27001, PCI DSS Level 1, SOC 1 Type 2, SOC 2 Type 2, ISO 27018, Cloud Controls Matrix, Content Delivery and Security Association
Singapore MTCS Level 3, United Kingdom G-Cloud, China Multi Layer Protection Scheme, China CCCPPF, China GB 18030, European Union Model Clauses, EU Safe Harbor, ENISA IAF, Shared Assessments
ITAR-ready, Japan Financial Services, FedRAMP JAB P-ATO, FIPS 140-2, 21 CFR Part 11, DISA Level 2, FERPA, CJIS, Australian Signals Directorate, New Zealand GCIO
Industry-leading compliance
Eliminate network maintenance
One-click enables automated management of virtual
networks between cluster endpoints
Synapse resources only ever interop with private
endpoints
No management of subnets or IP Ranges
Prevents data exfiltration
Compliance Boundary
More than just data security
Native integration with Azure Purview
Automatically discover and classify data assets
End-to-end data lineage
Real-time Operational Analytics
Eliminate latency and accelerate decision making
Real-time operational analytics
One-click enablement in Azure Portal
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile
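A latency target stated at the 99th percentile means that 99% of observations fall at or below the figure. The following minimal sketch shows one common way to check such a target, the nearest-rank method. The function name `nearest_rank_percentile` and the sample latencies are made up for illustration and are not measurements from the service.

```python
def nearest_rank_percentile(values, pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Illustrative sample only: 100 synthetic latencies from 0.5s to 50s.
latencies_s = [i * 0.5 for i in range(1, 101)]

p99 = nearest_rank_percentile(latencies_s, 99)
meets_target = p99 < 90
print(p99, meets_target)
```

The single slowest sample does not affect the result here, which is the point of quoting a percentile rather than a maximum.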
Azure Cosmos DB
Analytical Store: Column store optimized for analytical queries
Transactional Store: Row store optimized for transactional operations
Azure Synapse
Analytics
Cloud-Native HTAP
Azure Synapse Link
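The difference between the row-oriented transactional store and the column-oriented analytical store can be illustrated with plain Python data structures. This is a conceptual sketch only and says nothing about how Azure Cosmos DB stores data internally. The sample records and the helper `to_columns` are hypothetical.

```python
# Row store: each record is kept together, which suits point reads and
# writes of a whole item (transactional access).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 200.0},
]


def to_columns(records):
    """Pivot records into one list per field (a column layout).

    Assumes every record has the same set of fields.
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column store: each field is kept together, so an aggregate only has to
# scan the one column it needs (analytical access).
columns = to_columns(rows)

one_order = rows[1]                     # transactional-style lookup
total_amount = sum(columns["amount"])   # analytical-style aggregate
print(one_order, total_amount)
```

Keeping both layouts side by side is the idea behind the HTAP pattern on the slide: operational reads and writes use the row layout, and analytical queries use the column layout.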
Machine Learning
Empower everyone with predictive insights
Democratize data science to all
Synapse makes predictive analytics accessible to all
Notebooks provide a code authoring experience for complex predictive models
Automatic ML graphical interface provides a no-code experience for creating ML models
Native integration with Azure Cognitive Search provides access to pre-built models
All Code Low/No-Code Pre-built models
Code-first ML model development
PySpark, Scala, and C# languages supported
Automatic code completion (IntelliSense)
Author multiple languages in a single notebook
Analyze data from the data warehouse, data lake,
and real-time operational data from one place
Data+Languages
Languages such as SQL, PySpark, Scala and C# in
support of data science and data warehouse
workloads
The data lake supports an unlimited set of file formats including Parquet, ORC and JSON as well as audio, image, and video formats
Language
Data
All you need is data
Fully automated feature exploration
Code-free in Synapse Studio
No-code creation of Machine Learning models
Democratize ML to everyone since no data science domain knowledge is required
Support for ensemble models
Supports classification, regression, and time-series
forecasting
Code-free in Synapse Studio
No-code references to Machine Learning models
Democratize ML to everyone since no data science domain knowledge is required
Easily embed in SQL Stored Procedures for
transformation of Views for reporting
SELECT d.*, p.Score FROM PREDICT(MODEL = @onnx_model, …
In-engine ML Scoring
Machine Learning models executed using SQL
“In-engine” for performance and scalability
No data leaves the platform for scoring
No additional cost for scoring
T-SQL Language
Synapse SQL
Model Data Predictions
Data Integration
Code-free Hybrid Data Integration
Cloud native ETL/ELT
95+ connectors available
Secure connectivity to on-premises data sources, other clouds, and SaaS applications
Code-first and low/no code design interfaces
Schedule and Event based triggering
Code-free
Code-free data wrangling
No/low-code data transformation
Excel-like interface is familiar to users
Transform data to desired shape completely visually
Operationalize into pipelines
Real-time operational
analytics
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile
Accelerate time to solution
Azure Open Datasets
Pre-built samples to accelerate development
• SQL Scripts
• Notebooks
• Data Pipelines
Build dashboard in Synapse Studio
Code-free experience for developing rich visualizations
One-click publishing for secure consumption across the enterprise
Questions
Azure Synapse Analytics for Data Engineers | Microsoft Azure