Azure Synapse Overview for data analytics

EkanshGirdhar1 55 views 44 slides Sep 09, 2024

About This Presentation

Details on the Azure Synapse offering and associated solutions


Slide Content

Azure Synapse Analytics
Hemant Das
Cloud Solution Architect

LOB
CRM
Social
IoT & Edge Devices
Media
Logs
Data Sources
Enterprise Data
On-Prem Data Stores
Public Datasets
3rd Party Endpoints
More… Data Factory
Event Hub
Purview
Data Tools
Data Governance
Azure Tools
Monitor
Data Ingestion | Data Storage (Raw & Curated) | Data Transformation | ML / AI | Data Consumption
Storage Account Cosmos DB Stream Analytics Machine Learning Power BI Serverless Functions
IoT Hub
Event Grid
Service Bus
Data Lake Redis Cache
Data Warehouse
/ Synapse
SQL / VM
Azure SQL DB SAP HANA
HDInsight
Cognitive
Services
Databricks Databricks
Synapse Synapse
Data Explorer DS-VM
Analysis Services Kubernetes Services
Power Platform Bot Service
Data Share On-prem
Data Gateway
Elastic Pools
Elastic Jobs
Data Box
Data Migration Service
IoT Central
Apps
Log Analytics
Key Vault
Service Endpoint
Private Link
Firewall
Metrics
Blueprint
Lighthouse
Cost Management
Active Directory
DevOps
Security Center
Batch / Query
Also supports MySQL, PostgreSQL, Mongo DB, and
other open source, some on hyperscale
Azure Data Services Landscape

Big data
Experimentation
Fast exploration
Semi-structured
Data science
Relational data
Proven security & privacy
Dependable performance
Structured
Business analytics
Data lake Data warehouse

Section 3
BI & DW come together
Azure Synapse Analytics
Azure meets these challenges,
with a single service to provide limitless analytics

Azure Synapse is the only unified platform for analytics, blending
big data, data warehousing, and data integration into a single cloud
native service for end-to-end analytics at cloud scale.
The first unified, cloud-native platform for converged analytics

Unified experience
Azure Synapse Studio
Integration Management Monitoring Security
Analytics runtimes
SQL
Azure Data Lake Storage
Azure Machine
Learning
On-premises data
Cloud data
SaaS data
Streaming data
Power BI
Azure Synapse lies at the heart of business, AI, and BI
Azure Synapse Analytics

Azure Synapse Analytics
Limitless analytics service with unmatched time to insight
Platform
Azure Data Lake Storage
Common Data Model
Enterprise Security
Optimized for Analytics
METASTORE
SECURITY
MANAGEMENT
MONITORING
DATA INTEGRATION
Analytics Runtimes
DEDICATED SERVERLESS
Form Factors
SQL
Languages
Python .NET Java Scala
Experience
Synapse Analytics Studio
Artificial Intelligence / Machine Learning / Internet of Things
Intelligent Apps / Business Intelligence

Azure Synapse Analytics
Synapse SQL: Query and analyze data with T-SQL using both provisioned and serverless models
Apache Spark for Synapse: Quickly create notebooks with your choice of Python, Scala, SparkSQL, and .NET for Apache Spark
Synapse Pipelines: Build end-to-end workflows for your data movement and data processing scenarios
Synapse Studio: Execute all data tasks with a simple UI and unified environment

Data Warehouse
Push rules for enforcement
to SDKs in data sources
Scalable and secure SQL analytics platform

Flexible consumption models
Serverless pay-per-query, ideal for ad-hoc data lake exploration and transformation
Dedicated clusters optimized for mission-critical data warehouse workloads
Serverless Dedicated
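
The serverless model can be illustrated with a minimal sketch of ad-hoc data lake exploration, assuming a serverless SQL pool with read access to the storage account. The storage account, container, folder path, and alias are placeholders, not taken from the slides.

```sql
-- Ad-hoc exploration of Parquet files in the data lake from a serverless SQL pool.
-- <storage-account>, <container>, and the sales/ folder are hypothetical placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales_rows;
```

No cluster is provisioned for this query; under the pay-per-query model the charge depends on the data processed.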

Fully-managed elastic platform
Elastic compute that can be easily optimized to
different classes of workload
All features available in a single tier
Infinite cost effective PAYG storage

SQL Editor
Automatic code completion (Intellisense)
Script collaboration within the Workspace
Built-in visualizations
Easily switch between clusters

[Chart: TPC-H 1 petabyte execution times for queries Q1 through Q22]
TPC-H and TPC-DS Leader
Price/performance leadership relative to other cloud
data warehouses
“Polaris” is the only query engine to successfully
complete TPC-H at 1PB scale
https://aka.ms/synapse-dqp
Test-H 30TB: price/performance @30TB ($ per query per hour), lower is better
[Chart: systems DW30000C, 4X-Large, BigQuery, and dc2.8xlarge 60N; bar values in extraction order $47, $152, $564, $51]
Test-DS 30TB: price/performance comparison, lower is better
[Chart: systems Azure Synapse, Redshift, Snowflake Enterprise, and BigQuery Flat Rate; bar values in extraction order $153, $286, $309, $570]
* GigaOm TPC-H benchmark report, January 2019, “GigaOm report: Data Warehouse in the Cloud Benchmark”

Only platform to complete the TPC-H benchmark at 1 petabyte
Massive Concurrency
Global Workload Graph
Workload aware query scheduling
https://aka.ms/synapse-dqp

Workload Management
Scale in Scale-out
Azure Synapse supports a more diverse set of workload management tools through workload importance, intra-cluster isolation, and elastic clusters.
Workload Importance Workload Isolation
Workload Group B
40%
Elastic Cluster (Scale Up)
2000 cDWU

Workload Management
• It manages resources, ensures highly efficient resource utilization, and maximizes return on investment (ROI).
• The three pillars of workload management are:
1. Workload Classification: to assign a request to a workload group and set importance levels.
2. Workload Importance: to influence the order in which a request gets access to resources.
3. Workload Isolation: to reserve resources for a workload group.
Pillars of Workload
Management
Classification
Importance
Isolation

Workload classification
• Overview
  • Map queries to allocations of resources via pre-determined rules.
  • Use with workload importance to effectively share resources across different workload types.
  • If a query request is not matched to a classifier, it is assigned to the default workload group.
• Benefits
  • Map queries to both Resource Management and Workload Isolation concepts.
• Monitoring DMVs
  • sys.workload_management_workload_classifiers
  • sys.workload_management_workload_classifier_details
  • Query DMVs to view details about all active workload classifiers.
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
WORKLOAD_GROUP = 'name'
, MEMBERNAME = 'security_account'
[ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL | ABOVE_NORMAL | HIGH } ]
[ [ , ] WLM_LABEL = 'label' ]
[ [ , ] WLM_CONTEXT = 'name' ]
[ [ , ] START_TIME = 'start_time' ]
[ [ , ] END_TIME = 'end_time' ]
)[ ; ]
WORKLOAD_GROUP: maps the request to an existing workload group (built-in resource classes such as xlargerc are also valid names)
IMPORTANCE: specifies the relative importance of a request
MEMBERNAME: database user, role, AAD login, or AAD group
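
As a hedged illustration of the syntax and monitoring DMVs above, the sketch below creates a classifier that combines a member name with a label, then lists the active classifiers. The classifier name, workload group, login, label, and table are hypothetical, and the workload group is assumed to exist already.

```sql
-- Hypothetical classifier: requests from ETL_Login that carry the label
-- 'nightly_load' are routed to an existing workload group named DataLoads.
CREATE WORKLOAD CLASSIFIER wcNightlyLoad
WITH
(
    WORKLOAD_GROUP = 'DataLoads',
    MEMBERNAME     = 'ETL_Login',
    WLM_LABEL      = 'nightly_load',
    IMPORTANCE     = ABOVE_NORMAL
);

-- A request opts in to the label with OPTION (LABEL = ...).
SELECT COUNT(*) FROM dbo.FactSales OPTION (LABEL = 'nightly_load');

-- List classifiers together with their classification rules.
SELECT c.name, c.group_name, c.importance, d.classifier_type, d.classifier_value
FROM sys.workload_management_workload_classifiers AS c
JOIN sys.workload_management_workload_classifier_details AS d
    ON d.classifier_id = c.classifier_id;
```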

Using Resource Classes to improve Data Ingestion

Workload importance
• Overview
  • Queries past the concurrency limit enter a FIFO queue.
  • By default, queries are released from the queue on a first-in, first-out basis as resources become available.
  • Workload importance allows higher-priority queries to receive resources as soon as they become available, regardless of their position in the queue.
• Example
  • State analysts have normal importance.
  • The national analyst is assigned high importance.
  • State analyst queries execute in order of arrival.
  • When the national analyst’s query arrives, it jumps to the top of the queue.
CREATE WORKLOAD CLASSIFIER National_Analyst
WITH
(
WORKLOAD_GROUP = 'analyst'
, IMPORTANCE = HIGH
, MEMBERNAME = 'National_Analyst_Login')
Azure Synapse
Analytics

What if you want to prioritize the workloads that get access to resources?
[Diagram: scheduler without importance, showing numbered requests in Running and Queued states, including requests labeled CEO in the queue]
By default, workloads are run on a first-in, first-out basis.
Workload importance

With workload importance, prioritized workloads take precedence
[Diagram: scheduler with importance turned on, showing numbered requests in Running and Queued states, requests labeled CEO, and importance levels Low, Normal, and High]
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
WORKLOAD_GROUP = 'name',
MEMBERNAME = 'security_account'
[ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL (default) | ABOVE_NORMAL | HIGH } ] )
Workload importance

Intra cluster workload isolation
(Scale in)
Marketing
CREATE WORKLOAD GROUP Sales
WITH
(
[ MIN_PERCENTAGE_RESOURCE = 60 ]
[ CAP_PERCENTAGE_RESOURCE = 100 ]
[ MAX_CONCURRENCY = 6 ] )
[Diagram: 1000c DWU data warehouse compute with local in-memory + SSD cache; Marketing 40%, Sales 60% with a cap of 100%]
Workload aware
query execution
Workload isolation
Multiple workloads share
deployed resources
Reservation or shared resource
configuration
Online changes to workload policies
Workload Isolation
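
The Sales reservation shown on this slide can be sketched as follows. This assumes the currently documented CREATE WORKLOAD GROUP options, where concurrency is derived from REQUEST_MIN_RESOURCE_GRANT_PERCENT rather than set directly; the MAX_CONCURRENCY option on the slide appears to come from an earlier syntax. The value 10 is an illustrative choice, not from the slides.

```sql
-- Reserve 60% of resources for the Sales workload group and let it use up to 100%.
-- REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10 is an assumption: each request is granted
-- at least 10%, so the 60% reservation covers 6 concurrent requests.
CREATE WORKLOAD GROUP Sales
WITH
(
    MIN_PERCENTAGE_RESOURCE            = 60,
    CAP_PERCENTAGE_RESOURCE            = 100,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10
);
```

The remaining 40% stays available to other workloads, such as the Marketing group in the diagram.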

Category: Features
Data Protection: Data in transit; Data encryption at rest; Data discovery and classification
Access Control: Object level security (tables/views); Row level security; Column level security; Dynamic data masking; Column level encryption
Authentication: SQL login; Azure Active Directory; Multi-factor authentication
Network Security: Managed virtual network; Custom virtual network; Firewall; Azure ExpressRoute; Azure Private Link
Threat Protection: Threat detection; Auditing; Vulnerability assessment
Isolation: Dedicated metadata store; Hosted in customer tenant
Best-in-class Security
Customer & System Managed Keys
All data encrypted by default
Up to 3x levels of data encryption at rest
Democratize data at scale with fine-grained ACL
Proactive protection
Comprehensive Compliance
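
Three of the access-control features listed above can be sketched in T-SQL as follows, assuming a dedicated SQL pool. The tables, columns, role, schema, and function names are hypothetical and only serve the illustration.

```sql
-- Column level security: analysts may read only non-sensitive columns.
GRANT SELECT ON dbo.Customers (CustomerId, Region, Segment) TO AnalystRole;

-- Dynamic data masking: hide most of the email address from non-privileged users.
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
GO

-- Row level security: each sales rep sees only the rows they own.
CREATE SCHEMA Security;
GO
CREATE FUNCTION Security.fn_RepFilter(@SalesRep AS sysname)
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS allowed
           WHERE @SalesRep = USER_NAME();
GO
CREATE SECURITY POLICY Security.SalesFilter
ADD FILTER PREDICATE Security.fn_RepFilter(SalesRep) ON dbo.Sales
WITH (STATE = ON);
```

GO is a batch separator understood by client tools such as SSMS, not a T-SQL statement.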

HIPAA / HITECH
IRS 1075
Section 508 VPAT
ISO 27001
PCI DSS Level 1
SOC 1 Type 2
SOC 2 Type 2
ISO 27018
Cloud Controls Matrix
Content Delivery and Security Association
Singapore MTCS Level 3
United Kingdom G-Cloud
China Multi Layer Protection Scheme
China CCCPPF
China GB 18030
European Union Model Clauses
EU Safe Harbor
ENISA IAF
Shared Assessments
ITAR-ready
Japan Financial Services
FedRAMP JAB P-ATO
FIPS 140-2
21 CFR Part 11
DISA Level 2
FERPA
CJIS
Australian Signals Directorate
New Zealand GCIO
Industry-leading compliance

Eliminate network maintenance
One-click enables automated management of virtual
networks between cluster endpoints
Synapse resources only ever interop with private
endpoints
No management of subnets or IP Ranges
Prevents data exfiltration
Compliance Boundary

More than just data security
Native integration with Azure Purview
Automatically discover and classify data assets
End-to-end data lineage

Real-time Operational Analytics
Eliminate latency and accelerate decision making

Real-time operational analytics
One-click enablement in Azure Portal
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile
Azure Cosmos DB
Analytical Store
Column store optimized for analytical queries
Transactional Store
Row store optimized for
transactional operations
Azure Synapse
Analytics
Cloud-Native HTAP
Azure Synapse Link
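
A minimal sketch of querying the Cosmos DB analytical store through Azure Synapse Link, assuming a serverless SQL pool and a container with the analytical store enabled. The account, database, key, and the Orders container are placeholders, not taken from the slides.

```sql
-- Reads the analytical (column) store, so the transactional store is not touched.
-- <cosmos-account>, <database>, <account-key>, and Orders are hypothetical.
SELECT TOP 10 *
FROM OPENROWSET(
    'CosmosDB',
    'Account=<cosmos-account>;Database=<database>;Key=<account-key>',
    Orders
) AS order_docs;
```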

Machine Learning
Empower everyone with predictive insights

Democratize data science to all
Synapse makes predictive analytics accessible to all
Notebooks provide a code authoring experience for complex predictive models
Automated ML graphical interface provides a no-code experience for creating ML models
Native integration with Azure Cognitive Services provides access to pre-built models
All Code Low/No-Code Pre-built models

Code-first ML model development
PySpark, Scala, and C# languages supported
Automatic code completion (Intellisense)
Author multiple languages in a single notebook
Analyze data from the data warehouse, data lake,
and real-time operational data from one place

Data+Languages
Languages such as SQL, PySpark, Scala and C# in
support of data science and data warehouse
workloads
The data lake supports an unlimited set of file formats including Parquet, ORC, and JSON as well as audio, image, and video formats
Language
Data

All you need is data
Fully automated feature exploration

Code-free in Synapse Studio
No-code creation of Machine Learning models
Democratize ML to everyone since no data science
domain knowledge required
Support for ensemble models
Supports classification, regression, and time-series
forecasting

Code-free in Synapse Studio
No-code references to Machine Learning models
Democratize ML to everyone since no data science
domain knowledge required
Easily embed in SQL Stored Procedures for
transformation of Views for reporting

SELECT d.*, p.Score FROM PREDICT(MODEL = @onnx_model, …
In-engine ML Scoring
Machine Learning models executed using SQL
“In-engine” for performance and scalability
No data leaves the platform for scoring
No additional cost for scoring
T-SQL Language
Synapse SQL
Model Data Predictions
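
A fuller sketch of in-engine scoring with PREDICT, assuming a dedicated SQL pool and an ONNX model stored as VARBINARY(MAX). The dbo.Models and dbo.LoanApplications tables, the model name, and the Score output column type are hypothetical.

```sql
-- Load a stored ONNX model into a variable (table and model name are assumptions).
DECLARE @onnx_model VARBINARY(MAX) =
    (SELECT model FROM dbo.Models WHERE name = 'credit_risk');

-- Score every input row inside the engine; no data leaves the platform.
SELECT d.*, p.Score
FROM PREDICT(
        MODEL   = @onnx_model,
        DATA    = dbo.LoanApplications AS d,
        RUNTIME = ONNX
     ) WITH (Score FLOAT) AS p;
```

The WITH clause declares the output columns the model returns, so it has to match the model's actual outputs.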

Data Integration
Code-free Hybrid Data Integration

Cloud native ETL/ELT
95+ connectors available
Secure connectivity to on-premises data sources, other clouds, and SaaS applications
Code-first and low/no-code design interfaces
Schedule- and event-based triggering
Code-free

Code-free data
wrangling
No/low-code data
transformation
Excel-like interface is familiar to users
Transform data to desired shape completely visually
Operationalize into pipelines

Real-time operational
analytics
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile

Accelerate time to solution
Azure Open Datasets
Pre-built samples to accelerate development

SQL Scripts

Notebooks

Data Pipelines

Build dashboard in Synapse Studio
Code-free experience for developing rich visualizations
One-click publishing for secure consumption across the enterprise

Questions
Azure Synapse Analytics for Data Engineers | Microsoft Azure