SAP Data Hub – What is it, and what’s new? (Sefan Linders)

tbroek 1,129 views 28 slides Nov 30, 2018

About This Presentation

SAP Inside Track talk by Sefan Linders

Data Hub – What is it, and what’s new?
It’s a year after the launch of Data Hub, and the new 2.3 version has just been released. We’ll take a look at the new metadata features, discuss several use cases, and its position within the ever-growing data...


Slide Content

PUBLIC
SAP HANA Data Management Suite
Sefan Linders
Big Data Warehouse Architect
Customer Innovation & Enterprise Platform
November 2018

PUBLIC © 2018 SAP SE or an SAP affiliate company. All rights reserved.
Legal disclaimer
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission
of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop
or release any functionality mentioned therein. This document, or any related presentation, and SAP’s strategy and possible
future developments, products, and platforms, directions, and functionality are all subject to change and may be changed
by SAP at any time for any reason without notice. The information in this document is not a commitment, promise, or legal
obligation to deliver any material, code, or functionality. This document is provided without a warranty of any kind, either
express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose,
or noninfringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes
no responsibility for errors or omissions in this document, except if such damages were caused by SAP’s willful misconduct
or gross negligence.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements,
which speak only as of their dates, and they should not be relied upon in making purchasing decisions.

What problem are we addressing?
• Business users need to have all the data relevant to their decision, and they need to trust the security and accuracy of their data
• Businesses need to harness the power of all their data – business and new data types – and to anticipate and influence business outcomes
• Businesses need to provide all users with the right information in context at the right moment for the task at hand

Today: Data sprawl, impossible to govern, security complexity
Supply Chain | Finance | HR | Manufacturing | Sales | Connected Assets
Third-party Finance and Planning | Visualization Tools | Statistical Analytics | Spreadsheets | SAP BusinessObjects | Decision Intelligence Systems | BW
TACTICAL REPORTS | FUNCTIONAL REPORTS | STRATEGIC REPORTS | INNOVATION APPS

Business situation and implications
• Most enterprises now have data in 6-8 clouds
• Data has become less accessible due to the proliferation of cloud-based solutions and applications built by business units, which further fragment the data landscape
• Companies’ understanding of their customers, suppliers, and products has been in decline, caused by data being inaccessible
• Substantial legal risks due to lack of governance, e.g. GDPR
• Difficulty of operationalizing data science use in everyday business processes

More trusted data | More connected, intelligent data | More cloud and architecture flexibility

Vision: Common data model, all data used by everyone, simple
Supply Chain | Finance | HR | Manufacturing | Sales | Connected Assets
SAP HANA DATA MANAGEMENT SUITE
In-Memory Data Management | Single logical data model across entire organization | Data Flow Modeling and Control | Insights from powerful analytics engines
Third-party Finance and Planning | Visualization Tools | Statistical Analytics | Spreadsheets | SAP BusinessObjects | Decision Intelligence Systems | BW
TACTICAL REPORTS | FUNCTIONAL REPORTS | STRATEGIC REPORTS | INNOVATION APPS

SAP HANA Data Management Suite
Trusted Data | Connected, Intelligent Data | Cloud Architecture Flexibility
SAP Intelligent Enterprise Suite | SAP Leonardo and SAP Analytics Cloud | Third-Party Applications
In-memory transaction & analytics | Data discovery & governance | Data orchestration & integration | Data cleansing & enrichment | Data storage & compute
SAP HANA | SAP Data Hub | SAP Enterprise Architecture Designer | SAP Big Data Services
Third-Party Services & Products: Spark | Hadoop | Third-party Databases | Third-party Data Management
SAP Add-On API Services & Products: SAP HANA Spatial services | SAP HANA Blockchain service | SAP HANA Streaming Analytics | Other SAP Cloud Platform and SAP Leonardo Services | SAP EIM Solutions
Hybrid Cloud Management: SAP Cloud Platform
Business Data | Cloud Application Data | IoT | Spatial | Social | Image
On Premises | Hybrid | Multi-Cloud

SAP HANA Data Management Suite
Common capabilities today and tomorrow
• Development platform for applications that need analytics on real-time transactions
• Harmonized UX across administration and development tools
• Data governance, anonymization, and pipeline flow to protect and refine data across the landscape
• Modelling across business, data, and technology
• Applied AI to automate data operations and pre-defined business application scenarios
• In-memory multi-model analytics and data processing on a distributed computing framework
• Common metadata catalog, business models, and comprehensive data governance
On Premise | Hybrid | Multi-cloud
Today | Future
Seamless Cloud Service

2019
SAP HANA and SAP Data Hub engine integration
• Shared capabilities in SAP HANA & SAP Data Hub: spatial data, SQL, Graph, Doc store and common SQL
Connectivity
• Automatic connectivity between HANA and Data Hub
Lifecycle management / DevOps / deployment
• Data Hub as a Service (beta)
Tooling / UX
• Consistent navigation through HDMS tooling
Meta data model and content repository
• Common Meta Data Catalog across HDMS and 3rd party stores and data orchestration with end-to-end lineage
Security and system enablement
• Enhanced secure connections between HDMS components (hybrid, multicloud, on-premises)
SAP HANA and SAP Data Hub engine integration
• Extension of shared capabilities in HANA & Data Hub: spatial (adv), graph & doc data types, loading of Parquet or ORC files
Data tiering
• Data Tiering as cloud service with BDS integration & HANA Native storage extension
Lifecycle management / DevOps / deployment
• Scenario-based HDMS deployment of HANA & Data Hub in SAP Cloud Platform
• Cross-cloud federation support
• One Backup, recovery, and High Availability approach
• Common Lifecycle handling – content lifecycle, platform lifecycle (e.g. upgrade) across all HDMS components and engines
• Further deployment options for cloud providers & data center
• Deeper EAD integration with meta data catalog and lineage
Data Science
• Data Hub to execute pipelines using common ML libraries (PAL, APL) with HANA, consume additional ML frameworks (Leonardo, 3rd party services, etc.)
• Common custom ML operators for TF & R serving deployed by Data Hub and consumed in HANA
Tooling / UX
• Alignment and harmonization of tooling
Meta data model and content repository
• Partner ecosystem for SAP HANA Data Management Suite content
Security and system enablement
• Streamlined user management, authorizations and authentications across full logical DW managed by HDMS
SAP HANA Data Management Suite
Roadmap
2019 | 2018
The SAP HANA Data Management Suite roadmap follows a ‘cloud first’ strategy. Relevant capabilities will be available in on-premises versions at later dates.

Multiple Patterns from One Architecture
Cloud and architecture flexibility
Cloud freedom for data systems, applications, and system development
Patterns: Big Data Warehouse | Leonardo Platform | S/4HANA Expansion | Spatial Analytics | Analytics Data Mart
Diagram labels: SAP BW/4HANA | SAP Leonardo | SAP S/4HANA | SAP HANA | Earth Observation Analysis | SAP Cloud Platform | Spatial | SAP Data Hub | Big Data services from SAP | SAP EA Designer | Business Intelligence Tools


SAP Enterprise Architecture Designer
Architecture and design
Key capabilities
▪ Strategy: Define the business strategy with common business architecture standards to build a plan to act
▪ Design: Create business and technical architecture using industry-standard models to define the implementation
▪ Implementation: Align development with strategy and design to drive or represent the implementation
▪ Consume: Communicate understanding and drive action across all stakeholders
Cloud | On premise
Developer | Business user | Architect | Knowledge worker
Strategy | Design | Implementation
Landscape | Big Data | Databases | Requirements | Capabilities | Processes


SAP Data Hub
What’s New

What is SAP Data Hub?
The Lego Analogy
Disparate Data Landscapes (On-Premises | Hybrid | Cloud)
• Streams: live data feed (e.g. audio, video, Twitter)
• Events: alert/notification (e.g. IoT)
• Semi-structured: JSON, XML
• Structured: RDBMS, CRM, ERP, Legacy, File, etc.
• Unstructured: PPTs, Word documents, video, audio, image
Information Catalog | Monitoring & Scheduling | Orchestration | Pipelines
Stream | Subscribe | Ingest | Validate | Transform | Enrich | Compute | Machine Learning | Mask | Custom Code | Image Processing | Compute | Refine | Publish | Trigger Action
Data Consumption: Intelligent apps | Automated processes
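The Lego analogy can be illustrated with a small Python sketch in which each building block is a function and a pipeline is simply their composition. This is not SAP Data Hub code: the step functions, the record fields, and `build_pipeline` are all hypothetical names chosen for illustration.

```python
from functools import reduce


def validate(records):
    """Keep only records that carry an id (hypothetical validation rule)."""
    return [r for r in records if r.get("id") is not None]


def transform(records):
    """Normalize the name field (hypothetical transformation)."""
    return [{**r, "name": r["name"].strip().title()} for r in records]


def mask(records):
    """Hide the email field before publishing (hypothetical masking rule)."""
    return [{**r, "email": "***"} for r in records]


def build_pipeline(*steps):
    """Snap steps together: the output of one block is the input of the next."""
    return lambda records: reduce(lambda data, step: step(data), steps, records)


pipeline = build_pipeline(validate, transform, mask)
result = pipeline([
    {"id": 1, "name": " ada lovelace ", "email": "ada@example.com"},
    {"id": None, "name": "unknown", "email": "n/a"},
])
print(result)
```

Because every block takes and returns a list of records, blocks can be reordered or swapped without touching the others, which is the point of the analogy.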

Release Cycle – SAP Data Hub version 2.3
SAP Data Hub 1.4 | SAP Vora 2.2 | Innovation | SAP Data Hub 2.3
Release Scope:
• Lean deployment and installation with a complete containerized setup ready for any deployment
• Unified User Experience in one modeling environment
• Introducing Metadata Explorer and Cataloging
• Unifying SAP Vora & SAP Data Hub release cycle with a synchronized delivery
Motivation: Enables enterprises to build scalable data-driven applications rapidly

Release Theme – SAP Data Hub version 2.3
Deployment & Consumption
• Deployment on cloud environments with managed Kubernetes
• Individual SAP Data Hub Applications
• All components are containerized
User Experience
• Unified Modeling Tool for Workflows, Pipelines and Data Transforms
• Self Service Data Preparation with SAP Agile Data Preparation
• Comprehensive Monitoring & Diagnostic Framework
Metadata Governance
• Information Catalog to discover, define and understand sources
• Search for Metadata attributes and Tags
• Automated Metadata Crawling for SAP HANA, Cloud Stores, & SAP Vora
Data Integration & Processing
• Enhanced Connectivity (Databases, Big Data Stores, Cloud native Technologies)
• Data Integration into SAP S/4HANA, SAP Cloud solutions (Hybris, etc), Master Data management
• Data Quality Management

Deployment & Consumption
Cloud Deployments and Decoupling of Hadoop & HANA
Simplified deployment of SAP Data Hub in cloud and on-premise environments
• All components are fully containerized and delivered as Docker images, including SAP HANA
  • Removes the prerequisite of installing the SAP HANA database and XS advanced
  • Removes the prerequisite of setting up a Hadoop cluster
• Data processing is decoupled from storage platforms (any supported cloud stores). All runtime execution now occurs in Kubernetes
• Deployable on most popular managed Kubernetes environments*. Supports:
  • managed Kubernetes services of the major cloud providers (i.e. AWS, Microsoft Azure, Google Cloud Platform),
  • private cloud, and
  • on-premise installations
* See Product Availability Matrix for detailed version dependencies

User Experience
Introducing Launchpad – a fresh new-look UI (SAP Data Hub v2.3)
One central entry point to all services and applications
• Connection Management
• Monitoring
• Metadata Explorer
• Modeler

Metadata Governance
SAP Data Hub Metadata Explorer
A centralized location for: browse connections | monitoring | metadata catalog | search datasets | publications | labels
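To make the idea of a searchable metadata catalog with labels concrete, here is a minimal Python sketch. It does not use or imitate the Metadata Explorer API; `CatalogEntry`, `search`, and the sample connections and datasets are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One published dataset: where it lives, its columns, and its labels."""
    name: str
    connection: str
    columns: list = field(default_factory=list)
    labels: set = field(default_factory=set)


def search(catalog, term):
    """Return dataset names whose name or columns contain the term,
    or that carry the term as a label (case-insensitive)."""
    term = term.lower()
    return [
        entry.name
        for entry in catalog
        if term in entry.name.lower()
        or any(term in column.lower() for column in entry.columns)
        or any(term == label.lower() for label in entry.labels)
    ]


# Hypothetical catalog content for illustration only.
catalog = [
    CatalogEntry("sales_orders", "HANA_PROD", ["order_id", "customer_id", "amount"], {"finance"}),
    CatalogEntry("sensor_readings", "S3_LAKE", ["device_id", "temperature"], {"iot"}),
]

print(search(catalog, "customer"))
print(search(catalog, "IoT"))
```

Keeping the metadata separate from the data means a search never has to touch the underlying stores, only the catalog entries collected from them.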

Data Integration & Processing
The unified connectivity framework
The connectivity framework (Flowagent) serves as the underlying infrastructure, with the goal of rapidly growing and enhancing the native connectivity and integration functionality:
1. Metadata Services (Browsing, Profiling, Data Preview)
   • Hadoop (HDFS)
   • Cloud Object Storages (AWS S3, GCP GCS, Azure Data Lake, WASB)
   • Oracle*, ABAP/ODP*, OData*
2. Connection Operators (Consumer, Producer)
   • HDFS, S3, GCS, ADL, WASB
   • Oracle**, ABAP/ODP**, OData**
   • Support for custom adapters
3. Spark code generation
   • HDFS
* Profiling is planned in a future release
** Producer is planned in a future release
Diagram labels: SAP Data Hub Metadata & Applications | SAP Data Hub Connectivity Framework (Flowagent) | Metadata Extractor | Adapter | HDFS, BW4HANA, Oracle, S3, …
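The split into metadata services and consumer/producer operators behind a common adapter layer can be sketched as follows. This is a generic adapter pattern in Python, not the Flowagent interface; the class names, method names, and the `ADAPTERS` registry are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical adapter contract: metadata browsing plus consumer/producer access."""

    @abstractmethod
    def browse(self):
        """Return the names of the datasets the source exposes."""

    @abstractmethod
    def consume(self, dataset):
        """Read all rows of a dataset (consumer side)."""

    def produce(self, dataset, rows):
        """Write rows to a dataset; adapters without a producer keep this default."""
        raise NotImplementedError(f"{type(self).__name__} has no producer")


class InMemoryAdapter(Adapter):
    """Toy source supporting both consumer and producer."""

    def __init__(self):
        self._data = {}

    def browse(self):
        return sorted(self._data)

    def consume(self, dataset):
        return list(self._data[dataset])

    def produce(self, dataset, rows):
        self._data.setdefault(dataset, []).extend(rows)


class ReadOnlyAdapter(Adapter):
    """Toy source with a consumer only, like a source whose producer is not yet available."""

    def __init__(self, data):
        self._data = dict(data)

    def browse(self):
        return sorted(self._data)

    def consume(self, dataset):
        return list(self._data[dataset])


# Custom adapters plug in by registering under a new key.
ADAPTERS = {"memory": InMemoryAdapter, "readonly": ReadOnlyAdapter}


def connect(kind, *args):
    """Instantiate the adapter registered for a connection type."""
    return ADAPTERS[kind](*args)
```

Callers only see `browse`, `consume`, and `produce`, so a new source type requires one new class and one registry entry rather than changes to every pipeline.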

User Experience
Workflows Definition
Orchestration (external):
▪ SAP BW Process Chain
  – Trigger execution of a process chain on a BW system
▪ Data Transfer (BW)
  – Transfer data from a BW system into Vora tables (created on the fly)
▪ Data Services
  – Execute remote Data Services jobs (demo)
▪ SAP HANA Flowgraph
  – Trigger execution of a HANA flowgraph using SDI REST API (XSC)
▪ Spark / Hadoop
  – Submit Spark jobs, Hive queries, etc. to Hadoop clusters
Execution (internal):
▪ Pipeline
  – Start a pipeline on a local or remote SAP Data Hub Pipeline engine
  – Wait for completion of the pipeline (or, if set, continue immediately)
▪ Data Transform
  – Run relational transformations (join, union, filter, etc.) on structured data (tables, CSV, Parquet, etc.)
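The "wait for completion, or continue immediately" choice for a workflow task can be sketched in plain Python. This is not the Data Hub workflow engine: `run_workflow`, the task tuples, and the task names are hypothetical, and the task bodies are empty placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(tasks):
    """Run tasks in order. Each task is (name, callable, wait_for_completion).

    A task with wait_for_completion=True blocks the workflow until it finishes;
    otherwise the workflow moves on and collects the result at the end.
    """
    log = []
    background = []
    with ThreadPoolExecutor() as pool:
        for name, action, wait in tasks:
            future = pool.submit(action)
            if wait:
                future.result()  # block until this task is done
                log.append(f"{name}: completed")
            else:
                background.append((name, future))
                log.append(f"{name}: started")
        for name, future in background:
            future.result()  # join tasks that were left running
            log.append(f"{name}: completed")
    return log


# Placeholder tasks standing in for external and internal workflow steps.
log = run_workflow([
    ("bw_process_chain", lambda: None, True),
    ("pipeline", lambda: None, False),
    ("data_transform", lambda: None, True),
])
print(log)
```

Calling `future.result()` also re-raises any exception from the task, so a failing step stops the workflow instead of being silently skipped.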

Data Integration & Processing
Predefined Connectivity Snapshot
Column headings: Connectivity | Connectivity via Flowagent | DQMm | Leonardo ML | SAP Vora | Spark / Hadoop
- Azure Data Lake (ADL)
- Local File System (file)
- Google Cloud Storage (GCS)
- HDFS
- Amazon S3
- Azure Storage Blob (WASB)
- WebHDFS
Spark / Hadoop:
- Spark
- Spark SQL
- PySpark
- Hive

Data Integration & Processing
Data Processing
Subengines:
▪ Develop and compile new operators locally using SDKs
▪ Register and run custom operators in an available pipeline subengine
Process / Command Executors:
▪ Run a process within a pipeline and feed it a continuous stream
▪ Run a shell command for each message that arrives within a pipeline
Programming Operators:
▪ Write and run custom scripts for data manipulation within a pipeline
▪ Build re-usable operators in different programming languages
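The difference between the two executor styles (one command per message versus one process fed a stream) can be shown with Python's standard `subprocess` module. This is a sketch of the concept only, not the Data Hub operators; the function names are hypothetical, and `UPPERCASE` is a stand-in command that upper-cases its standard input.

```python
import subprocess
import sys


def run_command_per_message(messages, command):
    """Start one short-lived process per message, passing the message on stdin."""
    outputs = []
    for message in messages:
        result = subprocess.run(
            command, input=message, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return outputs


def run_process_with_stream(messages, command):
    """Start a single process and send it all messages as one stdin stream.

    For simplicity the messages are written in one go; a real streaming
    operator would write to stdin as messages arrive.
    """
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    stdout, _ = process.communicate("".join(messages))
    return stdout


# Hypothetical stand-in for a real executable: upper-cases whatever arrives on stdin.
UPPERCASE = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

per_message = run_command_per_message(["a\n", "b\n"], UPPERCASE)
streamed = run_process_with_stream(["a\n", "b\n"], UPPERCASE)
```

The per-message style isolates each message at the cost of a process start per message, while the single-process style pays the start-up cost once and lets the process keep state across messages.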

Thank you.

© 2018 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See https://www.sap.com/copyright for additional trademark information and notices.
www.sap.com/contactsap