[DSC DACH 24] Bridging the Technical-Business Divide with Modern Cloud Architectures and Data Catalogs - Boris Perkovic

DataScienceConferenc1 100 views 25 slides Sep 21, 2024

About This Presentation

In this engaging 20-minute talk, we'll explore how modern cloud architectures and advanced data catalog tools can revolutionize the way technical and business units collaborate. Participants will gain valuable insights into leveraging cutting-edge technologies to enhance data-driven decision-mak...


Slide Content

Bridging the Technical-Business Divide with Modern Cloud Architectures and Data Catalogs www.scaletechplatforms.com DSC DACH 2024

Table of Contents
01 About
02 Modern technologies
03 Problem definition
04 Data catalogs
05 Snowflake
06 dbt
07 Atlan
08 Summary

Presenting: Boris Perkovic, CEO & Solution Architect
Professional Expertise: Designing and optimising data pipelines. Leading cloud-based architecture projects to enhance data accessibility and insights.
Skills Snapshot:
Technologies: Snowflake, AWS, dbt, Airflow, Terraform
Programming: SQL, Python
Specialties: Cloud Services, Data Warehouse, ETL & Orchestration, Data Governance

ABOUT
ScaleTech Platforms brings together specialised skills in cloud architecture, data engineering, and analytics.
Cloud Services: Cloud engineering and data services. Focusing on optimising data infrastructure and leveraging advanced cloud technologies.
Core Expertise: Data infrastructure optimisation. Cloud architecture design. Development of analytical applications to drive business insights.
Key Technologies: Snowflake, AWS services, dbt Labs

Modern TECHNOLOGIES
Rapid adoption and growth
Emergence of advanced services
Focus on security and compliance
Rise of multi-cloud and hybrid cloud strategies
Shift towards serverless and edge computing

Problem DEFINITION
Tech-Business Gap: data structure != business terms; data understanding
Data silos: limited data access; inconsistent data terminology; inconsistent reporting
Organizational Efficiency: increased load on tech teams; long time to get insights for business teams

Data CATALOGS
Common language and terminology: Standardised definitions; Cross-functional alignment; Contextual information; Hierarchical relationships
Self-service data discovery: User-friendly search; Metadata-rich results; Data previews; Recommendation
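
To make the "common language" and "user-friendly search" points concrete, here is a minimal Python sketch of a business glossary that maps business terms to physical columns and supports a simple search. It is not taken from the talk or from any catalog product: the class, the term names, and the column paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """One business term and the physical column that implements it."""
    name: str              # business-facing name
    definition: str        # standardised definition agreed across teams
    physical_column: str   # hypothetical location: schema.table.column
    synonyms: tuple = ()   # alternative names used by business teams


# Hypothetical glossary entries, invented for this example.
GLOSSARY = [
    GlossaryTerm(
        name="Net Revenue",
        definition="Invoiced amount minus refunds and discounts.",
        physical_column="finance.fct_invoices.net_amt_eur",
        synonyms=("net sales",),
    ),
    GlossaryTerm(
        name="Active Customer",
        definition="Customer with at least one order in the last 90 days.",
        physical_column="crm.dim_customer.is_active_90d",
        synonyms=("active client",),
    ),
]


def search(query, glossary=GLOSSARY):
    """Return terms whose name, synonyms, or definition contain the query."""
    q = query.lower()
    return [
        t for t in glossary
        if q in t.name.lower()
        or q in t.definition.lower()
        or any(q in s.lower() for s in t.synonyms)
    ]
```

A business user searching for "net sales" would be pointed to the Net Revenue term and, through it, to the column that the tech team maintains. A real catalog adds ranking, previews, and permissions on top of this idea.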

Data CATALOGS
Data Context and Lineage: Origin tracking; Impact analysis; Quality metrics; Usage statistics
Collaboration Features: Commenting and discussions; Tagging and annotations; Rating and reviews; Knowledge sharing
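
Origin tracking and impact analysis both come down to walking a lineage graph. The sketch below shows one way this could work in plain Python, assuming lineage is stored as a mapping from each asset to the assets that read from it. The asset names and both function names are hypothetical and do not come from the talk.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.stg_orders"],
    "staging.stg_orders": ["marts.fct_sales", "marts.dim_customer"],
    "marts.fct_sales": ["dashboard.weekly_revenue"],
    "marts.dim_customer": [],
    "dashboard.weekly_revenue": [],
}


def downstream_impact(asset, lineage=LINEAGE):
    """Return every asset reachable downstream of `asset`, breadth-first."""
    seen = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def upstream_origin(asset, lineage=LINEAGE):
    """Return the direct parents of `asset` (origin tracking, one hop)."""
    return [parent for parent, children in lineage.items() if asset in children]
```

Calling `downstream_impact("raw.orders")` lists everything that could break if that source table changes, which is the question impact analysis answers before a schema change is rolled out.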

Types
Advanced:
Comprehensive support for diverse data ecosystems
Advanced AI/ML capabilities for metadata management and data discovery
Robust data lineage, impact analysis, and governance features
High level of customisation and scalability
Integration with broader data management and analytics tools
Suitable for large enterprises with complex data landscapes
Intermediate:
Support for diverse data sources and platforms
Enhanced metadata management and data discovery features
Basic data lineage and governance capabilities
Moderate level of customisation and scalability
Suitable for medium-sized organisations or those beginning to scale their data operations
Basic:
Platform native
Limited to a single platform or ecosystem
Basic metadata management and search capabilities
Often included as part of a larger data management suite
Typically easier to set up and use within the native environment

Snowflake
Cloud-based data platform that provides a unified solution for storing, processing, and analysing large volumes of structured and semi-structured data. Its architecture separates compute and storage, allowing for independent scaling of resources, while offering features like instant elasticity, secure data sharing, and support for diverse workloads including data warehousing, data lakes, and data science.
Cloud-native architecture: public cloud hosted; multi-cloud support; seamless integration with other cloud services; automatic updates and maintenance; global availability and data residency options
Optimised storage: adaptive caching; micro-partitions; auto clustering; time travel; zero copy cloning
Elastic multi-cluster compute: super-sized virtual warehouses (compute); scaling up and out; search optimisation service; SQL, Python, Java
Advanced features out of the box: data governance; advanced security options; native ML capabilities; containerisation; Jupyter notebooks support
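
Three of the features listed above (zero copy cloning, time travel, and scaling a virtual warehouse up) map to short SQL statements. The Python helpers below only build those statements as strings, as a sketch. The helper names and the table and warehouse names are hypothetical. Identifiers are interpolated directly, so this is not safe for untrusted input, and a Time Travel offset only works within the table's retention period.

```python
def clone_table_sql(source, target):
    """Zero-copy clone: the clone shares storage with the source until either side changes."""
    return f"CREATE TABLE {target} CLONE {source};"


def time_travel_sql(table, seconds_ago):
    """Time Travel: read the table as it was `seconds_ago` seconds in the past."""
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago});"


def resize_warehouse_sql(warehouse, size):
    """Scale up: change the size of a virtual warehouse without touching stored data."""
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}';"


if __name__ == "__main__":
    # Hypothetical object names, used only to show the generated statements.
    print(clone_table_sql("sales.orders", "sales.orders_dev"))
    print(time_travel_sql("sales.orders", 3600))
    print(resize_warehouse_sql("reporting_wh", "LARGE"))
```

Because compute and storage are separate, the resize statement changes only the compute side, and the clone statement creates no second physical copy of the data at creation time.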

dbt Labs
A tool that enables data teams to transform data in their warehouses using SQL-based models. It adds a Semantic Layer for centralised metric definitions. The Semantic Layer allows teams to define, document, and version control their key business metrics alongside their data models. It brings consistency across analytics tools and promotes a unified understanding of data.
Centralised metric definitions: consistency across the organisation; reduced redundancy (DRY); simpler maintenance and updates of metric logic; improved data governance with a single source of truth for metrics
SQL-based transformations: easy adoption because transformations are written in SQL; complex data transformations within the data warehouse; integrates easily with modern data warehouses
Integrated testing and version control: write and run tests on your transformations; version control of SQL transformations; collaboration by tracking changes and allowing for code reviews; reliability and reproducibility
Enhanced collaboration: shared environment for modelling and transformation; documentation features; code reuse through modular design; efficiency by standardising processes
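
The idea behind centralised metric definitions can be shown in a few lines: the metric logic is written once, and every consumer gets the same SQL rendered from it. The Python sketch below only mimics that idea and is not dbt's API (in dbt, metrics are declared in the project's YAML files). The metric names, expressions, and the `fct_sales` model are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str   # SQL aggregate expression, defined exactly once
    source: str       # model (table or view) the metric is computed from


# Hypothetical single source of truth for metric logic.
METRICS = {
    "net_revenue": Metric("net_revenue", "SUM(net_amount)", "fct_sales"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "fct_sales"),
}


def metric_query(metric_name, group_by):
    """Render the SQL for a registered metric, grouped by one dimension column."""
    m = METRICS[metric_name]
    return (
        f"SELECT {group_by}, {m.expression} AS {m.name} "
        f"FROM {m.source} GROUP BY {group_by}"
    )
```

A dashboard grouping by region and a notebook grouping by month would both call `metric_query("net_revenue", ...)`, so a change to the revenue logic is made in one place and reaches every report.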

Atlan
A modern data catalog and governance platform that centralises data management. It enables intuitive data discovery and automated metadata management. It features end-to-end data lineage and collaborative tools, helping teams to find, understand, and trust their data effectively.
Intuitive data discovery and search: Natural language search; AI-powered recommendations; Rich metadata; Visual data catalog
Active metadata management: Real-time insights; Proactive alerts; Automated metadata collection; Metadata versioning
End-to-end data lineage: Column-level lineage; Cross-tool lineage; Impact analysis; Lineage visualization
Embedded collaboration: In-platform discussions; Data asset sharing; Knowledge management; Role-based access control

CONCLUSION
Tool       Main Focus            Hosting            Data catalog maturity
Snowflake  Data platform         SaaS               Basic
dbt        Transformation layer  SaaS/self-hosting  Intermediate
Atlan      Catalog               SaaS               High

CONTACT
www.scaletechplatforms.com
[email protected]
https://www.linkedin.com/company/scaletechplatforms
Vienna, Austria
THANK YOU! Q&A