How to use Dremio as a standard interface to implement Output Ports in a Data Mesh architecture
Slide Content
The role of Dremio in a Data Mesh architecture. Presented by: Paolo Platter, CTO & Co-founder @ Agile Lab
Who we are: we value transparency, collaboration, and results; totally decentralized and self-managed; international culture and mindset; customer laser-focused. What we do: Data Engineering has been our mission since 2013; crafting end-to-end data platforms; Data Strategy; Managed Data Service. www.agilelab.it
Data Mesh Principles: domain-driven data ownership and architecture; data as a product; self-serve infrastructure as a platform; federated computational governance.
Data Product: code, data, and infrastructure. Data + metadata (syntax + semantics, expected behaviour, access control). Internally: a data pipeline, stream processing, and internal processes (GDPR, DQ, etc.). Input Ports are fed by operational systems, other Data Products, and external services. Output Ports expose data as Events, SQL Views, Raw/Files, or Graph/RDF. Control Ports expose the Information API, the Observability API, and the Data Access API.
Data Mesh is a practice: each Data Product team can select the technology that best fits the use case, including multi-cloud needs, but that technology must be compliant with the Data Product features and requirements: technology independence, addressability, interoperability, self-serve provisioning, and independent deployability.
Output Ports. The Output Port API gives the Data Consumer a descriptive schema, audit, access control, decoupling (URI and protocol), and SLOs. Option 1: data flows through the API itself (GraphQL or HTTP); zero coupling but low performance, and not suitable for all data-consumption use cases. Option 2: after a pre-flight call to the Output Port API, data flows through the native protocol (Events, SQL, Files); low coupling, good performance, and fully polyglot.
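A minimal sketch of the pre-flight idea described above: the consumer first asks the Output Port API for a descriptor, then connects through whatever native protocol it advertises. All of these types, fields, and endpoints are hypothetical, not an actual Dremio or Agile Lab API.

```java
import java.net.URI;

public class PreFlightExample {

    // What the pre-flight call could return: the protocol and URI are resolved
    // at request time, so the consumer is not coupled to the technology
    // behind the port. (Illustrative shape, not a real API.)
    record OutputPortDescriptor(String protocol,   // e.g. "jdbc", "kafka", "s3"
                                URI uri,           // resolved native endpoint
                                String schemaRef)  // pointer to syntax + semantics
    {}

    static OutputPortDescriptor fetchDescriptor(String portId) {
        // In a real mesh this would be an HTTP call to the port's API;
        // hard-coded here to keep the sketch self-contained.
        return new OutputPortDescriptor(
                "jdbc",
                URI.create("jdbc:dremio:direct=dremio.internal:31010"),
                "https://catalog.example.com/schemas/" + portId);
    }

    public static void main(String[] args) {
        OutputPortDescriptor d = fetchDescriptor("sales.orders.v1");
        // The consumer stays polyglot: it picks a client based on the
        // protocol advertised by the descriptor.
        switch (d.protocol()) {
            case "jdbc"  -> System.out.println("Connect via JDBC to " + d.uri());
            case "kafka" -> System.out.println("Subscribe to events at " + d.uri());
            default      -> System.out.println("Unsupported protocol: " + d.protocol());
        }
    }
}
```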
Problem
Connecting a BI tool to an Output Port. GraphQL and other HTTP-based protocols are not widely supported by BI tools, and building a custom pre-flight that dynamically discovers the protocol of the source is not easy. To query a file/object storage directly you need a SQL engine, which is typically not available inside BI tools. A JDBC/ODBC connection is a good, standard option for BI tools, but it hides some problems.
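For concreteness, this is roughly what a BI tool does under the hood over JDBC: a minimal sketch assuming Dremio's JDBC driver is on the classpath, its default coordinator port 31010, and a hypothetical view name and credentials.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BiToolConnection {
    public static void main(String[] args) throws Exception {
        // Dremio's JDBC URL format; host and credentials are placeholders.
        String url = "jdbc:dremio:direct=dremio.internal:31010";
        try (Connection conn = DriverManager.getConnection(url, "analyst", "secret");
             Statement stmt = conn.createStatement();
             // "sales"."orders_output_port" is a hypothetical SQL view
             // exposed by a data product as its output port.
             ResultSet rs = stmt.executeQuery(
                     "SELECT order_id, amount FROM \"sales\".\"orders_output_port\" LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("order_id") + " " + rs.getDouble("amount"));
            }
        }
    }
}
```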
A SQL engine is also needed. Data Products should also embed some query capability and offer it to data consumers. This can happen by leveraging a DBMS technology (storage and query capabilities all-in-one); otherwise you can rely on object storage or distributed file systems, in which case you need a SQL engine (Athena, BigQuery, etc.) to query them. Applications, data analysts, and data scientists then execute queries against the Data Products' SQL and Raw output ports.
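Sketched as SQL, the two options look like this, with hypothetical source names ("postgres_dp", "s3_lake") and layout; an engine such as Dremio, like Athena or BigQuery, can expose a folder of Parquet files on object storage as a queryable table.

```java
public class SqlEngineExamples {
    // Option 1: a DBMS-backed output port, where storage and query
    // capabilities come all-in-one from the database itself.
    static final String DBMS_PORT =
            "SELECT customer_id, total FROM \"postgres_dp\".\"public\".\"orders\"";

    // Option 2: a Raw/Files output port, where a SQL engine is needed to
    // turn files on object storage into a queryable table.
    static final String RAW_PORT =
            "SELECT customer_id, SUM(total) AS total "
          + "FROM \"s3_lake\".\"sales-dp\".\"orders_parquet\" "
          + "GROUP BY customer_id";

    public static void main(String[] args) {
        System.out.println(DBMS_PORT);
        System.out.println(RAW_PORT);
    }
}
```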
Client-side coupling is not good. The consumer ends up embedding one JDBC driver for Athena, another for Redshift, another for Aurora: one driver doesn't fit all. Coupling becomes a problem for change management, and the Data Product is no longer independently deployable. Reasons why you need multiple technologies in a data mesh: not all use cases fit a single technology; Data Mesh is an evolutionary architecture, so technologies will evolve and change over time and Data Products will adopt them independently; and your data mesh is expanding across a multi-cloud landscape.
How to integrate legacy systems. Migrating from the Data Lake to the Data Mesh will take time. What if we need to join data coming from different JDBC channels? If the join must be resolved at the consumer level, the impact on performance is huge.
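With a federation engine such as Dremio in between, the join is resolved inside the engine instead of at the consumer. A sketch, assuming two hypothetical sources ("legacy_oracle", "mesh_lake") registered in the same Dremio catalog, so a single JDBC statement joins them without the consumer opening two JDBC channels:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FederatedJoin {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:dremio:direct=dremio.internal:31010";
        // Both sources appear in one catalog; the engine plans and executes
        // the cross-source join, so nothing is joined client-side.
        String sql =
                "SELECT c.customer_name, SUM(o.amount) AS total "
              + "FROM \"legacy_oracle\".\"crm\".\"customers\" c "
              + "JOIN \"mesh_lake\".\"sales-dp\".\"orders\" o "
              + "  ON o.customer_id = c.customer_id "
              + "GROUP BY c.customer_name";
        try (Connection conn = DriverManager.getConnection(url, "analyst", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}
```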
Fitting into the big picture. Dremio provides a single interface to access all the silos, with no coupling between data consumers and multiple specific technologies: a single catalog of data; native integration with data lakes, facilitating the transition to the Data Mesh; bridging of other enterprise assets, including other DBMSs; use as the SQL query engine inside Data Products; cloud agnosticism; efficient joins between Data Products built on different underlying technologies; and query federation between the data mesh and other data assets across the organization.
Self-serve provisioning. A Data Product specification goes through CI/CD, and the deploy step drives the provisioning API: on the query-engine side it provisions execution resources, SQL views, and ACLs (catalog, execution engine, ACL); on the infrastructure side it provisions storage, databases, policies, IAM, and so on. A sketch of the query-engine side follows.
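A minimal sketch of that deploy step, assuming a hypothetical spec shape and names: the provisioner creates the SQL view behind the output port and grants access to a consumer role. CREATE VDS is Dremio's DDL for a virtual dataset; GRANT is available in Dremio's enterprise/cloud editions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SelfServeProvisioner {

    // Hypothetical subset of a Data Product specification: just enough
    // to provision one SQL View output port and its ACL.
    record DataProductSpec(String space, String viewName,
                           String selectSql, String consumerRole) {}

    static void deploy(Connection conn, DataProductSpec spec) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            // Provision the SQL view that backs the output port.
            stmt.execute("CREATE VDS \"" + spec.space() + "\".\"" + spec.viewName()
                    + "\" AS " + spec.selectSql());
            // Provision the ACL for the consumer role.
            stmt.execute("GRANT SELECT ON VDS \"" + spec.space() + "\".\"" + spec.viewName()
                    + "\" TO ROLE \"" + spec.consumerRole() + "\"");
        }
    }

    public static void main(String[] args) throws Exception {
        DataProductSpec spec = new DataProductSpec(
                "sales-dp", "orders_output_port",
                "SELECT order_id, customer_id, amount FROM \"mesh_lake\".\"sales-dp\".\"orders\"",
                "sales_consumers");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:dremio:direct=dremio.internal:31010", "provisioner", "secret")) {
            deploy(conn, spec);
        }
    }
}
```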
Data Product caching. The data consumer interacts with a single logical entity, the main entity, while queries are served faster thanks to query acceleration and caching over pre-aggregated and denormalized views. Leveraging (external) reflections speeds up queries automatically without adding complexity for the data consumer, and Dremio can create such pre-aggregations itself, with no need to implement custom jobs for that purpose.
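As an illustration, enabling an aggregate reflection on the output-port view might look like this. The exact reflection DDL varies between Dremio versions, and every name here (dataset path, reflection name, columns) is a placeholder; the key point is that consumers keep querying the same logical view and Dremio substitutes the reflection transparently.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReflectionSetup {
    public static void main(String[] args) throws Exception {
        // Ask Dremio to maintain a pre-aggregated view of the main entity.
        String ddl =
                "ALTER DATASET \"sales-dp\".\"orders_output_port\" "
              + "CREATE AGGREGATE REFLECTION orders_by_customer "
              + "USING DIMENSIONS (customer_id, order_date) "
              + "MEASURES (amount (SUM, COUNT))";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:dremio:direct=dremio.internal:31010", "dp_owner", "secret");
             Statement stmt = conn.createStatement()) {
            // Consumers are unaffected: their queries against the logical
            // view are accelerated by the reflection when it applies.
            stmt.execute(ddl);
        }
    }
}
```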