openmetadatacollate
Aug 09, 2024
About This Presentation
The OpenMetadata Community Meeting was held on August 7th, 2024. In this meeting, we discussed the journey of our Data Quality & Observability features and the highlights of our upcoming 1.5 release!
* New Data Quality Dashboard
* Data Diff Data Quality Tests
* Data Freshness Tests
* Anomaly Detection
* Multiple DQ Pipeline Schedule
* Data Asset & Explore Widget
* Pipeline Widget
* Domains RBAC & Subdomains
* Custom Data Insights Dashboards
OpenMetadata Release 1.5 Highlights
+220 closed issues & +630 merged PRs vs. 1.4.0 (May 21st)
●New Data Quality Dashboard
●Data Diff Data Quality Tests
●Multiple DQ Pipeline Schedule
●Improved Alert Templates
●Domains RBAC
●UI Support for Subdomains
●Data Asset Explore & Widget
●Pydantic v2
●GCS, Flink & SAP ERP Connectors
●API as a Metadata Asset
●Ingestion UI Revamp
Collate Release 1.5 Highlights
●Freshness Data Quality Tests
●Anomaly Detection
●Customizable Data Insights
●Azure Synapse Connector
Data Quality and Observability Platform
Current Features
●Native DQ Framework
●Create tests from UI
●UI DQ scheduling
●Rich API
●3rd party integrations
●Alerting System
●Incident Management
Upcoming in 1.5
●New Data Quality Dashboard
●Data Diff Data Quality Test
●Freshness Data Quality Test
●Multiple DQ Pipeline Schedule
●Anomaly Detection
●Improved Alert templates
Roadmap 1.6
●RCA with lineage and sampling
New Data Quality Dashboard
●Integrity: Ensure the data remains correct throughout your transformation process. E.g., Row Count
●Consistency: The same information matches across multiple instances. E.g., Data Diff
●Accuracy: The data properly represents reality. E.g., Data Freshness
●Completeness: Check if important data is missing. E.g., Nullability checks
●Uniqueness: Check that each record appears only once.
●Validity: Data follows business rules or organization standards. E.g., Regex tests.
One step closer to easier and better RCA
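A minimal sketch of three of the dimensions above (Completeness, Uniqueness, Validity) in plain Python. This is not the OpenMetadata test framework: the sample rows, column names, and helper functions are hypothetical and only illustrate what each kind of check measures.

```python
import re

# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "not-an-email"},
]

def null_count(rows, column):
    """Completeness: number of rows where the column is missing."""
    return sum(1 for r in rows if r.get(column) is None)

def duplicate_count(rows, column):
    """Uniqueness: number of rows whose value was already seen."""
    seen, dupes = set(), 0
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes += 1
        seen.add(value)
    return dupes

def regex_failures(rows, column, pattern):
    """Validity: number of non-null values that do not match the pattern."""
    compiled = re.compile(pattern)
    return sum(
        1 for r in rows
        if r.get(column) is not None and not compiled.fullmatch(r[column])
    )

EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"  # deliberately simple, not RFC-complete
print(null_count(rows, "email"))                      # missing emails
print(duplicate_count(rows, "id"))                    # repeated ids
print(regex_failures(rows, "email", EMAIL_PATTERN))   # malformed emails
```

A test passes when its count is zero (or below whatever threshold the team agrees on).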
Data Diff Data Quality Tests
●Ensure Data Consistency throughout your ETL processes.
●Easily compare the data between 2 tables.
●Select your PK, columns to compare, and filter your data.
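To make the three inputs concrete (PK, columns to compare, filter), here is a rough sketch of a table diff over in-memory rows. It is an illustration only, not how OpenMetadata implements the test; `table_diff` and the sample tables are hypothetical.

```python
def table_diff(source, target, key, columns, where=None):
    """Compare two tables (lists of dict rows) by primary key.

    Returns keys missing from the target, keys that exist only in the
    target, and keys whose compared columns differ.
    """
    keep = where or (lambda row: True)
    src = {r[key]: r for r in source if keep(r)}
    tgt = {r[key]: r for r in target if keep(r)}
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())
    changed = sorted(
        k for k in src.keys() & tgt.keys()
        if any(src[k][c] != tgt[k][c] for c in columns)
    )
    return {"missing": missing, "extra": extra, "changed": changed}

# Hypothetical source and target tables of an ETL step.
source = [
    {"id": 1, "amount": 10, "region": "EU"},
    {"id": 2, "amount": 20, "region": "EU"},
    {"id": 3, "amount": 30, "region": "US"},
]
target = [
    {"id": 1, "amount": 10, "region": "EU"},
    {"id": 2, "amount": 25, "region": "EU"},
    {"id": 4, "amount": 40, "region": "EU"},
]

# Compare only the "amount" column, restricted to EU rows.
result = table_diff(source, target, key="id", columns=["amount"],
                    where=lambda row: row["region"] == "EU")
print(result)
```

With the EU filter applied, row 3 is ignored on both sides, row 2 is reported as changed, and row 4 is reported as present only in the target.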
Freshness Data Quality Test
●Ensure Data Accuracy: Am I looking at up-to-date information?
●Select a DATE column and validate that its contents are no older than 1 day, 1 hour...
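A small sketch of the freshness idea, assuming the check looks at the newest value in the date column and compares its age with a configured limit. The function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(timestamps, max_age, now=None):
    """Return True when the newest timestamp is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    if not timestamps:
        return False  # no data at all is treated as stale
    return now - max(timestamps) <= max_age

# Fixed "now" so the example is reproducible.
now = datetime(2024, 8, 7, 12, 0, tzinfo=timezone.utc)
updated_at = [
    datetime(2024, 8, 6, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 8, 7, 10, 30, tzinfo=timezone.utc),
]

print(is_fresh(updated_at, timedelta(days=1), now))   # newest row is 1.5 hours old
print(is_fresh(updated_at, timedelta(hours=1), now))
```

The same data passes a 1-day limit and fails a 1-hour limit, which is why the threshold needs business knowledge.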
Anomaly Detection
●Some tests are easy to configure:
○I don't want nulls (either we have nulls or not)
○I don't want repeated values (either values are unique or not)
●Others require business or functional knowledge:
○Data shouldn't be older than 1 day
○Data should be greater than 0
●However, what happens if our data is expected to evolve?
○E.g., is the amount of sales behaving “properly”?
○We want to know if unexpected spikes or drops are happening.
Anomaly Detection
●We can either add bounds manually…
●Or dynamically learn from the data!
Anomaly Detection
●Collate will learn from your data and start assessing values dynamically
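To illustrate the difference between manual and learned bounds, here is a deliberately simple sketch that derives bounds from past values as mean plus or minus k standard deviations. This is an assumption made for illustration; the slides do not say which method Collate uses, and the sales figures are made up.

```python
from statistics import mean, stdev

def dynamic_bounds(history, k=3.0):
    """Derive lower/upper bounds from past values as mean +/- k * stdev.

    Needs at least two historical values (a requirement of stdev).
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread

def is_anomaly(value, history, k=3.0):
    """Flag a value that falls outside the bounds learned from history."""
    lower, upper = dynamic_bounds(history, k)
    return value < lower or value > upper

# Hypothetical daily sales counts.
daily_sales = [100, 104, 98, 101, 97, 103, 99]

print(is_anomaly(102, daily_sales))  # within the usual range
print(is_anomaly(150, daily_sales))  # unexpected spike
print(is_anomaly(60, daily_sales))   # unexpected drop
```

As new values arrive and are appended to the history, the bounds move with the data instead of staying fixed.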
Multiple DQ Pipeline Schedule
●Before:
○1 Table = 1 Data Quality Workflow
○It makes it easier to manage, but there's limited flexibility
●Now:
○You can create multiple DQ Workflows, each of them with specific tests!
Data Asset & Explore Widget
●Before creating DQ tests we need to be able to find the right data
●OpenMetadata is a platform for everyone! Technical & Non-technical users
●How can we make it easier to find the right data?
1. New Explore Panel 2. New Explore Widget
Data Asset & Explore Widget
1.4 vs. 1.5
Data Asset & Explore Widget
●Different people have different needs!!!
●Customize the Landing Page based on your User Personas
●Jump directly to your data
Pipeline Widget
●Remember that you can create alerts as well!!
Domains RBAC & Subdomains
●Help large companies organize the data that can be owned by multiple teams under a single domain.
Domains RBAC & Subdomains
●Introducing Subdomains: More flexibility when organizing your data
Domains RBAC & Subdomains
●Adding RBAC controls so that only users from a specific domain can view the data
●You can still use Data Products to showcase publicly consumable data assets!!
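A rough sketch of a domain-scoped access check. Two assumptions are made here that the slides do not confirm: membership in a parent domain also grants access to its subdomains, and assets published in a Data Product are visible to everyone. The domain names and functions are hypothetical.

```python
# Hypothetical domain hierarchy: child -> parent (None marks a top-level domain).
PARENT = {"Finance": None, "Finance.Payments": "Finance", "Marketing": None}

def domain_chain(domain):
    """Return the domain followed by all of its ancestors."""
    chain = []
    while domain is not None:
        chain.append(domain)
        domain = PARENT[domain]
    return chain

def can_view(user_domains, asset_domain, in_data_product=False):
    """Allow access to Data Product assets, or when the user belongs to the
    asset's domain or one of its parent domains (assumptions of this sketch)."""
    if in_data_product:
        return True
    return any(d in user_domains for d in domain_chain(asset_domain))

print(can_view({"Finance"}, "Finance.Payments"))
print(can_view({"Marketing"}, "Finance.Payments"))
print(can_view({"Marketing"}, "Finance.Payments", in_data_product=True))
```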
API as a Metadata Asset
●Vision: Capture metadata from every part of the Organization
●Create API Services to capture Collections/Endpoints and their request and response schemas
●For now: Backend & UI. We'll add connectors in the next release.
●Goal: Provide e2e RCA, compliance and observability on your data stack
Custom Data Insights Dashboards
●Data Insights help Organizations understand how their data is growing in different dimensions:
○Are we properly setting tiers? Owners? Domains? Descriptions?
●These are critical metrics to understand the data culture in an organization: Are we helping users make the most out of our data?
●Moreover, we want actionable insights:
○KPIs help drive goals and empower the users
●We want to give you the flexibility to define your important metrics and KPIs
○Introducing Custom Data Insights Dashboards & KPIs
○First release in 1.5, focused on Data Assets (Tables, Dashboards,...)
○We'll cross this with Glossary & DQ data in 1.6
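As an illustration of the kind of metric behind these dashboards, the sketch below computes the percentage of data assets that have an owner, a description, or a tier, and checks it against a KPI target. The asset records, attribute names, and target value are hypothetical and not taken from the product.

```python
def coverage(assets, attribute):
    """Percentage of assets with a non-empty value for the attribute."""
    if not assets:
        return 0.0
    filled = sum(1 for a in assets if a.get(attribute))
    return 100.0 * filled / len(assets)

def kpi_met(assets, attribute, target_pct):
    """True when coverage for the attribute reaches the target percentage."""
    return coverage(assets, attribute) >= target_pct

# Hypothetical data assets (tables and a dashboard).
assets = [
    {"name": "orders", "owner": "sales-team", "description": "Order facts", "tier": "Tier1"},
    {"name": "customers", "owner": "crm-team", "description": "", "tier": None},
    {"name": "events", "owner": None, "description": "Raw events", "tier": None},
    {"name": "sales_dashboard", "owner": "bi-team", "description": "Weekly sales", "tier": "Tier2"},
]

for attribute in ("owner", "description", "tier"):
    print(attribute, coverage(assets, attribute))

print(kpi_met(assets, "tier", 60))  # hypothetical KPI: 60% of assets have a tier
```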
Star us on GitHub
https://github.com/open-metadata/OpenMetadata