OpenMetadata Community Meeting - 7th August 2024

openmetadatacollate 94 views 29 slides Aug 09, 2024

About This Presentation

The OpenMetadata Community Meeting was held on August 7th, 2024. In this meeting, we discussed the journey of our Data Quality & Observability features and the highlights of our upcoming 1.5 release!

* New Data Quality Dashboard
* Data Diff Data Quality Tests
* Data Freshness Tests
* Anomaly De...


Slide Content

OpenMetadata
Community Meeting
August 2024

Agenda
●Community Updates & Metrics
●Community Spotlight - Thndr⚡
●Release 1.5 Highlights
●Q&A

Community Stats
●GitHub Stars: 4960
●Open Source Developers: 262
●Community Members: 6398
●Community: 2515
●Growth figures: +2, +173, +260, +84

Community Metrics
●252 PRs in the last 4 weeks
●579 Qs in the last 4 weeks

9 Community Contributions
●Shin-ichi Hashiba
●Antoine Balliet
●Sam McCarty
●Pedro Buzzi Filho
●Fuzmish
●Matt Chamberlin
●Mariusz Górski
●Kenji Nakagaki
●鲁汀

Matt Chamberlin - GCS Storage Connector

OSS Survey
We want to hear from you!

OpenMetadata Release 1.5 Highlights
+220 closed issues & +630 merged PRs vs. 1.4.0, May 21st
●New Data Quality Dashboard
●Data Diff Data Quality Tests
●Multiple DQ Pipeline Schedule
●Improved Alert Templates
●Domains RBAC
●UI Support for Subdomains
●Data Asset Explore & Widget
●Pydantic v2
●GCS, Flink & SAP ERP Connectors
●API as a Metadata Asset
●Ingestion UI Revamp

Collate Release 1.5 Highlights
●Freshness Data Quality Tests
●Anomaly Detection
●Customizable Data Insights
●Azure Synapse Connector

Data Quality and Observability Platform

Current Features
●Native DQ Framework
●Create tests from UI
●UI DQ scheduling
●Rich API
●3rd party integrations
●Alerting System
●Incident Management
●RCA with lineage and sampling

Upcoming in 1.5
●New Data Quality Dashboard
●Data Diff Data Quality Test
●Freshness Data Quality Test
●Multiple DQ Pipeline Schedule
●Anomaly Detection
●Improved Alert Templates

Roadmap 1.6
●Performance improvements
●Custom Dashboards
●Custom Alert Templates

New Data Quality Dashboard
●Integrity: Ensure the data remains correct throughout your transformation process. E.g., Row Count
●Consistency: The same information matches across multiple instances. E.g., Data Diff
●Accuracy: The data properly represents reality. E.g., Data Freshness
●Completeness: Check if important data is missing. E.g., Nullability checks
●Uniqueness: Check that each record appears only once.
●Validity: Data follows business rules or organization standards. E.g., Regex tests
One step closer to easier and better RCA

Data Diff Data Quality Tests
●Ensure Data Consistency throughout your ETL processes.
●Easily compare the data between 2 tables.
●Select your PK, columns to compare, and filter your data.
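
The idea behind a data diff (pick a primary key, pick the columns to compare, optionally filter the rows) can be sketched in a few lines of Python. This is only an illustration of the concept, not the OpenMetadata implementation: the function `diff_tables`, its parameters, and the sample rows are all hypothetical, and tables are modeled as plain lists of dictionaries.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def diff_tables(
    source: list[Row],
    target: list[Row],
    key: str,
    columns: list[str],
    where: Optional[Callable[[Row], bool]] = None,
) -> dict[str, list[Any]]:
    """Compare two tables row by row, matching rows on a primary key.

    Hypothetical helper for illustration only.
    """
    # Optional row filter, applied to both sides before comparing.
    if where is not None:
        source = [row for row in source if where(row)]
        target = [row for row in target if where(row)]

    # Index both tables by primary key.
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}

    # Keys present on one side only.
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())

    # Keys present on both sides whose compared columns differ.
    changed = sorted(
        k
        for k in src.keys() & tgt.keys()
        if any(src[k].get(c) != tgt[k].get(c) for c in columns)
    )
    return {"missing_in_target": missing, "extra_in_target": extra, "changed": changed}


# Made-up sample data: the same table before and after an ETL step.
source_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
target_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},
    {"id": 4, "amount": 40},
]

result = diff_tables(source_rows, target_rows, key="id", columns=["amount"])
```

A test built on such a diff would pass only when all three lists are empty, or when the number of differing rows stays under a chosen threshold.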

Freshness Data Quality Test
●Ensure Data Accuracy: Am I checking updated information?
●Select a DATE column and validate that its contents are no older than 1 day, 1 hour...
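
As a rough sketch of what a freshness check does, the snippet below takes the values of a date column and verifies that the most recent one is no older than a chosen threshold. The function name `is_fresh` and its parameters are hypothetical, and the actual test in the product may work differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(
    timestamps: list[datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the newest timestamp is no older than max_age.

    Hypothetical helper for illustration only.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # An empty column has no recent data, so treat it as stale.
    if not timestamps:
        return False
    return now - max(timestamps) <= max_age


# Made-up sample data with a fixed "current time" for reproducibility.
reference_time = datetime(2024, 8, 7, 12, 0, tzinfo=timezone.utc)
updated_at = [
    datetime(2024, 8, 5, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 8, 7, 6, 30, tzinfo=timezone.utc),
]

fresh_within_a_day = is_fresh(updated_at, timedelta(days=1), now=reference_time)
fresh_within_an_hour = is_fresh(updated_at, timedelta(hours=1), now=reference_time)
```

With this sample data the newest row is 5.5 hours old, so the one-day check passes and the one-hour check fails.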

Anomaly Detection
●Some tests are easy to configure:
○I don't want nulls (either we have nulls or not)
○I don't want repeated values (either values are unique or not)
●Others require business or functional knowledge:
○Data shouldn't be older than 1 day
○Data should be greater than 0
●However, what happens if our data is expected to evolve?
○E.g., is the amount of sales behaving "properly"?
○We want to know if unexpected spikes or drops are happening.

Anomaly Detection
●We can either add bounds manually…

●Or dynamically learn from the data!

Anomaly Detection
●Collate will learn from your data and start assessing values dynamically
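
To make the difference between manual and learned bounds concrete, here is a minimal sketch that derives bounds from historical values using the mean plus or minus a multiple of the standard deviation. This simple statistical rule is an assumption chosen for illustration; the slides do not describe the model Collate actually uses. The names `learned_bounds`, `is_anomaly`, and `k` are hypothetical.

```python
from statistics import mean, stdev


def learned_bounds(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Derive lower and upper bounds from past values (mean +/- k * stdev).

    Needs at least two historical values.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomaly(value: float, history: list[float], k: float = 3.0) -> bool:
    """Flag a value that falls outside the bounds learned from history."""
    low, high = learned_bounds(history, k)
    return not (low <= value <= high)


# Made-up daily sales counts.
daily_sales = [100, 102, 98, 101, 99, 103, 97]

spike_detected = is_anomaly(150, daily_sales)
normal_day_flagged = is_anomaly(101, daily_sales)
```

Because the bounds are recomputed from recent history, they follow the data as it evolves instead of staying fixed at a manually chosen threshold.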

Multiple DQ Pipeline Schedule
●Before:
○1 Table = 1 Data Quality Workflow
○It makes it easier to manage, but there's limited flexibility
●Now:
○You can create multiple DQ Workflows, each of them with specific tests!

Data Asset & Explore Widget
●Before creating DQ tests we need to be able to find the right data
●OpenMetadata is a platform for everyone: technical and non-technical users
●How can we make it easier to find the right data?
1. New Explore Panel 2. New Explore Widget

Data Asset & Explore Widget
1.4 vs. 1.5

Data Asset & Explore Widget
●Different people have different needs!!!
●Customize the Landing Page based on your User Personas
●Jump directly to your data

Pipeline Widget
●Remember that you can create alerts as well!!

Domains RBAC & Subdomains
●Help large companies organize the data that can be owned by multiple teams under a single domain.

Domains RBAC & Subdomains
●Introducing Subdomains: More flexibility when organizing your data

Domains RBAC & Subdomains
●Adding RBAC controls to allow only users from a specific domain to view the data
●You can still use Data Products to showcase publicly consumable data assets!!

API as a Metadata Asset
●Vision: Capture metadata from every part of the Organization
●Create API Services to capture Collections/Endpoints and their request and response schemas
●For now: Backend & UI. We'll add connectors in the next release.
●Goal: Provide e2e RCA, compliance and observability on your data stack

Custom Data Insights Dashboards
●Data Insights help Organizations understand how their data is growing in different dimensions:
○Are we properly setting tiers? Owners? Domains? Descriptions?
●These are critical metrics to understand the data culture in an organization: Are we helping users make the most out of our data?
●Moreover, we want actionable insights:
○KPIs help drive goals and empower the users
●We want to give you the flexibility to define your important metrics and KPIs
○Introducing Custom Data Insights Dashboards & KPIs
○First release in 1.5, focused on Data Assets (Tables, Dashboards,...)
○We'll cross this with Glossary & DQ data in 1.6

Star us on GitHub
https://github.com/open-metadata/OpenMetadata

Join our Slack
https://slack.open-metadata.org/

Follow us on X
@open_metadata

Discover Collate SaaS
https://www.getcollate.io/