The OpenMetadata Community Meeting was held on September 18th, 2024. In the Community Spotlight, Erica Bertan from Loggi (https://www.loggi.com.br/en/), a cloud-native logistics company, explains Loggi's data governance best practices. Discover how their OpenMetadata adoption led to significant cost savings with the removal of 16,000 dashboards!
Added: Sep 19, 2024
Slides: 21 pages
Slide Content
September 2024 / Data Management
Erica Bertan
How Open Metadata helps us with Data Governance at Loggi
8+ years of experience in data and software engineering teams
Software Engineering: microservices, unit tests, Spark, Docker
Data: analytics, visualization tools, data modelling, data quality
Strategy, communication and leadership
Erica Bertan
Analytics Engineering Manager
about Loggi
Brazilian logistics company
About Loggi
● 10+ years of experience delivering packages
● 300,000 packages / day
● Hubs and distribution centers in the 27 federal states of the country
● US$ 1 billion of investments in 7 years (SoftBank, Microsoft, GGV Capital, Monashees, Kaszek and others)
● 2019: a Brazilian unicorn
Challenges
● Continent-sized country, several modi operandi for delivering packages
● Complex logistics chain
● More than 2 thousand workers spread across the country
● Every federal state with particularities
Goals
● how we are doing Data Governance at Loggi at the moment
● what works for us
● how Open Metadata fits in the Data Governance marathon
Not goals
● deep dive about tools
● technical aspects of our infrastructure
● the pros and cons of the stack
the problems
Problem 1: Communication and definition of responsibilities
"Who can I ask about the business context of this model?"
Problem 2: Data organization
"Where can I find the correct data in order to produce insights?"
"How can I even start?"
Problem 3: Data reporting inconsistencies
"Which metric is the correct one?"
● 18,000 dashboards and Looks
● 50 Looker models
Problem 4: Complex structures
"Which table should I use?"
● package_events versus package_register
data: our big numbers
● ~100 ETL jobs
● The midnight job processes almost 500 tables in 8 hours
data: our big numbers
● 776 Looker users
● Storage: 42 TB DL, 100 TB DW
● ~1.8k Looker dashboards
● 200 GB of new data (daily volume)
● 9 million new package tracking records / day
● 2.5 hours/day average daily usage
● 9.4k tables
how Open Metadata fits?
Definition of Ownership
"Who can I ask about the business context of this model?"
how Open Metadata fits?
Data Lineage
"What's the impact of this model's new release?"
how Open Metadata fits?
Deletion of dated dashboards
"This dashboard is not used anymore": from 18 thousand to 1.5 thousand
The process occurred in the following sequence:
1. We listed all the dashboards.
2. We informed the company.
3. People marked what they used.
4. We deleted everything that wasn't marked.
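The sequence above is essentially a mark-and-sweep cleanup. The sketch below is only an illustration of that idea, not the script used at Loggi: the function name `plan_dashboard_cleanup` and the dashboard IDs are hypothetical, and it only computes what would be deleted instead of calling any Looker or OpenMetadata API.

```python
def plan_dashboard_cleanup(all_dashboards, marked_in_use):
    """Split dashboards into those to keep and those to delete.

    all_dashboards: every dashboard ID found in the listing step.
    marked_in_use:  IDs that people marked as still used.
    """
    marked = set(marked_in_use)
    keep = [d for d in all_dashboards if d in marked]
    to_delete = [d for d in all_dashboards if d not in marked]
    return keep, to_delete


# Step 1: list all the dashboards (hypothetical IDs, for illustration only).
all_dashboards = [
    "sla_overview",
    "hub_throughput",
    "old_campaign_2019",
    "driver_payments_v1",
]

# Step 3: the IDs people marked as still in use (step 2, informing the
# company, happens outside of any code).
marked_in_use = {"sla_overview", "hub_throughput"}

# Step 4: everything that was not marked becomes a deletion candidate.
keep, to_delete = plan_dashboard_cleanup(all_dashboards, marked_in_use)

print(f"keep {len(keep)}, delete {len(to_delete)}")
for dashboard_id in to_delete:
    print("would delete:", dashboard_id)
```

Producing a list of candidates first, and deleting only after it has been reviewed, keeps the destructive step separate from the decision.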
how Open Metadata fits?
Catalog
"What's the meaning of this table/column?"
how Open Metadata fits?
Data Quality
"Can we add robustness to these models?"
how Open Metadata fits?
Alerts
Proactiveness and observability building trust
timeline
Jun-Dec 2023
Cleaning the house
✅ Ownerships
✅ model refactoring: midnight job 30% faster
✅ deletion of unused/dated dashboards: from 18 thousand to 1.5 thousand
Jan-Jun 2024
Gold metrics and data quality
✅ building trust: developing 17+ data quality test cases on top of important models
✅ catalog: documenting 250+ of our data sources
Jun-Dec 2024
Governance
✅ Deletion of unused/dated tables: recovery of US$ 2,000/month
● Organization of models and permissions on Looker
● Organization: ownerships, metadata, data quality