Data Quality Management & Approach & Implementation
ssusera92ed61
30 slides
Sep 04, 2024
About This Presentation
Data Quality Management & Approach & Implementation
Size: 2.81 MB
Language: en
Added: Sep 04, 2024
Slides: 30 pages
Slide Content
What is Data Quality & Data Quality Management IBM QualityStage Training Program – Day 1 Data Quality Management (DQM)
2 What is Data Quality & Data Quality Management
Data quality is the degree to which information consistently meets the requirements and expectations of all knowledge workers who need it to perform their jobs.
Data quality management is a combination of applications, processes and infrastructure that delivers consistent, accurate, reliable and timely data to business applications and users.
3 What is Data Quality
Key data quality dimensions:
- Accuracy: Ensuring data values represent the "real world" they model.
- Completeness: Ensuring availability of relevant and critical data.
- Consistency: Ensuring the available data conforms to the specified formats.
- Correctness: Ensuring correct spelling and conformance to business rules.
- Integrity: Ensuring a single representation of each entity across systems.
- Timeliness: Ensuring data is current.
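Two of the dimensions above, completeness and consistency, can be turned into simple ratios. The following is a minimal sketch, not a QualityStage feature: the sample records, the field names and the six-digit post code pattern are assumptions made for illustration.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"name": "Jatin Mehta", "post_code": "400709", "phone": "022-67566363"},
    {"name": "A. Sharma", "post_code": "4007", "phone": ""},
    {"name": "", "post_code": "400708", "phone": "022-67566120"},
]

def completeness(rows, field):
    """Share of rows in which `field` holds a non-blank value."""
    filled = sum(1 for r in rows if str(r.get(field, "")).strip())
    return filled / len(rows)

def consistency(rows, field, pattern):
    """Share of non-blank values of `field` that fully match `pattern`."""
    values = [r[field] for r in rows if str(r.get(field, "")).strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

name_completeness = completeness(records, "name")
# Assumed format rule: a post code is exactly six digits.
post_code_consistency = consistency(records, "post_code", r"\d{6}")
```

Ratios like these give each dimension a number that can be tracked over time, which is what the later monitoring phase relies on.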
DQM - Approach
5 DQM – Approach
Proactive DQM Approach: diminishes the potential for new data quality problems to arise.
- Establishing the overall governance.
- Defining the roles and responsibilities.
- Establishing the quality expectations and the supporting business practices.
- Deploying a technical environment that supports these business practices. Specialized tools are often needed in this technical environment.
Reactive DQM Approach: addresses problems that already exist.
- Dealing with problems that are inherent in the data in the existing databases.
- Dealing with data quality issues resulting from mergers and acquisitions.
DQM – Implementation Approach
7 DQM - Implementation Approach
[Process diagram]
Elements: Source Systems, Data Extracts …, Staging Area, Target System
Stages: Data Extraction & Analysis; Data Cleansing, Validation & Integrate; Transformation & Data Upload
Activities listed on the diagram:
- Data Analysis, Mapping, Cleansing Rules
- Standardization, Matching & Deduping, Data Quality Testing, Enrichment
- Data Validation, Data Loading
Cycle steps shown: Analysis, Enhance, Standardize, Integrate, Cleanse (iterative approach, continuous quality improvement)
Legend: Data Flow, Business Rule Flow
DQM – Methodology
9 DQM Methodology
- DQM Phase I – Frame Roadmap
- DQM Phase II – Analyze
- DQM Phase III – Cleanse
- DQM Phase IV – Integrate
- DQM Phase V – Enhance
- DQM Phase VI – Monitor
DQM Methodology: Analyze Phase
11 DQM Phase II - ANALYZE
- Discover Anomalies & Inconsistency: Discover the problems and inefficiencies in the data.
- Discover Redundancy: Discover the frequency distribution of the different values in the data.
- Discover Relationship: Analyze and discover dependencies of the data among source systems.
- Capture Metadata: Analyze and capture metadata.
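The frequency distribution mentioned in the Analyze phase can be sketched with the Python standard library. This is an illustrative profile of a single column, not the output of any profiling tool; the sample city values are hypothetical.

```python
from collections import Counter

# Hypothetical city values pulled from one source column.
cities = ["Navi Mumbai", "New Bombay", "Navi Mumbai", "N/A", "navi mumbai", ""]

def value_frequencies(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()

def profile(values):
    """Basic column profile: row count, distinct values and blank values."""
    return {
        "rows": len(values),
        "distinct": len(set(values)),
        "blank": sum(1 for v in values if not v.strip()),
    }

frequencies = value_frequencies(cities)
summary = profile(cities)
```

Even this small profile surfaces the kind of findings the phase is after: the same city spelled three ways, a placeholder value and a blank.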
DQM Methodology: Cleanse Phase
13 DQM Phase III - CLEANSE
- Consistent Format: Parse the content of each field into a uniform structure and uniform field lengths.
- Standardize: Arrange data into preferred uniform abbreviations and uniform, accurate spelling.
- Correct: Correct inaccurate data and capture missing values to complete the data.
14 Data Parsing (Cleanse)
The data parsing process identifies and dismantles individual data elements and places them into appropriate fields.
Input Data:
Mr. Jatin Kumar Mehta, Capgemini Consulting India Pvt Ltd., Building 8, Mindspace, Thane Belapur Road., Airoli New Bombay, 400709
Parsed Data:
- Salutation: Mr.
- First Name: Jatin
- Middle Name: Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Building 8
- Extra: Thane Belapur Road
- Locality: Airoli
- City: New Bombay
- Post Code: 400709
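The name portion of the parsing step above can be sketched as follows. This is a deliberately simple illustration, not how QualityStage parses: the salutation list and the comma-based split are assumptions, and the address parts are left unparsed.

```python
# Assumed salutation list, for illustration only.
SALUTATIONS = {"Mr.", "Mrs.", "Ms.", "Dr."}

def parse_person_name(raw):
    """Split a personal name into salutation, first, middle and last name."""
    tokens = raw.split()
    parsed = {"salutation": "", "first_name": "", "middle_name": "", "last_name": ""}
    if tokens and tokens[0] in SALUTATIONS:
        parsed["salutation"] = tokens.pop(0)
    if tokens:
        parsed["first_name"] = tokens.pop(0)
    if tokens:
        parsed["last_name"] = tokens.pop()
    # Whatever is left between first and last name is treated as the middle name.
    parsed["middle_name"] = " ".join(tokens)
    return parsed

def parse_contact_line(line):
    """Treat the text before the first comma as the name; keep the rest unparsed."""
    name, *rest = [part.strip() for part in line.split(",")]
    result = parse_person_name(name)
    result["remaining_parts"] = rest
    return result

parsed = parse_contact_line(
    "Mr. Jatin Kumar Mehta, Capgemini Consulting India Pvt Ltd., Building 8, Mindspace"
)
```

Real address parsing needs pattern dictionaries and locale-specific rules; the sketch only shows the idea of moving each element into its own field.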
15 Data Standardization (Cleanse)
The corrected data is standardized according to the required business criteria, which are also used to match the data in the later phase.
Parsed Data:
- Salutation: Mr.
- First Name: Jatin
- Middle Name: Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Building 8
- Extra: Thane Belapur Road
- Locality: Airoli
- City: New Bombay
- Post Code: 400709
Standardized Data:
- Salutation: Mr.
- First Name: Jatin
- First Name Match Standard: Jateen
- Middle Name: Kumar
- Middle Name Match Standard: Ku., Ku
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- Post Code: 400709
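The abbreviation and city changes in the standardization step can be sketched with lookup tables. The tables and function names below are assumptions for illustration; in a real project such mappings would be maintained as business rules.

```python
import re

# Assumed lookup tables, not taken from any tool.
ABBREVIATIONS = {"Building": "Bldg", "Road": "Rd."}
CITY_ALIASES = {"new bombay": "Navi Mumbai"}

def standardize_words(text):
    """Replace whole words with their preferred abbreviation."""
    def swap(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word, word)
    return re.sub(r"[A-Za-z]+", swap, text)

def standardize_city(city):
    """Map known alternate city names to the preferred name."""
    cleaned = city.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)

def standardize_record(record):
    """Return a copy of the record with building, extra and city standardized."""
    out = dict(record)
    out["building"] = standardize_words(record["building"])
    out["extra"] = standardize_words(record["extra"])
    out["city"] = standardize_city(record["city"])
    return out

standardized = standardize_record(
    {"building": "Building 8", "extra": "Thane Belapur Road", "city": "New Bombay"}
)
```

Keeping the mappings in data rather than in code makes it easier for data stewards to review and extend them.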
16 Data Cleansing and Correction (Cleanse)
Parsed & Standardized Data:
- Salutation: Mr.
- First Name: Jatin
- First Name Match Standard: Jateen
- Middle Name: Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: New Bombay
- Post Code: 400709
Corrected Data:
- Salutation: Mr.
- First Name: Jatin
- First Name Match Standard: Jateen
- Middle Name: Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
DQM Methodology: Integrate Phase
18 DQM Phase IV - INTEGRATE
- Data Matching: Finding duplicate information within a database or across databases.
- Deduplication: Preparing the best record from duplicate information using software applications.
- Manual Deduplication: Data stewards decide whether a set of records is duplicate or unique.
- Data Integration: Consolidating and integrating the data from different data sources.
19 Data Matching (Integration)
This shows how cleansed data has prepared the original input record for matching against a record from another data source.
Cleansed Data [Data Source-1]:
- Salutation: Mr.
- First Name: Jateen
- First Name Match Standard: Jatin
- Middle Name: Ku
- Middle Name Match Standard: Ku., Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
- Phone: 022-67566363
- Fax: 020-67566120
Cleansed Data [Data Source-2]:
- Salutation: Mr.
- First Name: Jatin
- First Name Match Standard: Jateen
- Middle Name: Kumar
- Middle Name Match Standard: Ku., Ku
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
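The role of the match-standard fields in the two records above can be sketched as a set comparison. This is a minimal illustration under stated assumptions: the dictionary keys such as `first_name_match` and the matching rule itself are hypothetical, not the matching logic of any product.

```python
def variants(record, field):
    """Collect a field value and its match-standard variants, normalized."""
    values = [record[field]] + record.get(field + "_match", [])
    # Strip spaces and trailing dots, and ignore case, so "Ku." equals "Ku".
    return {v.strip(" .").lower() for v in values}

def is_match(a, b):
    """Assumed rule: last name and post code agree, and the first and middle
    names share at least one variant."""
    return (
        a["last_name"].lower() == b["last_name"].lower()
        and a["post_code"] == b["post_code"]
        and bool(variants(a, "first_name") & variants(b, "first_name"))
        and bool(variants(a, "middle_name") & variants(b, "middle_name"))
    )

source_1 = {"first_name": "Jateen", "first_name_match": ["Jatin"],
            "middle_name": "Ku", "middle_name_match": ["Ku.", "Kumar"],
            "last_name": "Mehta", "post_code": "400708"}
source_2 = {"first_name": "Jatin", "first_name_match": ["Jateen"],
            "middle_name": "Kumar", "middle_name_match": ["Ku.", "Ku"],
            "last_name": "Mehta", "post_code": "400708"}

matched = is_match(source_1, source_2)
```

Without the match standards, "Jateen" and "Jatin" would look like two different people; with them, the two records share a variant and can be paired.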
20 Data Integration (Integration)
Consolidates and integrates data from different data sources.
Cleansed Data [Data Source-1]:
- Salutation: Mr.
- First Name: Jateen
- First Name Match Standard: Jatin
- Middle Name: Ku
- Middle Name Match Standard: Ku., Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
- Phone: 022-67566363
- Fax: 022-67566120
Cleansed Data [Data Source-2]:
- Salutation: Mr.
- First Name: Jatin
- First Name Match Standard: Jateen
- Middle Name: Kumar
- Middle Name Match Standard: Ku., Ku
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
Integrated Data:
- Salutation: Mr.
- First Name: Jatin
- Middle Name: Kumar
- Last Name: Mehta
- Firm: Capgemini Consulting India Pvt Ltd.
- Firm Location: Mindspace
- Building: Bldg 8
- Extra: Thane Belapur Rd.
- Locality: Airoli
- City: Navi Mumbai
- State: MH
- Post Code: 400708
- Phone: 022-67566363
- Fax: 020-67566120
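The consolidation shown on this slide can be sketched as a simple survivorship rule: keep the preferred source's value and fall back to the other source where the preferred one has nothing. The rule and the choice of which source is preferred are assumptions for illustration; the records are abbreviated versions of the slide's example.

```python
def integrate(primary, secondary):
    """Build one record: keep the primary's value, and fall back to the
    secondary's value when the primary's is blank or missing."""
    merged = {}
    fields = list(primary) + [f for f in secondary if f not in primary]
    for field in fields:
        value = primary.get(field, "")
        merged[field] = value if value.strip() else secondary.get(field, "")
    return merged

# Abbreviated records; Data Source-2 is treated as primary (an assumption).
source_2 = {"first_name": "Jatin", "middle_name": "Kumar", "last_name": "Mehta",
            "city": "Navi Mumbai", "post_code": "400708"}
source_1 = {"first_name": "Jateen", "middle_name": "Ku", "last_name": "Mehta",
            "city": "Navi Mumbai", "post_code": "400708", "phone": "022-67566363"}

integrated = integrate(source_2, source_1)
```

Here the name fields survive from the primary record while the phone number, which only the other source holds, is carried over.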
DQM Methodology: Enhance Phase
22 DQM Phase V - ENHANCE
- Postal & Geocode: Customer data enhancement with postal address and geocode information.
- Security: Data enhancement with security, privacy and suppression services information (do not call, do not email, etc.).
- Product: Product data enhancement with commodity coding and categorization.
- Illicit Information: Data enhancement with criminal, terrorist and fraud information.
DQM Methodology: Monitor Phase
24 DQM Phase VI - MONITOR
- Deploy Data & Business Rules: Create business rules for data quality checks.
- Check Points & Notifications: Set up check points and limits, and send notifications when the specified rules and limits are not complied with.
- Correct Problem: Once a problem is identified, correct the problematic data before it enters your systems.
25 DQM Methodology - recap
- DQM Phase I – Frame Roadmap
- DQM Phase II – Analyze
- DQM Phase III – Cleanse
- DQM Phase IV – Integrate
- DQM Phase V – Enhance
- DQM Phase VI – Monitor
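The rule, check point and notification steps of the Monitor phase can be sketched as pass rates compared against limits. The rule names, limits and sample batch below are hypothetical and only illustrate the mechanism.

```python
import re

# Hypothetical business rules: each maps a rule name to a per-record check.
RULES = {
    "post_code_is_6_digits": lambda r: re.fullmatch(r"\d{6}", r.get("post_code", "")) is not None,
    "phone_present": lambda r: bool(r.get("phone", "").strip()),
}
# Assumed limits: minimum share of records that must pass each rule.
THRESHOLDS = {"post_code_is_6_digits": 0.95, "phone_present": 0.50}

def evaluate(records):
    """Return the pass rate of every rule over the batch."""
    return {name: sum(1 for r in records if check(r)) / len(records)
            for name, check in RULES.items()}

def notifications(pass_rates):
    """Build a message for every rule whose pass rate is below its limit."""
    return [f"{name}: pass rate {rate:.0%} is below limit {THRESHOLDS[name]:.0%}"
            for name, rate in pass_rates.items() if rate < THRESHOLDS[name]]

batch = [
    {"post_code": "400708", "phone": "022-67566363"},
    {"post_code": "4007", "phone": ""},
]
alerts = notifications(evaluate(batch))
```

Running such checks on each incoming batch, before loading, is one way to catch problematic data before it enters the target system.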
DQM Tools
27 DQM Tools
Data profiling and data quality tools available in the market:
- IBM: IBM Information Analyzer, IBM QualityStage
- Informatica: Informatica Data Quality
- SAP: SAP BusinessObjects Data Quality Management for SAP Solutions
- SAS: SAS Data Quality Solution
- Trillium: Trillium Software profiling tools (TS Discovery, TS Insight); TS Quality, a Trillium Software System
28 Gartner’s Magic Quadrant for DQM Tools
Case Study
30 Case Study of a Leading Manufacturing Organization
Business Issue
ABCD is a leading manufacturing organization in North America. ABCD is implementing a global SAP/PeopleSoft-based core system. As part of this global effort, its legacy data is to be migrated to the SAP/PeopleSoft platform. Data from 11 domains (Logistics, Manufacturing, Purchasing, Quality, Equipment & Projects, Services, Sales, CRM, Finance, Controlling, and HR) will be migrated to SAP/PeopleSoft.
Potential data anomalies in the existing system include:
- Duplicated entries for customers and vendors.
- Incorrect addresses, causing delays in shipping and delivery.
- Missing telephone and address contact information.
- Significant use of represented null values (e.g. "N/A", "none", blank).
Solution Implemented
- Data profiling activity provided insight into the prevailing data quality issues.
- Migrate, cleanse and enrich client data required for the operation of the customer's business.
- Data quality analysis and enhancements for name, phone numbers, fax numbers and address.
- Implement a robust ETL capability for the management of client data.
Client Benefits
- Improved visibility of data to all departments.
- Increased revenue.
- Unique, standardized and corrected customer name, phone number, fax number and address information will enable on-time delivery, reduced postal costs and improved relationships with customers.
- Significant cost savings.
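The "represented null values" issue from the case study can be sketched as a normalization step. The token list is an assumption that extends the examples named above ("N/A", "none", blank); the sample records are hypothetical.

```python
# Assumed placeholder tokens, compared case-insensitively after trimming.
NULL_TOKENS = {"", "n/a", "na", "none", "null", "-"}

def is_represented_null(value):
    """True when the value is missing or is a placeholder standing in for null."""
    return value is None or str(value).strip().lower() in NULL_TOKENS

def normalize_nulls(record):
    """Replace every represented null in the record with a real None."""
    return {k: (None if is_represented_null(v) else v) for k, v in record.items()}

def null_rate(records, field):
    """Share of records whose `field` is missing or a represented null."""
    return sum(1 for r in records if is_represented_null(r.get(field))) / len(records)

customers = [
    {"name": "ABCD Vendor 1", "phone": "N/A"},
    {"name": "ABCD Vendor 2", "phone": " none "},
    {"name": "ABCD Vendor 3", "phone": "022-67566363"},
    {"name": "ABCD Vendor 4", "phone": ""},
]
cleaned = [normalize_nulls(c) for c in customers]
phone_null_rate = null_rate(customers, "phone")
```

Normalizing placeholders to a single null representation before profiling keeps missing contact details from being counted as valid values.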