Data Quality Management & Approach & Implementation

ssusera92ed61 58 views 30 slides Sep 04, 2024

Slide Content

What is Data Quality & Data Quality Management IBM QualityStage Training Program – Day 1 Data Quality Management (DQM)

2 What is Data Quality & Data Quality Management
Data Quality is defined as the degree to which information consistently meets the requirements and expectations of all knowledge workers who require it to perform their jobs.
Data Quality Management is a combination of applications, processes and infrastructure that delivers consistent, accurate, reliable and timely data to business applications and users.

3 What is Data Quality
Key Data Quality dimensions:
- Accuracy: Ensuring data values represent the "Real World" model.
- Completeness: Ensuring availability of relevant and critical data.
- Consistency: Ensuring the available data conforms to the specified formats.
- Correctness: Ensuring correct spelling and conformance to business rules.
- Integrity: Ensuring a single representation of each entity across systems.
- Timeliness: Ensuring data is current and recent.

DQM - Approach

5 DQM – Approach
Proactive DQM Approach: diminishes the potential for new data quality problems to arise.
- Establishing the overall governance.
- Defining the roles and responsibilities.
- Establishing the quality expectations and the supporting business practices.
- Deploying a technical environment that supports these business practices. Specialized tools are often needed in this technical environment.
Reactive DQM Approach: addresses problems that already exist.
- Dealing with problems that are inherent in the data in the existing databases.
- Dealing with data quality issues that result from mergers and acquisitions.

DQM – Implementation Approach

7 DQM - Implementation Approach
[Diagram. Labels: Source Systems, Data Extracts …, Staging Area, Target System. Process steps: Analysis, Cleanse, Standardize, Integrate, Enhance. Legend: Data Flow, Business Rule Flow. Caption: Iterative approach, continuous quality improvement.]
Data Extraction & Analysis
Activities:
- Data Analysis
- Mapping
- Cleansing Rules
Data Cleansing, Validation & Integrate
Activities:
- Standardization
- Matching & Deduping
- Data Quality Testing
- Enrichment
Transformation & Data Upload
Activities:
- Data Validation
- Data Loading

DQM – Methodology

9 DQM Methodology
- DQM Phase I – Frame Roadmap
- DQM Phase II – Analyze
- DQM Phase III – Cleanse
- DQM Phase IV – Integrate
- DQM Phase V – Enhance
- DQM Phase VI – Monitor

DQM Methodology: Analyze Phase

11 DQM Phase II - ANALYZE
- Discover Anomalies & Inconsistency – Discover the problems and inefficiencies in the data.
- Discover Redundancy – Discover the frequency distribution of the different values in the data.
- Discover Relationship – Analyze and discover dependencies of the data among source systems.
- Capture Metadata – Analyze and capture metadata.
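
The frequency-distribution step of the Analyze phase can be illustrated with a minimal Python sketch. It is not how IBM Information Analyzer or QualityStage work internally; the function name `profile_column` and the sample city values are assumptions made for illustration only.

```python
from collections import Counter


def profile_column(values):
    """Return basic profiling facts for one column (hypothetical helper)."""
    # Trim surrounding whitespace so "  Airoli " and "Airoli" count as one value.
    cleaned = [v.strip() if isinstance(v, str) else v for v in values]
    # Treat None and empty strings as missing values.
    missing = sum(1 for v in cleaned if v is None or v == "")
    # Frequency distribution of the remaining values.
    frequencies = Counter(v for v in cleaned if v is not None and v != "")
    return {
        "count": len(values),
        "missing": missing,
        "distinct": len(frequencies),
        "frequencies": frequencies,
    }


# Illustrative sample data, not taken from a real source system.
cities = ["Navi Mumbai", "New Bombay", "Navi Mumbai", "", None, "navi mumbai"]
report = profile_column(cities)
print(report["missing"], report["distinct"])
print(report["frequencies"].most_common(1))
```

A distribution like this makes inconsistent spellings of the same value (for example "Navi Mumbai" and "navi mumbai") visible before any cleansing rules are written.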

DQM Methodology: Cleanse Phase

13 DQM Phase III - CLEANSE
- Consistent Format – Parse the content of each field into a uniform structure and field length.
- Standardize – Arrange data into preferred uniform abbreviations with uniform and accurate spelling.
- Correct – Correct inaccurate data. Capture missing data values to complete the data.

14 Data Parsing: The data parsing process identifies and separates individual data elements and places them into the appropriate fields.

Input Data
  Mr. Jatin Kumar Mehta, Capgemini Consulting India Pvt Ltd., Building 8, Mindspace, Thane Belapur Road., Airoli New Bombay, 400709

Parsed Data
  Salutation: Mr.
  First Name: Jatin
  Middle Name: Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Building 8
  Extra: Thane Belapur Road
  Locality: Airoli
  City: New Bombay
  Post Code: 400709
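
The parsing step above can be sketched in Python as follows. This is a simplified illustration, not the QualityStage parsing logic: it assumes the input always has seven comma-separated segments in the order shown on the slide, and the names `SALUTATIONS`, `KNOWN_CITIES` and `parse_contact` are hypothetical.

```python
import re

SALUTATIONS = {"Mr.", "Mrs.", "Ms.", "Dr."}   # illustrative, not exhaustive
KNOWN_CITIES = ("New Bombay", "Navi Mumbai")  # hypothetical reference list


def parse_contact(raw):
    """Split one free-form contact line into named fields (assumed layout)."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 7:
        raise ValueError("expected 7 comma-separated segments")
    name, firm, building, firm_location, extra, area, post_code = parts

    # Name: optional salutation, then first / middle / last tokens.
    tokens = name.split()
    salutation = tokens.pop(0) if tokens and tokens[0] in SALUTATIONS else ""
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])

    # Area: a known city at the end, whatever precedes it is the locality.
    city = next((c for c in KNOWN_CITIES if area.endswith(c)), "")
    locality = area[: len(area) - len(city)].strip() if city else area

    # Assumption for this sketch: post codes are exactly six digits.
    if not re.fullmatch(r"\d{6}", post_code):
        raise ValueError(f"unexpected post code: {post_code!r}")

    return {
        "salutation": salutation, "first_name": first, "middle_name": middle,
        "last_name": last, "firm": firm, "firm_location": firm_location,
        "building": building, "extra": extra, "locality": locality,
        "city": city, "post_code": post_code,
    }


parsed = parse_contact(
    "Mr. Jatin Kumar Mehta, Capgemini Consulting India Pvt Ltd., Building 8, "
    "Mindspace, Thane Belapur Road, Airoli New Bombay, 400709"
)
print(parsed["first_name"], parsed["locality"], parsed["city"])
```

Real-world input rarely follows a fixed segment order, which is why dedicated tools rely on pattern and reference dictionaries rather than positional splitting.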

15 Data Standardization: The corrected data is standardized according to the required business criteria, which are then used to match the data in a later phase.

Parsed Data
  Salutation: Mr.
  First Name: Jatin
  Middle Name: Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Building 8
  Extra: Thane Belapur Road
  Locality: Airoli
  City: New Bombay
  Post Code: 400709

Standardized Data
  Salutation: Mr.
  First Name: Jatin
  First Name Match Standard: Jateen
  Middle Name: Kumar
  Middle Name Match Standard: Ku., Ku
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  Post Code: 400709
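
A dictionary-driven version of the standardization shown above can be sketched in Python. The lookup tables `ABBREVIATIONS` and `CITY_ALIASES` contain only the substitutions visible on the slide, and the function name `standardize` is hypothetical; a real rule set would be far larger.

```python
import re

# Substitutions taken from the slide example; everything else is assumed.
ABBREVIATIONS = {"Building": "Bldg", "Road": "Rd."}
CITY_ALIASES = {"New Bombay": "Navi Mumbai"}


def standardize(record):
    """Return a copy of the record with preferred abbreviations and city names."""
    out = dict(record)
    for field in ("building", "extra"):
        value = out.get(field, "")
        for word, abbreviation in ABBREVIATIONS.items():
            # Replace whole words only, so "Roadside" would stay untouched.
            value = re.sub(rf"\b{re.escape(word)}\b", abbreviation, value)
        out[field] = value
    city = out.get("city", "")
    out["city"] = CITY_ALIASES.get(city, city)
    return out


parsed_record = {
    "building": "Building 8",
    "extra": "Thane Belapur Road",
    "city": "New Bombay",
}
standardized = standardize(parsed_record)
print(standardized)
```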

16 Data Cleansing and Correction:

Parsed & Standardized Data
  Salutation: Mr.
  First Name: Jatin
  First Name Match Standard: Jateen
  Middle Name: Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: New Bombay
  Post Code: 400709

Corrected Data
  Salutation: Mr.
  First Name: Jatin
  First Name Match Standard: Jateen
  Middle Name: Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708

DQM Methodology: Integrate Phase

18 DQM Phase IV - INTEGRATE
- Data Matching – Finding duplicate information within a database or across databases.
- Deduplication – Preparing the best record out of duplicate information using software applications.
- Manual Deduplication – Data Stewards identify whether a set of records is duplicate or unique.
- Data Integration – The data integration process consolidates and integrates the data from different data sources.

19 Data Matching: This shows how cleansing has prepared the original input record for matching against a record from another data source.

Cleansed Data [Data Source-1]
  Salutation: Mr.
  First Name: Jateen
  First Name Match Standard: Jatin
  Middle Name: Ku
  Middle Name Match Standard: Ku., Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708
  Phone: 022-67566363
  Fax: 020-67566120

Cleansed Data [Data Source-2]
  Salutation: Mr.
  First Name: Jatin
  First Name Match Standard: Jateen
  Middle Name: Kumar
  Middle Name Match Standard: Ku., Ku
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708
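
The role of the "Match Standard" fields can be shown with a small Python sketch. It uses a single deterministic rule (a shared first-name variant, the same last name and the same post code), which is an assumption made for illustration; matching tools typically score candidate pairs instead of applying one fixed rule. The function names `name_variants` and `is_match` are hypothetical.

```python
def name_variants(record):
    """Collect the first name and its match standards, lower-cased."""
    variants = {record["first_name"]}
    variants.update(record.get("first_name_match_standard", []))
    return {v.lower() for v in variants}


def is_match(a, b):
    """Illustrative rule: shared name variant, same last name, same post code."""
    return (
        bool(name_variants(a) & name_variants(b))
        and a["last_name"].lower() == b["last_name"].lower()
        and a["post_code"] == b["post_code"]
    )


# Field values follow the two cleansed records on the slide.
source_1 = {
    "first_name": "Jateen",
    "first_name_match_standard": ["Jatin"],
    "last_name": "Mehta",
    "post_code": "400708",
}
source_2 = {
    "first_name": "Jatin",
    "first_name_match_standard": ["Jateen"],
    "last_name": "Mehta",
    "post_code": "400708",
}
print(is_match(source_1, source_2))
```

Without the match standards, "Jateen" and "Jatin" would not compare as equal, and the two records would be treated as different people.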

20 Data Integration: Consolidates and integrates data from different data sources.

Cleansed Data [Data Source-1]
  Salutation: Mr.
  First Name: Jateen
  First Name Match Standard: Jatin
  Middle Name: Ku
  Middle Name Match Standard: Ku., Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708
  Phone: 022-67566363
  Fax: 022-67566120

Cleansed Data [Data Source-2]
  Salutation: Mr.
  First Name: Jatin
  First Name Match Standard: Jateen
  Middle Name: Kumar
  Middle Name Match Standard: Ku., Ku
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708

Integrated Data
  Salutation: Mr.
  First Name: Jatin
  Middle Name: Kumar
  Last Name: Mehta
  Firm: Capgemini Consulting India Pvt Ltd.
  Firm Location: Mindspace
  Building: Bldg 8
  Extra: Thane Belapur Rd.
  Locality: Airoli
  City: Navi Mumbai
  State: MH
  Post Code: 400708
  Phone: 022-67566363
  Fax: 020-67566120
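
The consolidation on this slide can be sketched in Python as a simple survivorship rule: take each field from a preferred record and fill any gaps from the other matched record. Treating Data Source-2 as the preferred source is an assumption inferred from the integrated record (its name fields follow Source-2, while the phone number comes from Source-1); the function name `integrate` is hypothetical.

```python
def integrate(preferred, other):
    """Merge two matched records, preferring non-empty values from `preferred`."""
    merged = {}
    # Keep the preferred record's field order, then append fields only `other` has.
    fields = list(preferred) + [f for f in other if f not in preferred]
    for field in fields:
        merged[field] = preferred.get(field) or other.get(field, "")
    return merged


# Subset of the fields on the slide, enough to show the rule.
source_1 = {
    "first_name": "Jateen",
    "middle_name": "Ku",
    "last_name": "Mehta",
    "post_code": "400708",
    "phone": "022-67566363",
}
source_2 = {
    "first_name": "Jatin",
    "middle_name": "Kumar",
    "last_name": "Mehta",
    "post_code": "400708",
}
golden = integrate(preferred=source_2, other=source_1)
print(golden)
```

In practice the survivorship rule is usually chosen per field (most recent, most complete, most trusted source) rather than per record.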

DQM Methodology: Enhance Phase

22 DQM Phase V - ENHANCE
- Postal & Geocode – Customer data enhancement with postal address and geocode information.
- Security – Data enhancement with security, privacy and suppression services information (do not call, do not email, etc.).
- Product – Product data enhancement with commodity coding and categorization.
- Illicit Information – Data enhancement with criminal, terrorist and fraud information.

DQM Methodology: Monitor Phase

24 DQM Phase VI - MONITOR
- Deploy Data & Business Rules – Create business rules for data quality checks.
- Check Points & Notifications – Set up check points and limits, and send notifications when the specified rules and limits are not complied with.
- Correct Problem – Once a problem is identified, correct the problematic data before it enters your systems.
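
The rule, limit and notification idea can be sketched in Python as follows. The two rules, the 10% limit and the names `RULES`, `MAX_FAILURE_RATE` and `monitor` are all assumptions chosen for illustration; the returned messages stand in for whatever notification channel is actually used.

```python
import re

# Hypothetical business rules: each maps a rule name to a per-record check.
RULES = {
    "post_code_is_6_digits": lambda r: re.fullmatch(r"\d{6}", r.get("post_code", "")) is not None,
    "phone_present": lambda r: bool(r.get("phone", "").strip()),
}
MAX_FAILURE_RATE = 0.10  # assumed limit: alert when more than 10% of records fail


def monitor(records):
    """Evaluate every rule on a batch and return alert messages for breached limits."""
    alerts = []
    for name, rule in RULES.items():
        failures = sum(1 for r in records if not rule(r))
        rate = failures / len(records) if records else 0.0
        if rate > MAX_FAILURE_RATE:
            alerts.append(f"{name}: {failures}/{len(records)} records failed")
    return alerts


# Illustrative incoming batch; one record has no phone number.
batch = [
    {"post_code": "400708", "phone": "022-67566363"},
    {"post_code": "400709", "phone": "022-67566364"},
    {"post_code": "400710", "phone": ""},
    {"post_code": "400711", "phone": "022-67566366"},
]
for alert in monitor(batch):
    print("ALERT:", alert)
```

Running such checks on incoming batches, before loading, is what allows problematic data to be corrected before it enters the target systems.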

25 DQM Methodology - recap
- DQM Phase I – Frame Roadmap
- DQM Phase II – Analyze
- DQM Phase III – Cleanse
- DQM Phase IV – Integrate
- DQM Phase V – Enhance
- DQM Phase VI – Monitor

DQM Tools

27 DQM Tools
Data profiling and data quality tools available in the market include:
- IBM: IBM Information Analyzer, IBM QualityStage
- Informatica: Informatica Data Quality
- SAP: SAP BusinessObjects Data Quality Management for SAP Solutions
- SAS: SAS Data Quality Solution
- Trillium: Trillium Software profiling tools (TS Discovery, TS Insight) and TS Quality, a Trillium Software System

28 Gartner’s Magic Quadrant for DQM Tools

Case Study

30 Case Study of a leading Manufacturing Organization

Business Issue
ABCD is a leading manufacturing organization in North America. ABCD is implementing a global SAP/PeopleSoft-based core system. As part of the global effort, its legacy data is to be migrated to the SAP/PeopleSoft platform. Data from 11 domains (Logistics, Manufacturing, Purchasing, Quality, Equipment & Projects, Services, Sales, CRM, Finance, Controlling, and HR) will be migrated to SAP/PeopleSoft.
In the existing system, the potential anomalies in the data include:
- Duplicated entries for customers and vendors.
- Incorrect addresses, which cause delays in shipping and delivery.
- Missing telephone and address contact information.
- Significant use of represented null values (e.g. "N/A", "none", blank).

Solution Implemented
- Data profiling activity provided insight into the prevailing data quality issues.
- Migrate, cleanse and enrich the client data required for the operation of the customer's business.
- Data quality analysis and enhancements for Name, Phone No., Fax No. and Address.
- Implement a robust ETL capability for the management of client data.

Client Benefits
- Improved visibility of data to all departments.
- Increased revenue.
- Unique, standardized and corrected customer information (Name, Phone No., Fax No. and Address) will enable on-time delivery, reduced postal costs and an improved relationship with customers.
- Significant cost savings.
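
The "represented null values" issue in the case study can be illustrated with a short Python sketch that converts placeholder text into real missing values before profiling. Only "N/A", "none" and blank come from the slide; the other tokens in `NULL_TOKENS`, the sample record and the function name `normalize_nulls` are assumptions.

```python
# "n/a", "none" and "" come from the case study; the remaining tokens are assumed.
NULL_TOKENS = {"", "n/a", "none", "na", "null"}


def normalize_nulls(record):
    """Replace placeholder strings with None so missing data can be counted."""
    return {
        field: None if value is None or str(value).strip().lower() in NULL_TOKENS else value
        for field, value in record.items()
    }


# Illustrative legacy record, not taken from the client's data.
legacy = {"name": "ABCD Vendor", "phone": "N/A", "fax": "none", "address": "   "}
normalized = normalize_nulls(legacy)
missing_fields = [field for field, value in normalized.items() if value is None]
print(missing_fields)
```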