Interface Audit Remediation
Feed Reception & Transfer Validation Utility
Field Format/Data Type Validation
Reconciliation
Overview
As part of GIA's Robustness Review Audit, data feeds transferred from one system to
another need to undergo sufficient integrity checks to confirm transmission without
errors;
The CTO recommendation is to use a common and consistent mechanism across WM
systems interchanging information via data feeds;
Two generic utilities are built to validate transmission of any feed file:
transfer_utility – used by the upstream system to generate the control file;
reception_utility – used at DWH to validate the file transfer;
It compares the checksum specified by transfer_utility in the control file with the
checksum of the data feed (computed with the "md5 utility") and okays the file for
further processing only if the two checksum values match;
Supports processing of one feed at a time.
Configurable parameters drive the behavior at the individual feed file level.
Scope of first release is SG and HK locations only.
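The reception-side comparison described above could look roughly like the following shell sketch; the function name and file layout are illustrative assumptions, not the actual utility:

```shell
# Sketch of the reception-side check: compare the checksum recorded in the
# control file against a freshly computed md5sum of the data feed.
# validate_feed and the control-file layout are assumptions, not the real tool.
validate_feed() {
    data_file="$1"
    ctrl_file="$2"
    expected=$(awk '{print $1; exit}' "$ctrl_file")    # checksum written by transfer_utility
    actual=$(md5sum "$data_file" | awk '{print $1}')   # checksum of the received feed
    if [ "$expected" = "$actual" ]; then
        echo "OK: $data_file cleared for further processing"
    else
        echo "MISMATCH: $data_file rejected" >&2
        return 1
    fi
}
```

Only on a match does the feed proceed to loading; a mismatch is reported and the file is held back.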
Configuration
One control file will be generated for every data file by transfer_utility;
The control file will include the following information:
Filename of the data file (feed file) for which the checksum was computed <<Optional>>
If the file name is not specified, the script will deduce the data file name from the
control file name, e.g. if the control file name is <feed_name>.ctr, the data file name will be <feed_name>;
Whether the file name is present in the header of the control file could be indicated
either as a parameter to the shell script, as metadata in a separate file, or via a wrapper
program.
Result of the Unix md5sum command <<Mandatory>>;
Book Date/Bank Date/Business Date in "dd/mm/yyyy" format (for daily feeds) <<Optional>>;
Number of records in the data file <<Optional>>;
File generation date & time in "dd/mm/yyyy hh:mm" format <<Optional>>;
The utility behavior can be controlled using configuration parameters that can be maintained for
each individual file;
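As an illustration, a control file carrying the fields above might be produced like this; the key=value layout, field names, and function name are assumptions (only the md5sum line is stated as mandatory):

```shell
# Hypothetical sketch of what transfer_utility could write: one <feed_name>.ctr
# per data file, containing the mandatory md5sum line plus optional metadata.
write_control_file() {
    feed="$1"                                        # path to the data feed
    ctr="$feed.ctr"                                  # <feed_name>.ctr convention
    {
        md5sum "$feed"                               # mandatory: checksum line
        echo "BUSINESS_DATE=$(date '+%d/%m/%Y')"     # optional: business date
        echo "RECORD_COUNT=$(wc -l < "$feed")"       # optional: record count
        echo "GENERATED=$(date '+%d/%m/%Y %H:%M')"   # optional: generation time
    } > "$ctr"
}
```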
Data Field Format Validation
The following checks and validations are enforced in DWH data loading jobs (ETL jobs & scripts)
as business rules, and in the database through database constraints:
Empty File Checks
Business Date Checks including Special Month checks
Record Count Checks
Data Type Checks on Critical Fields
Unique Value Checks on Key Values
Null Value checks on Critical Fields
Data Reconciliation
Deviations from the above cause errors and job failures;
The failure and its cause are captured in log files and reported via email and via NetCool;
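Two of the checks listed above (empty file, record count) can be sketched in shell, assuming a hypothetical RECORD_COUNT=<n> line in the control file; the production jobs' exact logic may differ:

```shell
# Sketch of the empty-file and record-count checks performed at load time.
# check_feed and the RECORD_COUNT=<n> control-file line are assumptions.
check_feed() {
    data_file="$1"
    ctrl_file="$2"
    # empty-file check: reject zero-byte feeds outright
    [ -s "$data_file" ] || { echo "FAIL: empty file $data_file" >&2; return 1; }
    # record-count check: line count must match the count declared upstream
    expected=$(sed -n 's/^RECORD_COUNT=//p' "$ctrl_file")
    actual=$(wc -l < "$data_file")
    [ "$actual" -eq "$expected" ] || {
        echo "FAIL: $actual records, expected $expected" >&2; return 1;
    }
    echo "PASS: $data_file"
}
```

A non-zero return here would fail the job, so the deviation surfaces in the logs and alerts described above.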
Appendix
Data Validation Processes & Procedures
Data Repository
DWH is a data abstraction layer that couples local backend systems to global front-end
applications, providing a consolidated and standardized end-of-day view of transactions
and positions along with static data;
It feeds data to various downstream applications for both operational and analytical purposes
and broadly stores the following types of data:
Master Static Data
Clients, Partners
Accounts
Instrument Information
Transactional Activity Data
Orders
Trades
Payments
Positions & Holdings
Reference Data (including data from outside the bank)
Instrument Prices
ISO codes
DWH – Reconciliation Components
Interface Processing Control Report
Ensures each expected interface file is transmitted completely and is valid;
Validates that header/trailer/hash total checks were performed and succeeded;
Validates the checksum generated with the md5 utility;
Generated and sent by email daily;
Reconciliation
Interface DWH Reconciliation
Covers all interface files
For each individual interface file, compares all records and all fields
DWH Internal Reconciliation
Reconciles DWH layers to ensure completeness and correctness of the end-to-end data
transformation, i.e. from the point data is loaded from interface files (staging) to the point
data is available to downstream systems
DWH-FDM Reconciliation
All positions and transactions
Major static data including clients, mandates, portfolios, accounts, security prices,
exchange rates and security masters
Sanity Check and Data Integrity Check
Ensure data loaded into DWH/FDM is reasonable with respect to the meaning of the data
Ensure data fulfills the data integrity requirements (e.g. parent-child record relationships,
valid column values, null/not-null values, etc.)
DWH: Data Validation Checks – Details
File/Interface Level Checks
Transmission validation using "md5 checksum" utility;
Check for empty file;
Check for Business Date;
Checks and handling for "special month-end" (when month-end falls on weekend);
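One way to read the special month-end rule is that the expected business date rolls back to the preceding Friday when the calendar month-end falls on a weekend; the sketch below assumes that interpretation and GNU date, and the production jobs may implement it differently:

```shell
# Sketch of the special month-end rule: if the last calendar day of the month
# is a Saturday or Sunday, the expected month-end business date is the Friday
# before it. Assumes GNU date; expected_month_end is a hypothetical helper.
expected_month_end() {
    ym="$1"                                       # month as YYYY-MM, e.g. 2024-03
    last=$(date -d "$ym-01 +1 month -1 day" +%Y-%m-%d)  # last calendar day
    dow=$(date -d "$last" +%u)                    # ISO weekday: 6=Sat, 7=Sun
    case "$dow" in
        6) date -d "$last -1 day" +%Y-%m-%d ;;    # Saturday -> Friday
        7) date -d "$last -2 day" +%Y-%m-%d ;;    # Sunday   -> Friday
        *) echo "$last" ;;                        # weekday month-end unchanged
    esac
}
```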
Checks on Master Static Data Elements
Checks for NULL values for key fields & information;
Checks for data types for key fields;
Checks for duplicate records based on primary keys defined;
Referential integrity enforced at the DB level ensures no foreign key can exist without a valid
primary key: an ACCOUNT cannot exist without a PARTNER, and a PORTFOLIO record cannot be
loaded unless attached to a MANDATE;
To enforce conformance to reference data, lookup tables are used to store low-cardinality/
repeatedly used sets of static data; the job fails if a lookup value from a transactional
record doesn't have a corresponding entry in the lookup table;
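The lookup-conformance check amounts to verifying that every code appearing in the feed has an entry in the lookup set; a minimal shell sketch, with column position, delimiter, and file names all hypothetical:

```shell
# Sketch of the lookup-conformance check: every distinct value in the given
# feed column must appear in the lookup file (one value per line), otherwise
# the job fails. check_lookup and the CSV layout are assumptions.
check_lookup() {
    feed="$1"     # delimited feed file
    lookup="$2"   # lookup file, one valid value per line
    col="$3"      # 1-based column number holding the coded value
    missing=$(cut -d',' -f"$col" "$feed" | sort -u \
              | grep -F -x -v -f "$lookup" || true)  # values absent from lookup
    if [ -n "$missing" ]; then
        echo "FAIL: values not in lookup: $missing" >&2
        return 1
    fi
    echo "PASS: all values present in lookup"
}
```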
DWH: Data Validation Checks – Details (Contd.)
Reference Data Checks
Checks for NULL values for key fields & information;
Checks for data types for key fields;
Checks for duplicate records based on primary keys defined;
Conformance to reference data enforced via referential integrity as well as lookup approach;
Transactional Activity Data Checks
Checks for NULL values for key fields & information;
Checks for data types for key fields;
Checks for duplicate records based on primary keys defined;
Conformance to reference data enforced via referential integrity as well as lookup approach;
DWH – Sanity Check
Objective
Check reasonableness of data and detect potential problems
Rules
Total: 9 sanity check rules
Controlling: 7 rules
DWHBI RtB: 6 rules