This presentation introduces the basic concepts of data management.
Added: Oct 18, 2024
Slides: 54 pages
Slide Content
Data Unlocked: The Startup Edge “Where There is Data Smoke, There is Business Fire”
Let’s Talk Data
Data-Driven Company?
Data-Driven! Why?
- 23%: Inventory management and recommendation engine
- 23%: Personalized recommendation
- 10%: Weather-driven marketing campaign
- 5%: Consumer insights and product development
Understanding the Pitfall Data Breach $6B 300K Poor data quality in the risk models used for trading decision 2M£ Poor Data quality 2M$ Data Breach
Why!!
What Is Data Management?
Data Management
What Is Data Management?
Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization. Effective data management in IT systems is crucial to running business operations and delivering information that helps drive decision-making by corporate executives, business managers and other end users.
What Are Data Management Functions?
Data Management Functions (Aiken Model): DW/BI; Reference & master data; Docs & content; Data integration & interoperability; Data storage & operations; Data security; Data modeling & design; Data governance; Data architecture; Data quality; Metadata; Data science
Data Management Functions (Aiken Model)
SalesElevate Use Case
SalesElevate Use Case
- SalesElevate is a product that aims to increase retail store sales by 30%.
- SalesElevate is an AI tool that generates promotions based on cross-selling and up-selling of purchased items.
- SalesElevate provides promotions for any customer, and personalized promos for loyal customers.
SalesElevate Scenarios
- Scenario 1: SalesElevate is a retail store owner that wants to implement an in-house solution on its own data to achieve the goals (Data Owner).
- Scenario 2: SalesElevate is a POS vendor whose own APIs are connected to its customers (API Owner).
- Scenario 3: SalesElevate is a startup that targets retail stores and sells them a product to increase their sales (Third Party).
Data Acquisition
Data Acquisition
Data Management Functions (Aiken Model): DW/BI; Reference & master data; Docs & content; Data integration & interoperability; Data storage & operations; Data security; Data modeling & design; Data governance; Data architecture; Data quality; Metadata; Data science
What Is Data Acquisition?
In data management, data acquisition refers to the process of gathering, collecting, and obtaining data from various sources to serve specific business or analytical needs. It is the first step in the data management lifecycle and plays a critical role in ensuring that the right data is available, accurate, and timely for decision-making, analysis, or operational purposes.
Data-Business Alignment
What Is Data-Business Alignment?
Data-business alignment ensures that the data acquisition strategy is directly tied to the organization's goals. This involves understanding what the business wants to achieve and identifying the business entities that generate or require the necessary data to meet these objectives.
Risks & Tips
Risk: Collecting irrelevant data that doesn't support business goals, leading to wasted resources.
Tips to Overcome:
- Define the business entities well.
- Define the business/product goals well.
- Regularly review and adjust data collection efforts to stay aligned with changing business needs.
- Keep a clear vision for the final product.
Why Is It Important?
Without clear alignment, data acquisition may focus on irrelevant or low-value data, leading to wasted resources. Aligning the strategy with business objectives ensures that data collection is purposeful and directly contributes to business success.
SalesElevate Data-Business Alignment
What are the business entities for each business goal?
- Goal: SalesElevate is a product that aims to increase retail store sales by 30%. Entities: Store, Sales, Products
- Goal: SalesElevate is an AI tool that generates promotions based on cross-selling and up-selling of purchased items. Entities: Transactions
- Goal: SalesElevate provides promotions for any customer, and personalized promos for loyal customers. Entities: Customer data
Data Acquisition Methods
What Are Data Acquisition Methods?
Data acquisition methods are the different techniques used to collect data. Depending on the business entity and data needs, different methods might be appropriate, such as automated data collection, APIs, or third-party data purchases.
Why Is It Important?
Choosing the right method ensures that data is relevant, timely, and cost-effective. The wrong method could lead to irrelevant or outdated data, hindering decision-making.
Risks & Tips
Risk: Using inappropriate data acquisition methods could result in poor data quality or increased costs.
Tips to Overcome:
- Specify the main data acquisition method for all business entities, or for each one.
- Evaluate the pros and cons of each method in the context of business needs.
- Pilot new methods on a small scale before full implementation.
SalesElevate Data Acquisition
The first step is to specify the data acquisition method:
- Scenario 1: Database
- Scenario 2: APIs
- Scenario 3: When you are not the data owner, you have to either:
  - Implement a tool that extracts receipt data
  - Buy an API from a POS vendor that collects
SalesElevate Data Acquisition (cont.)
The second step is to check whether the chosen data source is sufficient and can answer all business questions:
- Scenario 2: APIs
- Scenario 3: When you are not the data owner, you have to find a way to collect data:
  - Implement a tool that extracts receipt data
  - Buy an API from a POS vendor
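The sufficiency check in the second step can be sketched as a simple field-coverage test. This is a minimal illustration, not part of the original deck: the business questions, the field names, and the `coverage_gaps` helper are all hypothetical.

```python
def coverage_gaps(questions, available_fields):
    """Map each business question to the required fields the source lacks."""
    available = set(available_fields)
    gaps = {}
    for question, required in questions.items():
        missing = sorted(set(required) - available)
        if missing:
            gaps[question] = missing
    return gaps


# Hypothetical business questions and the fields each one needs.
QUESTIONS = {
    "Which items are bought together?": ["transaction_id", "product_id"],
    "Which promos suit loyal customers?": ["customer_id", "product_id", "purchase_date"],
}

# Hypothetical fields exposed by a receipt-extraction tool (Scenario 3).
RECEIPT_FIELDS = ["transaction_id", "product_id", "purchase_date", "total"]

gaps = coverage_gaps(QUESTIONS, RECEIPT_FIELDS)
print(gaps)  # questions the chosen source cannot answer, with the missing fields
```

An empty result would suggest the chosen source is sufficient; any entry points to data that must be acquired another way.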
Budgeting
Budgeting Golden Rules
- Get and store what’s needed, and process it when needed
- Don’t spend money on ROT (Redundant, Obsolete, Trivial) data
- Pay for whatever keeps you safe and legal
Get/Store What’s Needed When Needed Pt.1
- On-prem vs. Cloud
- Batch processing vs. Real-time
- Data storage types (data warehouse, database, data lake, etc.)
On-Prem or Cloud?
On-prem
- Use Case: Storing highly sensitive financial records.
- Scenario: The company decides to store all sensitive financial data, such as customer credit card information and internal financial reports, on on-premise servers to ensure compliance with strict regulatory requirements and maintain complete control over data security.
- Impact: Higher initial costs for infrastructure, maintenance, and security personnel, but greater control over data and compliance.
Cloud
- Use Case: Storing marketing data for customer segmentation.
- Scenario: The company stores and processes all customer interaction data, like website visits and social media interactions, on a cloud platform. This allows for flexible scaling, easy access for the marketing team, and cost savings on infrastructure.
- Impact: Lower upfront costs, a pay-as-you-go model, and easy scalability, but may require strong data security measures to protect customer information.
Batch Processing vs. Real-time
Batch processing
- Use Case: Monthly sales report generation.
- Scenario: Sales data from various stores is collected throughout the month and then processed in a batch at the end of the month to generate a comprehensive sales report.
- Impact: Suitable for non-urgent reporting and reduces processing costs, but insights are delayed.
Real-time
- Use Case: Real-time inventory management.
- Scenario: The inventory levels of a retail store are updated in real time as products are sold. This data is instantly processed to trigger automatic reorder processes when stock levels are low.
- Impact: Allows for immediate decision-making and automated responses, but requires continuous processing power and can be more expensive.
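The two use cases above can be contrasted in a short sketch. It is only an illustration under assumptions: the record layout, the `monthly_sales_report` function, and the `Inventory` class with its reorder threshold are hypothetical names, not part of the deck.

```python
from collections import defaultdict


def monthly_sales_report(sales):
    """Batch: aggregate a month's collected sales in one pass, per store."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale["store"]] += sale["total"]
    return dict(totals)


class Inventory:
    """Real-time: update stock on every sale and flag reorders immediately."""

    def __init__(self, stock, reorder_level):
        self.stock = dict(stock)
        self.reorder_level = reorder_level
        self.reorders = []

    def record_sale(self, product, quantity):
        self.stock[product] -= quantity
        # Trigger a reorder as soon as stock reaches the threshold.
        if self.stock[product] <= self.reorder_level and product not in self.reorders:
            self.reorders.append(product)


# Batch: processed once, at the end of the month.
report = monthly_sales_report([
    {"store": "Store 1", "total": 50.0},
    {"store": "Store 1", "total": 25.0},
    {"store": "Store 2", "total": 10.0},
])

# Real-time: processed per event, as each sale happens.
inventory = Inventory({"Product A": 5}, reorder_level=3)
inventory.record_sale("Product A", 2)
```

The batch function does nothing until it is called on the accumulated data, while the inventory object reacts to each sale, which is where the continuous processing cost comes from.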
Don’t Store ROT Data
Redundant
- Record 1: Transaction ID: TX1234 | Date: 2023-01-01 | Customer: John Doe | Product: Product A | Quantity: 2 | Total: $50 | Source: POS System A
- Record 2: Transaction ID: TX1234 | Date: 2023-01-01 | Customer: John Doe | Product: Product A | Quantity: 2 | Total: $50 | Source: POS System B
Obsolete
- Record 1: Customer ID: CUST5678 | Last Purchase Date: 2018-05-15 | Preferred Product: Product B | Last Promotion Engaged: 10% off on Product B - 2018 Summer Sale | Status: Dead
Trivial
- Log Entry 1: Timestamp: 2023-09-12 10:05:01 | User: Admin1 | Action: Opened Dashboard | Description: User Admin1 accessed the sales overview dashboard.
Pay for Whatever Keeps You Safe and Legal
- Data security
- Compliance with data privacy regulations
- A plan for secure backup and disaster recovery
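A minimal sketch of how the three ROT categories might be detected, using the records from this slide. The field names, the three-year idle threshold, and the set of "trivial" actions are assumptions chosen for illustration; a real policy would come from the business.

```python
from datetime import date


def drop_redundant(transactions):
    """Keep one copy of each transaction, ignoring which system reported it."""
    seen = set()
    unique = []
    for tx in transactions:
        key = (tx["id"], tx["date"], tx["customer"], tx["product"],
               tx["quantity"], tx["total"])
        if key not in seen:
            seen.add(key)
            unique.append(tx)
    return unique


def is_obsolete(customer, today, max_idle_days=3 * 365):
    """Assumed rule: no purchase for more than max_idle_days means obsolete."""
    return (today - customer["last_purchase"]).days > max_idle_days


def is_trivial(log_entry, trivial_actions=frozenset({"Opened Dashboard"})):
    """Assumed rule: routine UI actions carry no business value."""
    return log_entry["action"] in trivial_actions


transactions = [
    {"id": "TX1234", "date": "2023-01-01", "customer": "John Doe",
     "product": "Product A", "quantity": 2, "total": 50, "source": "POS System A"},
    {"id": "TX1234", "date": "2023-01-01", "customer": "John Doe",
     "product": "Product A", "quantity": 2, "total": 50, "source": "POS System B"},
]
customer = {"id": "CUST5678", "last_purchase": date(2018, 5, 15)}
log_entry = {"user": "Admin1", "action": "Opened Dashboard"}

kept = drop_redundant(transactions)
```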
Risk Management
What Is the Risk for Each Scenario?
Scenario 1: SalesElevate is the Data Owner: Data breaches, data loss, unauthorized access, data corruption, compliance violations, inadequate data encryption, insider threats, third-party risks.
What Is the Risk for Each Scenario?
Scenario 2: SalesElevate is the API Owner: Data breaches, unauthorized access, data leakage, man-in-the-middle attacks, rate-limiting abuse, injection attacks, insecure endpoints, data integrity issues.
What Is the Risk for Each Scenario?
Scenario 3: SalesElevate is a Third Party: Lack of data sources, cost of data extraction products, data breaches, unauthorized access, compliance violations, data misuse, data loss or corruption, insecure data transfer, liability for security incidents, reputation damage.
Data Lifecycle
Data Lifecycle: Plan; Design & enable; Create & obtain; Store & maintain; Use; Enhance; Dispose
Data Lifecycle for SalesElevate
- Plan: SalesElevate decided that they want to know which products their customers like the most and how often they shop. They made a plan to collect data from both online and physical stores.
- Design & Enable: They set up a simple system that asks customers for their email addresses and favorite products at checkout, both in-store and online.
- Create & Obtain: Over a few months, SalesElevate collects this information from hundreds of customers. Now they know what their customers like and have their contact information.
Data Lifecycle for SalesElevate (cont.)
- Store & Maintain: SalesElevate saves this data in a database on a cloud-based DB server and makes sure it is updated every day at 12 AM.
- Use: They use this information to send personalized emails to their customers with special offers on their favorite products.
- Enhance: SalesElevate noticed that some customers also engage with their social media posts. They decided to add this information to their data to understand even more about their customers’ interests, and started planning for it.
- Dispose: After a year, SalesElevate noticed that some of the email addresses are no longer in use. They archived the addresses that had not been used in the last 5 years, found that some records were not used in any operation, and decided to delete those records.
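The Dispose step can be read as a small retention rule: delete records never used in any operation, archive records unused for 5 years, keep the rest. The sketch below is one hypothetical way to express it; the `last_used` field, the `disposition` function, and the 365-day year approximation are assumptions.

```python
from datetime import date


def disposition(record, today, archive_after_years=5):
    """Classify a record as 'delete', 'archive', or 'keep'."""
    last_used = record["last_used"]
    if last_used is None:
        # Never used in any operation.
        return "delete"
    # Approximation: a year is counted as 365 days.
    if (today - last_used).days > archive_after_years * 365:
        return "archive"
    return "keep"


today = date(2024, 10, 18)
records = [
    {"email": "a@example.com", "last_used": None},
    {"email": "b@example.com", "last_used": date(2018, 5, 15)},
    {"email": "c@example.com", "last_used": date(2024, 1, 1)},
]
decisions = [disposition(r, today) for r in records]
```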
Industry specific use cases
SalesElevate Scenarios
- Scenario 1: SalesElevate is a retail store owner that wants to implement an in-house solution on its own data to achieve the goals (Data Owner).
- Scenario 2: SalesElevate is a POS vendor whose own APIs are connected to its customers (API Owner).
- Scenario 3: SalesElevate is a startup that targets retail stores and sells them a product to increase their sales (Third Party).
Healthcare Industry: Data-Driven Strategy
Step 1: Define Business Goals and Objectives
- Improve patient satisfaction scores by 15% within 12 months.
- Reduce hospital readmission rates by 10% through predictive analytics.
Step 2: Data Acquisition Methods
- Patient Data: Collect EHR (Electronic Health Record) data from hospitals and clinics.
- Wearable Devices: Integrate data from patient wearables to monitor vital signs and activity levels.
- Third-Party Data Sources: Use insurance and pharmaceutical data to enhance treatment plans.
Step 3: Data Integration and Governance
- Establish a unified data platform to integrate data from multiple sources (EHR, wearables, etc.).
- Ensure compliance with HIPAA and other regional data protection regulations.
Step 4: Data Analytics and Insights
- Use AI and machine learning models to predict patient readmissions and personalize treatment plans.
- Develop dashboards for real-time monitoring of patient vitals and predictive alerts for clinicians.
E-commerce: Data-Driven Strategy
Step 1: Define Business Goals and Objectives
- Increase average order value by 15% using personalized recommendations.
- Reduce inventory holding costs by 10% through better demand forecasting.
Step 2: Data Acquisition Methods
- Customer Data: Collect data on browsing behavior, purchase history, and preferences.
- Supply Chain Data: Integrate data from suppliers and logistics partners for real-time inventory management.
- Social Media and Web Analytics: Use third-party tools to gather insights on customer sentiment and trends.
Step 3: Data Integration and Personalization
- Implement a Customer Data Platform (CDP) to unify customer data across all channels.
- Use machine learning algorithms to create personalized product recommendations and targeted marketing campaigns.
Step 4: Data-Driven Decision-Making
- Develop dashboards to monitor key performance indicators (KPIs) such as conversion rates, average order value, and customer lifetime value.
- Optimize pricing, promotions, and inventory levels based on real-time data analytics.
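Two of the KPIs named in Step 4 reduce to simple ratios, and the 15% goal from Step 1 can be checked against them. This is a minimal sketch using common definitions (orders divided by sessions, revenue divided by orders); the function names and sample figures are hypothetical.

```python
def conversion_rate(orders, sessions):
    """Share of sessions that ended in an order."""
    return orders / sessions if sessions else 0.0


def average_order_value(revenue, orders):
    """Revenue per order."""
    return revenue / orders if orders else 0.0


def aov_target_met(baseline_aov, current_aov, uplift=0.15):
    """Check the Step 1 goal: average order value up by at least `uplift`."""
    return current_aov >= baseline_aov * (1 + uplift)


# Hypothetical figures for one reporting period.
rate = conversion_rate(orders=50, sessions=1000)
aov = average_order_value(revenue=5000.0, orders=50)
```

A dashboard would recompute these values per period and compare the current average order value with the baseline measured before the recommendations went live.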
Behind the scenes
Data Storage: Databases, Data Lakes, Warehouses
| Basis of Comparison | Data Lake | Data Warehouse | Database |
|---|---|---|---|
| Data Structure | Raw | Structured | Structured |
| Purpose of Data | Still to be determined | Currently in use | Used for daily operations |
| Users | Data Scientists | Business Professionals | Application Developers and End Users |
| Accessibility | Highly accessible and updated quickly | Changes are more difficult and expensive to implement | Immediate access for real-time operations |
| Storage Type | Stores all data types (structured, semi-structured, unstructured) | Stores structured, processed data | Stores structured data |
| Processing Speed | Lower (due to raw data processing needs) | High (optimized for fast querying) | High (optimized for transactional speed) |
Batch Processing vs. Real-time
Batch Processing
- Record 1: Transaction ID: TX1234 | Source: POS System A | Stored At: 2024-09-17 09:30:47 | Date: 2024-09-17 09:30:45
- Record 2: Transaction ID: TX1235 | Source: POS System A | Stored At: 2024-09-17 09:30:47 | Date: 2024-09-17 09:31:48
Real-time
- Record 1: Transaction ID: TX1234 | Source: POS System A | Stored At: 2024-09-17 09:30:45 | Date: 2024-09-17 09:30:45
- Record 2: Transaction ID: TX1235 | Source: POS System A | Stored At: 2024-09-17 09:30:45 | Date: 2024-09-17 09:30:45
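The difference between the two record sets is the gap between when a transaction happened (Date) and when it was stored (Stored At). A small sketch of that calculation follows, using Record 1 from each set; the `storage_lag_seconds` helper is a hypothetical name.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def storage_lag_seconds(event_time, stored_at):
    """Seconds between the transaction and the moment it was stored.

    A negative value would mean the record was stored before it happened,
    which points to inconsistent timestamps in the source.
    """
    event = datetime.strptime(event_time, TIMESTAMP_FORMAT)
    stored = datetime.strptime(stored_at, TIMESTAMP_FORMAT)
    return (stored - event).total_seconds()


# Record 1 from the batch set: stored with the batch, after the event.
batch_lag = storage_lag_seconds("2024-09-17 09:30:45", "2024-09-17 09:30:47")

# Record 1 from the real-time set: stored as the event happens.
realtime_lag = storage_lag_seconds("2024-09-17 09:30:45", "2024-09-17 09:30:45")
```

In a batch pipeline all records in a batch share one storage time, so the lag varies per record; in a real-time pipeline the lag stays close to zero.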
Which Data Storage?
Database
- Use Case: Customer relationship management (CRM).
- Scenario: Customer contact information, purchase history, and interaction logs are stored in a structured relational database for easy access and management by the sales and customer support teams.
- Impact: Efficient for structured data, quick access to records, and suitable for transactional purposes, but limited in handling unstructured data.
Data Warehouse
- Use Case: Business intelligence and analytics.
- Scenario: Aggregated sales, marketing, and customer data from various systems are stored in a data warehouse to support historical trend analysis and business decision-making.
- Impact: Optimized for query performance and suitable for complex analytics, but can be costly and requires structured data.