Course: Program Elective-I
(Data Mining and Warehousing )
Course Teachers: Dr K. Rajeswari, Dr Avinash Bhute
Course Chairman: Dr Swati Shinde
Module: Knowledge Engineering, Module Coordinator: Dr Mubin Tamboli.
Agenda
Teaching and Examination Scheme
Relevance of the Course and Prerequisites
Course Objectives and Outcomes
Why Data Analytics and Data Mining
Applications and Job opportunity
Teaching & Examination Scheme
Teaching Scheme : Lecture : 2 hrs/week
Planned 30 (Syllabus)
Examination Scheme (100 Marks)
FA= 40 Marks
(Formative Assessment- 10 marks for each
unit, which will be continuously evaluated with
classroom participation and will be verified in
2 instances)
Relevance of the Course and Prerequisites
Prerequisites
Database Management systems (DBMS), Engineering
Mathematics
Relevance of the Course:
Professional elective course
Prerequisite course for Machine Learning, Artificial
Intelligence.
To learn data mining techniques for statistical analysis in order to
develop effective decision support systems
Many career opportunities.
The course will help to learn and apply preprocessing
techniques, various data mining functionalities, post
processing methods for different applications in data mining
and data warehousing.
Text Books
Jiawei Han, Micheline Kamber, “Data mining: concepts and
techniques", Morgan Kaufmann Publisher 2012, third
edition, ISBN 978-0-12-381479-1.
G. K. Gupta, “Introduction to Data mining with Case
Studies", PHI Learning Private Limited, Delhi 2014, third
edition, ISBN-978-81-203-5002-1.
William H Inmon, “Building the data Warehouse”, Wiley
Publication 2005, fourth edition, ISBN: 978-0-764-59944-6.
Reference Books
Dunham, M. H., “Data mining: Introductory and advanced
topics”, Upper Saddle River, N.J: Pearson education /Prentice
Hall 2003.
Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit”,
3rd Edition, Wiley 2013, ISBN-13: 978-1118530801.
Ian H. Witten and Eibe Frank, “Data Mining: Practical
Machine Learning Tools and Techniques”, Second Edition,
Morgan Kaufmann Publishers 2005, ISBN: 0-12-088407-0.
Case Study – Canteen Data
DBMS Vs Data Mining
Parameters | DBMS Applications | Data Mining Applications
Uses | Day-to-day transactions | Weekly/monthly analysis
Data | Current | Historical
Operations | INSERT, UPDATE, DELETE, READ/SELECT | LOAD, READ, ANALYSIS
Users | End users who perform the above operations, such as clerks and operators | Business analyst, top management, manager, executive director
Examples | ERP system | Decision Making System
DBMS Vs Data Mining
Parameters | DBMS Applications | Data Mining Applications
Examples | ERP system | Performance Analysis & Decision Making System
Course Objectives
To introduce the fundamentals of Data mining and Data Warehousing.
To develop skills to select appropriate multi-dimensional schemas to
design data warehouse model.
To develop skills to identify the appropriateness and need of data mining.
To study and use preprocessing techniques for preparing suitable dataset
for data mining.
To apply data similarity and dissimilarity measures for statistical analysis
To study and apply various methods and algorithms in data mining for
solving real world problems.
Course Outcomes
After learning the course, students will be able to:
1. Use data preprocessing techniques for preparing suitable
dataset for data mining.
2. Select appropriate multi-dimensional schema to design data
warehouse model.
3. Apply data similarity and dissimilarity measures for
statistical analysis.
4. Apply Data Mining functionalities to solve real world
problems.
BUSINESS SCENARIO
Growing demand
More vigorous competition
High customer expectations
Advancements in technology
Speed of product obsolescence has increased
Its effect
•Reducing sales and market shares
•Decreasing profit margins
•Difficult to survive and grow
Example of Data Analytics
Companies using data analytics
Walmart
Flipkart
Amazon
Accenture
Cigna (American health care organisation)
Rapido (Indian bike rental company, Bangalore)
Why Data Analytics
•Data is being produced in large quantities
•The computing power is available
•The computing power is affordable
•The competitive pressures are strong
•Commercial products are available
•Terabytes – 10^12 bytes – Walmart – 24 Terabytes
•Petabytes – 10^15 bytes – GIS database
•Exabytes – 10^18 bytes – National Medical Record
•Zettabytes – 10^21 bytes – Weather images
•Yottabytes – 10^24 bytes – Intelligence Agency Video
EXAMPLES OF DATA ANALYTICS
9000 stores
More than 100 Countries
10,000 to 1,00,000 Stock Keeping Units (SKUs)
1 Million Transactions Every Hour
EXAMPLES OF DATA ANALYTICS
1800 stores
35 Million Club Customers
1 Billion Items Home Delivered Annually
EXAMPLES OF DATA ANALYTICS
261 Million Subscribers
8 Billion Calls Every Day
Hundreds of Different Call Plans
EXAMPLES OF DATA ANALYTICS
10 Million Transactions on Busy Days
EXAMPLES OF DATA ANALYTICS
100 Million Transactions Per Day
Why Data Analytics and Data Mining
Companies generate large volumes of data every hour
Data may be in the form of transactional data, log files,
customer data, etc.
Data is generated rapidly through social media such as Twitter,
Facebook and WhatsApp
Companies want to use this data to make business decisions
and improve their profits, and this is where data analytics
comes in
Data Mining
Process of exploring and analysing LARGE
DATASETS to find
Why to use data analytics
Improved decision making:
Improves and speeds up business decisions,
without guesswork
More personalisation
Understanding customers' needs and interests
thoroughly
Better recommendation for products and
services
Why to use data analytics
Efficient operations
When the interest of the audience is known, time is not
wasted in posting irrelevant content
Effective content management
Helps to optimize campaigns
Even ads can follow the interests of customers
Hence improves results
Effective marketing
Knowing customers, relevant campaigning
Leads get converted to customers.
APPLICATIONS OF DATA ANALYTICS AND JOB
OPPORTUNITIES
DATA ANALYST IN
Management
•Marketing
•Finance
•Human Resource
•Operations
•Supply Chain Management
Industries:
•Retail
•Banking
•Telecom etc.
Best Video Links for Beginners
https://youtu.be/ukzFI9rgwfU (8 minutes)
Machine learning basics
https://youtu.be/X3paOmcrTjQ (5 minutes)
Data Science basics
https://www.youtube.com/watch?v=zwasdVPPFFw (1 hour)
Data Analytics for Beginners
THANK YOU!
Data Mining
Introduction to Data Mining
Gold mining is the process of separating gold from
rocks and other materials.
Similarly, data mining is the extraction of
interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or
knowledge from huge amounts of data.
Exploration and Analysis of large quantities of data
to discover meaningful patterns and rules hidden in
the data.
Introduction to Data Mining
Data Mining is an application of Machine Learning
Machine Learning is a type of Artificial
Intelligence (AI) which enables computers to learn
from data without being explicitly programmed.
Example: Jewelry Shop Sales Analysis
Data set : Ex.
Name of jewelry item, Design pattern , Gender, Age
Information:
No. of items sold today
No. of items sold on first day of the month
No. of items sold at the end of the month
Knowledge / Interesting patterns:
If Ring is purchased, then Bangles are also purchased
If Age is Middle age, then Earring design1 is purchased
frequently and in larger quantity.
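A rule such as "If Ring is purchased, then Bangles are purchased" can be checked by counting how often the items occur together. The sketch below is only an illustration: the transactions are made up, and the helper names support and confidence are assumptions chosen here (they are the usual measures used to judge such rules, not figures from the shop).

```python
# Hypothetical jewelry-shop transactions; each set is one customer's basket.
transactions = [
    {"Ring", "Bangles"},
    {"Ring", "Bangles", "Earring"},
    {"Ring"},
    {"Earring"},
    {"Ring", "Bangles"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: Ring -> Bangles
rule_support = support({"Ring", "Bangles"}, transactions)
rule_confidence = confidence({"Ring"}, {"Bangles"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

A rule is usually treated as interesting only when both values pass thresholds chosen by the analyst.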
Alternative names
Knowledge Discovery (mining) from Data
(KDD)
-- Data Mining is one step of KDD
knowledge extraction,
data/pattern analysis,
data archeology,
data dredging,
information harvesting,
business intelligence, etc.
Need of Data Mining
The ability of the knowledge workers to make
decisions, is one of the primary factors that influence
the performance and competitive strength of a given
organization.
The main purpose of data mining techniques is to
provide knowledge workers with KNOWLEDGE
extracted from data that allows them to make
effective and timely decisions.
To improve the overall quality of the decision-making
process
KDD Process (contd.)
Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing
Data Pre-Processing: Data integration, Normalization, Feature selection, Dimension reduction
Data Mining: Pattern discovery, Association & correlation, Classification, Clustering, Outlier analysis, …
Post-Processing: Pattern evaluation, Pattern selection, Pattern interpretation, Pattern visualization
Architecture of a typical DM System
Knowledge Discovery (KDD) Process
This is a view from typical database systems and data
warehousing communities
Data mining plays an essential role in the knowledge discovery process
Databases -> Data Cleaning, Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation
Data Mining in KDD
Steps in KDD process
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be
combined)
3. Data selection (where data relevant to the analysis task are
retrieved from the database)
4. Data transformation (where data are transformed or
consolidated into forms appropriate for mining by performing
summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods
are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns
representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge
representation techniques are used to present the mined
knowledge to the user)
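The seven steps above can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions: the two branch record lists, the field names and every function name are hypothetical, and the "mining" step is a simple ranking used as a stand-in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical sales records from two sources (for example, two branches).
branch_a = [
    {"item": "Ring", "qty": 2, "clerk": "A1"},
    {"item": "Bangles", "qty": None, "clerk": "A2"},  # noisy record: missing quantity
    {"item": "Ring", "qty": 1, "clerk": "A1"},
]
branch_b = [
    {"item": "Earring", "qty": 4, "clerk": "B1"},
    {"item": "Ring", "qty": 3, "clerk": "B2"},
]


def clean(records):
    """Step 1: drop records that have missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Step 4: aggregate the quantity sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals


def mine(totals):
    """Step 5: stand-in mining step that ranks items by total quantity."""
    return totals.most_common()


def evaluate(patterns, min_qty):
    """Step 6: keep only patterns that pass an interestingness threshold."""
    return [(item, qty) for item, qty in patterns if qty >= min_qty]


def present(patterns):
    """Step 7: present the result to the user."""
    for item, qty in patterns:
        print(f"{item}: {qty} sold")


data = integrate(clean(branch_a), clean(branch_b))
data = select(data, ["item", "qty"])
patterns = evaluate(mine(transform(data)), min_qty=4)
present(patterns)
```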
Data Mining Task Primitives
Example: Medical Data Mining
Health care and medical data mining often adopts the following
view from statistics and machine learning:
Preprocessing of the data (including feature extraction
and dimension reduction)
Classification or/and clustering processes
Post-processing for presentation
Example: A Web Mining Framework
Web mining usually involves
Data cleaning
Data integration from multiple sources
Warehousing the data
Data cube construction
Data selection for data mining
Data mining
Presentation of the mining results
Patterns and knowledge to be used or stored in a knowledge base
Data Mining in Business Intelligence
Layers, from top to bottom (potential to support business decisions increases towards the top):
Decision Making
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Users, from top to bottom: End User, Business Analyst, Data Analyst, DBA
THANK YOU
https://youtu.be/1NjPTh0Eoeg
KDD Process
Data, Information and Knowledge
Data gathered from different sources cannot be used
directly for data mining process and decision-making
purposes.
They need to be processed
by means of appropriate extraction tools and
analytical methods capable of
transforming them into information and knowledge
that can be subsequently used by decision makers.
Data, Information and Knowledge
Information: Information is the outcome of extraction
and processing activities carried out on data.
Knowledge: Information is transformed into knowledge
when it is used to make decisions and develop the
corresponding actions.
Operational Data and Informational Data
Operational Systems (Operational Data)
1. Systems that help us to run the enterprise operation day-to-day (daily transactional data).
2. These are the backbone systems of any enterprise. Because of their importance to the organization, operational systems were almost always the first parts of the enterprise to be computerized.
3. Operational data needs are normally focused upon a single area.
4. Examples: operations or functions in OLTP "order entry", "inventory", "manufacturing", "payroll" and "accounting" systems.
Informational or knowledge-based systems (Informational Data)
1. Systems to provide functions that go on within the enterprise that have to do with planning, forecasting and managing the organization.
2. "Informational systems" have to do with analyzing data and making decisions, often major decisions, about how the enterprise will operate, now and in the future.
3. Informational data needs often span a number of different areas and need large amounts of related operational data.
4. Examples: functions like "marketing planning", "engineering planning" and "financial analysis" also require information systems to support them.
Operational Data and Informational Data
Feature | OLTP (Operational Data) | OLAP (Informational Data)
users | clerk, IT professional | knowledge worker
function | day to day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated
usage | repetitive | ad-hoc
access | read/write, index/hash on prim. key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100MB-GB | 100GB-TB
metric | transaction throughput | query throughput, response
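The difference in unit of work and access pattern can be seen on a very small scale. The sketch below uses Python's built-in sqlite3 module with a hypothetical orders table (the table, its columns and all values are assumptions made for illustration): the first part does short keyed transactions in the OLTP style, the second runs one summarizing query in the OLAP style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: short, simple transactions that touch a few rows by key.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (order_id, region, year, amount) VALUES (?, ?, ?, ?)",
        [
            (1, "North", 2023, 100.0),
            (2, "South", 2023, 250.0),
            (3, "North", 2024, 300.0),
            (4, "South", 2024, 50.0),
        ],
    )
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))

# OLAP-style work: a read-only query that scans and summarizes many rows.
summary = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
for region, year, total in summary:
    print(region, year, total)

conn.close()
```

In a real deployment the two workloads normally run on separate systems (an operational database and a data warehouse); a single in-memory database is used here only to keep the example self-contained.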
DBMS, OLAP, and Data Mining
Aspect | DBMS | OLAP (Data warehouse) | Data Mining
Task | Extraction of detailed and summary data | Summaries, trends and forecasts | Knowledge discovery of hidden patterns and insights
Type of result | Information | Analysis | Insight and Prediction
Method | Deduction (ask the question, verify with data) | Multidimensional data modeling, Aggregation, Statistics | Induction (build the model, apply it to new data, get the result)
Example question | Who purchased mutual funds in the last 3 years? | What is the average income of mutual fund buyers by region by year? | Who will buy a mutual fund in the next 6 months and why?
Example of DBMS, OLAP and Data Mining:
Weather Data
By querying a DBMS containing the weather table we
may answer questions like:
•What was the temperature on the sunny days? {85,
80, 72, 69, 75}
•On which days was the humidity less than 75? {6, 7, 9,
11}
•On which days was the temperature greater than 70?
{1, 2, 3, 8, 10, 11, 12, 13, 14}
•On which days was the temperature greater than 70
and the humidity less than 75? The
intersection of the above two: {11}
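These questions map directly to SQL queries. The sketch below assumes that the table is the widely used 14-day weather data set (its values are consistent with every answer listed above, but the table itself is not reproduced in these slides), and it uses Python's built-in sqlite3 module as the DBMS. The table and column names are chosen here for illustration.

```python
import sqlite3

# Assumed rows: (day, outlook, temperature, humidity, windy, play)
rows = [
    (1, "sunny", 85, 85, 0, "no"),
    (2, "sunny", 80, 90, 1, "no"),
    (3, "overcast", 83, 86, 0, "yes"),
    (4, "rainy", 70, 96, 0, "yes"),
    (5, "rainy", 68, 80, 0, "yes"),
    (6, "rainy", 65, 70, 1, "no"),
    (7, "overcast", 64, 65, 1, "yes"),
    (8, "sunny", 72, 95, 0, "no"),
    (9, "sunny", 69, 70, 0, "yes"),
    (10, "rainy", 75, 80, 0, "yes"),
    (11, "sunny", 75, 70, 1, "yes"),
    (12, "overcast", 72, 90, 1, "yes"),
    (13, "overcast", 81, 75, 0, "yes"),
    (14, "rainy", 71, 91, 1, "no"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (day INTEGER PRIMARY KEY, outlook TEXT, "
    "temperature INTEGER, humidity INTEGER, windy INTEGER, play TEXT)"
)
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?)", rows)


def first_column(sql):
    """Run a query and return the first column of every result row."""
    return [r[0] for r in conn.execute(sql)]


sunny_temps = first_column(
    "SELECT temperature FROM weather WHERE outlook = 'sunny' ORDER BY day"
)
dry_days = first_column("SELECT day FROM weather WHERE humidity < 75 ORDER BY day")
warm_days = first_column("SELECT day FROM weather WHERE temperature > 70 ORDER BY day")
warm_dry_days = first_column(
    "SELECT day FROM weather WHERE temperature > 70 AND humidity < 75 ORDER BY day"
)

print(sunny_temps, dry_days, warm_days, warm_dry_days)
conn.close()
```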
Example of DBMS, OLAP and Data Mining:
Weather Data
OLAP:
•Using OLAP we can create a Multidimensional Model of
our data (Data Cube).
•For example using the dimensions: time, outlook and play
we can create the following model.
9 (Y) / 5 (N) | sunny | rainy | overcast
Week 1 | 0 / 2 | 2 / 1 | 2 / 0
Week 2 | 2 / 1 | 1 / 1 | 2 / 0
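The cube cells above are counts of play = yes / play = no for each (week, outlook) pair. A minimal sketch of that aggregation follows; it assumes the standard 14-day weather data set, with days 1 to 7 taken as week 1 and days 8 to 14 as week 2 (an assumption that reproduces the counts in the model above).

```python
from collections import defaultdict

# Assumed rows: (day, outlook, play)
weather = [
    (1, "sunny", "no"), (2, "sunny", "no"), (3, "overcast", "yes"),
    (4, "rainy", "yes"), (5, "rainy", "yes"), (6, "rainy", "no"),
    (7, "overcast", "yes"), (8, "sunny", "no"), (9, "sunny", "yes"),
    (10, "rainy", "yes"), (11, "sunny", "yes"), (12, "overcast", "yes"),
    (13, "overcast", "yes"), (14, "rainy", "no"),
]

# Each cell of the cube holds the yes/no counts for one (week, outlook) pair.
cube = defaultdict(lambda: {"yes": 0, "no": 0})
for day, outlook, play in weather:
    week = 1 if day <= 7 else 2
    cube[(week, outlook)][play] += 1

# Roll-up over every dimension gives the grand total shown in the table corner.
total_yes = sum(cell["yes"] for cell in cube.values())
total_no = sum(cell["no"] for cell in cube.values())

for week in (1, 2):
    cells = [
        f"{cube[(week, o)]['yes']} / {cube[(week, o)]['no']}"
        for o in ("sunny", "rainy", "overcast")
    ]
    print(f"Week {week}:", " | ".join(cells))
print(f"Total: {total_yes} (Y) / {total_no} (N)")
```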
Example of DBMS, OLAP and Data
Mining: Weather Data
Data Mining:
•Using the ID3 algorithm we can produce the following
decision tree:
•outlook = sunny
–humidity = high: then play => no
–humidity = normal: then play => yes
•outlook = overcast: then play => yes
•outlook = rainy
–windy = true: then play => no
–windy = false: then play => yes
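A compact version of ID3 can rebuild this tree. The sketch below assumes the nominal form of the standard 14-day weather data set (temperature as hot/mild/cool, humidity as high/normal), which is not reproduced in these slides. The function names are chosen here for illustration, and the code omits refinements such as pruning or handling of missing values.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
RAW = [  # assumed nominal weather data, days 1 to 14
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
rows = [dict(zip(ATTRS + ["play"], r)) for r in RAW]


def entropy(records):
    """Entropy of the class label 'play' in a set of records."""
    counts = Counter(r["play"] for r in records)
    total = len(records)
    return -sum(c / total * log2(c / total) for c in counts.values())


def info_gain(records, attr):
    """Reduction in entropy obtained by splitting the records on attr."""
    total = len(records)
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(records) - remainder


def id3(records, attrs):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = {r["play"] for r in records}
    if len(labels) == 1:          # pure node: stop
        return labels.pop()
    if not attrs:                 # no attributes left: majority vote
        return Counter(r["play"] for r in records).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(records, a))
    rest = [a for a in attrs if a != best]
    return {
        best: {
            value: id3([r for r in records if r[best] == value], rest)
            for value in sorted({r[best] for r in records})
        }
    }


tree = id3(rows, ATTRS)
print(tree)
```

At the root, outlook has the highest information gain, which is why it becomes the first test in the tree.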
THANK YOU
Attributes types in data mining
Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing
Attribute Types: 80% of the time goes to preparing the dataset
Attributes types in data mining
What is Attribute?
An attribute is a property of an object. Attributes
represent the different features of the object.
Example:
RollNo | Name | Result
1 | Ali | Pass
2 | Akram | Fail
In this example, RollNo, Name, and Result are attributes of
the object student.
Types Of Attributes
Nominal Attribute
Nominal data:
Nominal data consists of names or labels (categories) rather than
numeric quantities, and the values have no meaningful order.
Example:
Attribute | Value
Categorical data | Lecturer, Assistant Professor, Professor
States | New, Pending, Working, Complete, Finish
Colors | Black, Brown, White, Red
Binary Attribute
Binary data:
Binary data have only two values/states.
Example:
Attribute | Value
HIV detected | Yes, No
Result | Pass, Fail
Binary attribute is of two types:
1) Symmetric binary
2) Asymmetric binary
Symmetric data:
Both values are equally important
Example:
Attribute | Value
Gender | Male, Female
Asymmetric data:
Both values are not equally important
Example:
Attribute | Value
HIV detected | Yes, No
Result | Pass, Fail
Ordinal Attribute
Ordinal data:
All Values have a meaningful order.
Example:
Attribute | Value
Grade | A, B, C, D, F
BPS (Basic pay scale) | 16, 17, 18
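Because ordinal values have an order, they can be ranked, sorted and compared, which is not meaningful for nominal values. A minimal sketch using the Grade attribute follows; the rank numbers, the helper name better_grade and the sample grades are assumptions made for illustration.

```python
# Ordinal attribute: map each grade to a rank that preserves its order.
GRADE_RANK = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def better_grade(g1, g2):
    """Return the higher of two grades by comparing their ranks."""
    return g1 if GRADE_RANK[g1] >= GRADE_RANK[g2] else g2


grades = ["B", "A", "D", "C", "F"]  # hypothetical student grades

# Sorting and the median are meaningful for ordinal data.
sorted_grades = sorted(grades, key=GRADE_RANK.get)
median_grade = sorted_grades[len(sorted_grades) // 2]

print(sorted_grades, median_grade, better_grade("B", "D"))
```

The ranks only record order. The gap between A and B is not claimed to equal the gap between D and F, so averages of the ranks should be read with care.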
Discrete Attribute
Discrete Data:
Discrete data have a finite (or countable) set of values. They can be
in numerical form and can also be in categorical form.
Example:
Attribute | Value
Profession | Teacher, Business Man, Peon etc.
Postal Code | 42200, 42300 etc.
Continuous Attribute
Continuous data:
Continuous data technically have an infinite number of
steps.
Continuous data are usually stored as floating-point numbers. There
can be infinitely many values between 1 and 2.
Example:
Attribute | Value
Height | 5.4…, 6.5….. etc.
Weight | 50.09…. etc.
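The difference between discrete and continuous attributes shows up in the operations that make sense on them. The sketch below uses made-up postal codes and heights (all values are hypothetical): counting each distinct value suits a discrete attribute, while range and mean suit a continuous one.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: postal code is discrete, height is continuous.
postal_codes = [42200, 42300, 42200, 42200]
heights = [5.4, 6.5, 5.9, 6.1]

# Discrete: values come from a finite set, so counting each value is meaningful.
code_counts = Counter(postal_codes)

# Continuous: summary statistics such as range and mean are meaningful.
height_range = (min(heights), max(heights))
average_height = mean(heights)

# Conceptually, another valid value always lies between two continuous values.
midpoint = (heights[0] + heights[1]) / 2

print(code_counts.most_common(1), height_range, average_height, midpoint)
```

Although a postal code is written with digits, arithmetic on it (such as an average) carries no meaning, which is why it is counted rather than averaged here.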