Core Concepts About Data Mining, Patterns, Algorithms & Real-World Examples
Data Mining
By Engr. Ahsan Shah
Example | Explanation
Market Basket Rule: "If a customer buys bread, they also buy butter." | Association pattern
Sales increase during holidays. | Trend pattern
Certain symptoms often appear together in patients. | Correlation pattern
Unusually high transaction = possible fraud. | Anomaly pattern
What is Data Mining?
Data Mining is the process of discovering useful patterns,
relationships, and insights from large sets of data using
statistical, mathematical, and computational techniques.
In Data Mining, a pattern means a useful, meaningful, and valid
relationship or structure found in data.
These patterns help analysts understand behavior, predict outcomes, and make business decisions.
Transaction ID | Items Bought
T1 | Milk, Bread, Butter
T2 | Milk, Bread
T3 | Bread, Butter
T4 | Milk, Bread, Butter, Eggs
Association Patterns (Dependency Rules)
Shows relationships between items or variables.
Common in market basket analysis.
If {Milk, Bread} → {Butter} (Support = 30%, Confidence = 80%)
30% of transactions include milk, bread, and butter;
80% of customers who buy milk and bread also buy butter.
Techniques Used
Apriori Algorithm
FP-Growth Algorithm
What is the Apriori Algorithm?
The Apriori Algorithm is a classic data mining algorithm used
to find frequent itemsets and generate association rules from
large transactional databases.
It finds patterns like: "If a customer buys Milk and Bread, they also buy Butter."
Before understanding how Apriori works, you must know these four important terms:
1. Itemset
A set of items bought together in a transaction.
Example: {Milk, Bread}
2. Support
Measures how frequently an itemset appears in the dataset.
Support(A→B) = Transactions containing (A ∪ B) / Total transactions
In the table above, 3 out of 4 transactions contain {Milk, Bread}, so Support = 3/4 = 0.75 = 75%.
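As a minimal sketch, the support calculation in Python, using the toy transactions from the table above (variable names are illustrative):

```python
# Toy transactions from the table above
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Eggs"},
]

itemset = {"Milk", "Bread"}

# Support = transactions containing the whole itemset / total transactions
support = sum(1 for t in transactions if itemset <= t) / len(transactions)
print(f"Support = {support:.0%}")  # 3 of 4 transactions contain the itemset -> 75%
```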
3. Confidence
Measures the strength of the rule A ⇒ B by asking: "How likely is B to be true when A is true?"
Support(A):
The proportion of transactions in the dataset that contain itemset A.
Support(A ∪ B):
The proportion of transactions that contain both A and B together.
Confidence(A → B):
The likelihood that a transaction containing A also contains B.
It's calculated by:
Confidence(A→B) = Transactions with both A and B / Transactions with A
Suppose you have 100 transactions from a store:
40 contain milk (A)
30 contain both milk and bread (A ∪ B)
Support(A) = 40/100 = 0.40
Support(A ∪ B) = 30/100 = 0.30
Confidence(A→B)= 0.30/0.40 = 0.75
So, 75% of the customers who bought milk also bought bread.
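The same arithmetic as a quick sketch (numbers taken from the worked example above):

```python
total = 100               # all transactions in the store
with_milk = 40            # transactions containing A (milk)
with_milk_and_bread = 30  # transactions containing both A and B

support_a = with_milk / total             # Support(A) = 0.40
support_ab = with_milk_and_bread / total  # Support(A ∪ B) = 0.30

confidence = support_ab / support_a       # 0.30 / 0.40
print(f"Confidence(Milk → Bread) = {confidence:.0%}")  # 75%
```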
The confidence formula quantifies the strength of a rule in association mining: a higher confidence means a stronger rule, i.e., if A happens, B is more likely to happen.
Candidate | Support (%)
{Milk, Bread} | 75%
{Milk, Butter} | 50%
{Bread, Butter} | 75%

Candidate | Support (%)
{Milk, Bread, Butter} | 50%
4. Lift
Measures how much more often A and B occur together than expected if they were independent:
Lift(A→B) = Confidence(A→B) / Support(B)

Rule | Support | Confidence | Lift
Milk → Bread | 75% | 100% | 1
Bread → Butter | 75% | 75% | 1
Milk, Bread → Butter | 50% | 66.7% | ≈0.89 (<1)
✅ If Lift > 1 → positive relationship
✅ If Lift = 1 → no relationship
✅ If Lift < 1 → negative relationship
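A quick sketch checking the last row of the rule table above:

```python
# Milk, Bread → Butter, using the supports from the candidate tables
confidence = 0.50 / 0.75  # Support({Milk, Bread, Butter}) / Support({Milk, Bread}) ≈ 0.667
support_butter = 0.75     # Butter appears in 3 of the 4 transactions

lift = confidence / support_butter
print(f"Lift = {lift:.2f}")  # 0.89 -> below 1, a slightly negative relationship
```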
Steps in the Apriori Algorithm
Step 1: Set Minimum Support and Confidence
Decide thresholds (e.g., support ≥ 50%, confidence ≥ 60%).
Step 2: Find All Frequent 1-Itemsets
Count support for each single item.
Keep only items meeting the minimum support threshold.
Step 3: Generate Candidate 2-Itemsets
Combine frequent 1-itemsets into pairs and keep the pairs that meet the minimum support.
Step 4: Generate Candidate 3-Itemsets
Combine frequent 2-itemsets and prune by minimum support again.
Step 5: Generate Association Rules
From each frequent itemset, keep the rules that meet the minimum confidence. A compact sketch of the whole loop follows below.
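A minimal, self-contained sketch of Steps 1-4 over the toy transactions from earlier (thresholds and names are illustrative, not a production implementation):

```python
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Eggs"},
]

MIN_SUPPORT = 0.5  # Step 1: set the minimum support threshold

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 2: frequent 1-itemsets
items = {item for t in transactions for item in t}
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]

# Steps 3-4: grow candidates level by level, pruning infrequent ones
all_frequent, k = list(frequent), 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(sorted(itemset), f"support = {support(itemset):.0%}")
```

Step 5 would then scan each frequent itemset and keep the rules A → B whose confidence, Support(A ∪ B) / Support(A), meets the minimum confidence threshold.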
Real-Life Applications of Apriori
Domain | Application
Retail | Market Basket Analysis (find products bought together)
Banking | Detecting services used together
Web Usage Mining | Finding patterns in website navigation
Healthcare | Identifying symptom-disease relationships
Marketing | Cross-selling and recommendation systems
Advantages and Limitations
Advantages | Limitations
Simple and easy to implement | High computation cost for large datasets
Works well for small to medium data | Generates too many candidate sets
Provides clear rules | Doesn't handle numeric or continuous data easily
Example transactions for FP-Growth:
Transaction ID | Items Bought
T1 | Milk, Bread, Butter
T2 | Bread, Butter
T3 | Milk, Bread
T4 | Milk, Bread, Butter
What is the FP-Growth Algorithm?
FP-Growth (Frequent Pattern Growth) is a data mining algorithm used to find frequent itemsets in large datasets, just like the Apriori algorithm, but faster and more efficient.
Instead of generating candidate itemsets one by one (like Apriori), FP-Growth uses a compact data structure called the FP-Tree (Frequent Pattern Tree).
Step 1: Build the FP-Tree
Scan the transaction database once to find frequent items.
Sort items in each transaction by their frequency.
Build a tree structure that stores items and their occurrence counts (see the construction sketch after Step 2).
Step 2: Extract Frequent Itemsets
Starting from the bottom of the tree, recursively find prefix
paths (patterns).
Generate conditional FP-Trees for each item.
Combine them to form frequent itemsets.
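To make Step 1 concrete, here is a minimal construction sketch; it builds exactly the tree shown below (the nested-dict representation is an illustrative choice, not the canonical FP-Growth structure with header tables and node links):

```python
from collections import Counter

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]

# Scan 1: count how often each item occurs
counts = Counter(item for t in transactions for item in t)

# Build the tree as nested dicts: {item: [count, children]}
root = {}
for t in transactions:
    node = root
    # Sort each transaction by descending item frequency (Step 1, second bullet)
    for item in sorted(t, key=lambda item: -counts[item]):
        entry = node.setdefault(item, [0, {}])
        entry[0] += 1    # bump this node's occurrence count
        node = entry[1]  # descend into its children

def show(node, depth=0):
    for item, (count, children) in node.items():
        print("  " * depth + f"{item} ({count})")
        show(children, depth + 1)

show(root)  # prints the tree shown below
```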
Resulting FP-Tree for the transactions above:
Root
├── Bread (4)
│   ├── Milk (3)
│   │   └── Butter (2)
│   └── Butter (1)
Then, frequent patterns are extracted such as:
{Bread}
{Bread, Milk}
{Bread, Butter}
{Bread, Milk, Butter}

Limitations of FP-Growth
Limitation | Description
Complex Tree Structure | Can be hard to understand and implement
Memory Usage | May grow large for sparse data (many unique items)
Not Easy for Dynamic Data | Tree must be rebuilt if the data changes
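For real datasets, a library implementation is the usual choice. A sketch assuming the third-party mlxtend package is installed (pip install mlxtend):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine every itemset with support >= 50%
frequent = fpgrowth(df, min_support=0.5, use_colnames=True)
print(frequent)  # includes {Bread}, {Bread, Milk}, {Bread, Butter}, {Bread, Milk, Butter}
```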
Step 2: KDD Process (Knowledge Discovery in Databases)
Data Cleaning – Remove noise and missing values
Data Integration – Combine data from multiple sources
Data Selection – Choose relevant data
Data Transformation – Convert into suitable format
Data Mining – Apply algorithms and find patterns
Pattern Evaluation – Identify meaningful results
Knowledge Presentation – Visualize results for users
Step 3: Preparing the Data (Data Preprocessing)
Before mining, raw data must be cleaned, organized, and transformed. This process is called Data Preprocessing.
Steps in Data Preparation:
1. Data Cleaning
Remove duplicate, missing, or inconsistent data.
Example: Replacing blank values with the average of the column.
2. Data Integration
Combine data from multiple sources (e.g., sales + customer + location data).
Helps create a unified dataset.
3. Data Selection
Choose only relevant attributes for mining.
Example: For predicting sales, choose "price," "discount," and "region," but not "employee age."
4. Data Transformation
Convert data into a suitable format for analysis.
Includes:
Normalization: Scale values into a fixed range (e.g., 0–1)
Aggregation: Summarize data (e.g., weekly → monthly sales)
Encoding: Convert text data into numbers for algorithms.
The goal is to make the dataset accurate, consistent, and ready for mining algorithms. Poor data quality = poor results, no matter how good the algorithm is.
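A small pandas sketch of steps 1 and 4, using the hypothetical "price", "discount", and "region" columns from the sales example above:

```python
import pandas as pd

# Hypothetical raw sales data with missing values
df = pd.DataFrame({
    "price":    [10.0, None, 12.5, 10.0],
    "discount": [0.10, 0.20, None, 0.10],
    "region":   ["North", "South", "North", "North"],
})

# 1. Data Cleaning: replace blank values with the column average
for col in ["price", "discount"]:
    df[col] = df[col].fillna(df[col].mean())

# 4. Data Transformation
# Normalization: scale price into the 0-1 range
df["price_norm"] = (df["price"] - df["price"].min()) / (df["price"].max() - df["price"].min())

# Encoding: convert the text column into numeric indicator columns
df = pd.get_dummies(df, columns=["region"])

print(df)
```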