Core Concepts About Data Mining, Patterns, Algorithms & Real-World Examples
Data Mining
By Engr. Ahsan Shah
Example | Explanation
Market Basket Rule: "If a customer buys bread, they also buy butter." | Association pattern
Sales increase during holidays. | Trend pattern
Certain symptoms often appear together in patients. | Correlation pattern
Unusually high transaction = possible fraud. | Anomaly pattern
What is Data Mining?
Data Mining is the process of discovering useful patterns,
relationships, and insights from large sets of data using
statistical, mathematical, and computational techniques.
In Data Mining, a pattern means a useful, meaningful, and valid
relationship or structure found in data.
These patterns help analysts understand behavior, predict outcomes, and make business decisions.
Transaction ID | Items Bought
T1 | Milk, Bread, Butter
T2 | Milk, Bread
T3 | Bread, Butter
T4 | Milk, Bread, Butter, Eggs
Association Patterns (Dependency Rules)
Shows relationships between items or variables.
Common in market basket analysis.
If {Milk, Bread} → {Butter} (Support = 30%, Confidence = 80%)
30% of transactions include milk, bread, and butter;
80% of customers who buy milk and bread also buy butter.
Techniques Used
Apriori Algorithm
FP-Growth Algorithm
What is the Apriori Algorithm?
The Apriori Algorithm is a classic data mining algorithm used
to find frequent itemsets and generate association rules from
large transactional databases.
It finds patterns like: "If a customer buys Milk and Bread, they also buy Butter."
Before understanding how Apriori works, you must know these four important terms:
1. Itemset
A set of items bought together in a transaction.
Example: {Milk, Bread}
2. Support
Measures how frequently an itemset appears in the dataset.
Support(A→B) = Transactions containing (A ∪ B) / Total transactions
In the table above, 3 out of 4 transactions contain {Milk, Bread}, so Support = 3/4 = 0.75 = 75%.
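As a minimal sketch, the support calculation in Python, using the toy transactions from the table above (variable names are illustrative):

```python
# Toy transactions from the table above
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Eggs"},
]

itemset = {"Milk", "Bread"}

# Support = transactions containing the whole itemset / total transactions
support = sum(1 for t in transactions if itemset <= t) / len(transactions)
print(f"Support = {support:.0%}")  # 3 of 4 transactions contain the itemset -> 75%
```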
3. Confidence
Measures the strength of the rule A ⇒ B by asking: "How likely is B to be true when A is true?"
Support(A):
The proportion of transactions in the dataset that contain itemset A.
Support(A ∪ B):
The proportion of transactions that contain both A and B together.
Confidence(A → B):
The likelihood that a transaction containing A also contains B.
It's calculated by:
Confidence(A→B) = Transactions with both A and B / Transactions with A
Suppose you have 100 transactions from a store:
40 contain milk (A)
30 contain both milk and bread (A ∪ B)
Support(A) = 40/100 = 0.40
Support(A ∪ B) = 30/100 = 0.30
Confidence(A→B)= 0.30/0.40 = 0.75
So, 75% of the customers who bought milk also bought bread.
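The same arithmetic as a quick sketch (numbers taken from the worked example above):

```python
total = 100               # all transactions in the store
with_milk = 40            # transactions containing A (milk)
with_milk_and_bread = 30  # transactions containing both A and B

support_a = with_milk / total             # Support(A) = 0.40
support_ab = with_milk_and_bread / total  # Support(A ∪ B) = 0.30

confidence = support_ab / support_a       # 0.30 / 0.40
print(f"Confidence(Milk → Bread) = {confidence:.0%}")  # 75%
```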
The confidence formula quantifies the strength of a rule in association mining: a higher confidence means a stronger rule, i.e., if A happens, B is more likely to happen.
Candidate | Support (%)
{Milk, Bread} | 75%
{Milk, Butter} | 50%
{Bread, Butter} | 75%

Candidate | Support (%)
{Milk, Bread, Butter} | 50%
4. Lift
Measures how much more often A and B occur together than expected if they were independent:
Lift(A→B) = Confidence(A→B) / Support(B)

Rule | Support | Confidence | Lift
Milk → Bread | 75% | 100% | 1
Bread → Butter | 75% | 75% | 1
Milk, Bread → Butter | 50% | 66.7% | ≈0.89 (<1)
✅ If Lift > 1 → positive relationship
✅ If Lift = 1 → no relationship
✅ If Lift < 1 → negative relationship
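A quick sketch checking the last row of the rule table above:

```python
# Milk, Bread → Butter, using the supports from the candidate tables
confidence = 0.50 / 0.75  # Support({Milk, Bread, Butter}) / Support({Milk, Bread}) ≈ 0.667
support_butter = 0.75     # Butter appears in 3 of the 4 transactions

lift = confidence / support_butter
print(f"Lift = {lift:.2f}")  # 0.89 -> below 1, a slightly negative relationship
```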
Steps in the Apriori Algorithm
Step 1: Set Minimum Support and Confidence
Decide thresholds (e.g., support ≥ 50%, confidence ≥ 60%).
Step 2: Find All Frequent 1-Itemsets
Count support for each single item.
Keep only items meeting the minimum support threshold.
Step 3: Generate Candidate 2-Itemsets
Combine frequent 1-itemsets into pairs and keep the pairs that meet the minimum support.
Step 4: Generate Candidate 3-Itemsets
Combine frequent 2-itemsets and prune by minimum support again.
Step 5: Generate Association Rules
From each frequent itemset, keep the rules that meet the minimum confidence. A compact sketch of the whole loop follows below.
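A minimal, self-contained sketch of Steps 1-4 over the toy transactions from earlier (thresholds and names are illustrative, not a production implementation):

```python
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Eggs"},
]

MIN_SUPPORT = 0.5  # Step 1: set the minimum support threshold

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 2: frequent 1-itemsets
items = {item for t in transactions for item in t}
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]

# Steps 3-4: grow candidates level by level, pruning infrequent ones
all_frequent, k = list(frequent), 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(sorted(itemset), f"support = {support(itemset):.0%}")
```

Step 5 would then scan each frequent itemset and keep the rules A → B whose confidence, Support(A ∪ B) / Support(A), meets the minimum confidence threshold.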
Real-Life Applications of Apriori
Domain | Application
Retail | Market Basket Analysis (find products bought together)
Banking | Detecting services used together
Web Usage Mining | Finding patterns in website navigation
Healthcare | Identifying symptom-disease relationships
Marketing | Cross-selling and recommendation systems
Advantages and Limitations
Advantages | Limitations
Simple and easy to implement | High computation cost for large datasets
Works well for small to medium data | Generates too many candidate sets
Provides clear rules | Doesn't handle numeric or continuous data easily
Example transactions for FP-Growth:
Transaction ID | Items Bought
T1 | Milk, Bread, Butter
T2 | Bread, Butter
T3 | Milk, Bread
T4 | Milk, Bread, Butter
What is the FP-Growth Algorithm?
FP-Growth (Frequent Pattern Growth) is a data mining algorithm used to find frequent itemsets in large datasets, just like the Apriori algorithm, but faster and more efficient.
Instead of generating candidate itemsets one by one (like Apriori), FP-Growth uses a compact data structure called the FP-Tree (Frequent Pattern Tree).
Step 1: Build the FP-Tree
Scan the transaction database once to find frequent items.
Sort items in each transaction by their frequency.
Build a tree structure that stores items and their occurrence counts (see the construction sketch after Step 2).
Step 2: Extract Frequent Itemsets
Starting from the bottom of the tree, recursively find prefix
paths (patterns).
Generate conditional FP-Trees for each item.
Combine them to form frequent itemsets.
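To make Step 1 concrete, here is a minimal construction sketch; it builds exactly the tree shown below (the nested-dict representation is an illustrative choice, not the canonical FP-Growth structure with header tables and node links):

```python
from collections import Counter

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]

# Scan 1: count how often each item occurs
counts = Counter(item for t in transactions for item in t)

# Build the tree as nested dicts: {item: [count, children]}
root = {}
for t in transactions:
    node = root
    # Sort each transaction by descending item frequency (Step 1, second bullet)
    for item in sorted(t, key=lambda item: -counts[item]):
        entry = node.setdefault(item, [0, {}])
        entry[0] += 1    # bump this node's occurrence count
        node = entry[1]  # descend into its children

def show(node, depth=0):
    for item, (count, children) in node.items():
        print("  " * depth + f"{item} ({count})")
        show(children, depth + 1)

show(root)  # prints the tree shown below
```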
Resulting FP-Tree for the transactions above:
Root
├── Bread (4)
│   ├── Milk (3)
│   │   └── Butter (2)
│   └── Butter (1)
Then, frequent patterns are extracted such as:
{Bread}
{Bread, Milk}
{Bread, Butter}
{Bread, Milk, Butter}

Limitations of FP-Growth
Limitation | Description
Complex Tree Structure | Can be hard to understand and implement
Memory Usage | May grow large for sparse data (many unique items)
Not Easy for Dynamic Data | Tree must be rebuilt if the data changes
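For real datasets, a library implementation is the usual choice. A sketch assuming the third-party mlxtend package is installed (pip install mlxtend):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Bread", "Butter"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine every itemset with support >= 50%
frequent = fpgrowth(df, min_support=0.5, use_colnames=True)
print(frequent)  # includes {Bread}, {Bread, Milk}, {Bread, Butter}, {Bread, Milk, Butter}
```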
Step 2: KDD Process (Knowledge Discovery in Databases)
Data Cleaning – Remove noise and missing values
Data Integration – Combine data from multiple sources
Data Selection – Choose relevant data
Data Transformation – Convert into suitable format
Data Mining – Apply algorithms and find patterns
Pattern Evaluation – Identify meaningful results
Knowledge Presentation – Visualize results for users
Step 3: Preparing the Data (Data Preprocessing)
Before mining, raw data must be cleaned, organized, and transformed. This process is called Data Preprocessing.
Steps in Data Preparation:
1. Data Cleaning
Remove duplicate, missing, or inconsistent data.
Example: Replacing blank values with the average of the column.
2. Data Integration
Combine data from multiple sources (e.g., sales + customer + location data).
Helps create a unified dataset.
3. Data Selection
Choose only relevant attributes for mining.
Example: For predicting sales, choose "price," "discount," and "region," but not "employee age."
4. Data Transformation
Convert data into a suitable format for analysis.
Includes:
Normalization: Scale values into a fixed range (e.g., 0–1)
Aggregation: Summarize data (e.g., weekly → monthly sales)
Encoding: Convert text data into numbers for algorithms.
The goal is to make the dataset accurate, consistent, and ready for mining algorithms. Poor data quality = poor results, no matter how good the algorithm is.
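A small pandas sketch of steps 1 and 4, using the hypothetical "price", "discount", and "region" columns from the sales example above:

```python
import pandas as pd

# Hypothetical raw sales data with missing values
df = pd.DataFrame({
    "price":    [10.0, None, 12.5, 10.0],
    "discount": [0.10, 0.20, None, 0.10],
    "region":   ["North", "South", "North", "North"],
})

# 1. Data Cleaning: replace blank values with the column average
for col in ["price", "discount"]:
    df[col] = df[col].fillna(df[col].mean())

# 4. Data Transformation
# Normalization: scale price into the 0-1 range
df["price_norm"] = (df["price"] - df["price"].min()) / (df["price"].max() - df["price"].min())

# Encoding: convert the text column into numeric indicator columns
df = pd.get_dummies(df, columns=["region"])

print(df)
```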