Unit 5 – Association and Dimensionality Reduction
Machine Learning
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited
Department of Information Technology
NBA Accredited-UG Programme
Ms. K. D. Patil
Assistant Professor
Contents – Association and Dimensionality Reduction
•Association Rules – Market Basket Analysis, The Apriori Algorithm, Performance Measures – Support, Confidence, Lift.
•Dimensionality Reduction: Principal Component Analysis, Partial Least Squares, Subset Selection, Feature Reduction/Dimensionality Reduction, Principal Component Analysis (Eigenvalues, Eigenvectors, Orthogonality)
Course Outcome
•CO5: To apply the Association rule and Principal Component Analysis (PCA).
Introduction
•Association (or Association Rule Learning or Association Rule Mining)
•It identifies relationships or dependencies between variables in large
datasets.
•It’s mainly used to find frequent patterns, correlations, or associations
among a set of items.
•The most common application of association is in Market Basket Analysis.
•Goal: Discover interesting relationships between variables.
•Key Algorithms: Apriori, Eclat, FP-Growth.
Introduction
•Example 1:
•In a grocery store, association learning might reveal that “if a customer buys
bread, they are likely to buy butter.”
•These patterns help retailers place related items close together or bundle
them in promotions.
•Example 2:
•Amazon uses association rules to recommend products to customers.
•If many customers who bought a camera also bought a memory card, the
algorithm suggests memory cards when someone is browsing cameras.
Apriori algorithm
•Introduction:
•The Apriori algorithm is a classical Data Mining algorithm used for mining frequent itemsets and association rules.
•Say Ram goes to the supermarket to buy Bread and grabs Butter as well. The manager observes that not only Ram but many people tend to buy Bread and Butter together.
•After finding out the pattern, the manager starts to arrange these items together
and notices an increase in sales.
•This process of identifying an association between products/items is called
association rule mining.
•To implement association rule mining, many algorithms have been developed.
•Apriori algorithm is one of the most popular and arguably the most efficient
algorithms among them.
Apriori algorithm
•What is Association Rule Mining?
•Association rule mining is a technique to identify frequent patterns and associations among a set of items, for example, to understand customer buying habits.
•By finding correlations and associations between different items that
customers place in their ‘shopping basket,’ recurring patterns can be derived.
Apriori algorithm
•Introduction continue..
•Apriori algorithm assumes that any subset of a frequent item set must be
frequent.
•Say, a transaction containing {Bread, Butter, Jam} also contains {Bread, Jam}.
So, according to the principle of Apriori, if {Bread, Butter, Jam} is frequent,
then {Bread, Jam} must also be frequent.
•The key concept in the Apriori algorithm is that it assumes all subsets of a
frequent itemset to be frequent.
•Similarly, for any infrequent itemset, all its supersets must also be
infrequent.
Apriori algorithm
•Introduction continue..
•Association rule analysis is a technique to uncover how items are associated with each other.
•There are three common ways to measure association.
•Let's consider the following dataset to understand these measures.
Apriori algorithm
•Measure 1: Support
•This says how popular an itemset is, as measured by the proportion of
transactions in which an itemset appears.
•In the table above, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items.
•For instance, the support of {apple, juice, rice} is 2 out of 8, or 25%.
Apriori algorithm
•Measure 1: Support
•If you discover that sales of items beyond a certain proportion tend to have a
significant impact on your profits, you may consider that proportion as your
support threshold.
•You may then identify itemsets with support values above this threshold as
significant itemsets.
•Support refers to the default popularity of an item and can be calculated by
finding the number of transactions containing a particular item divided by
the total number of transactions.
Apriori algorithm
•Measure 2: Confidence
•This says how likely item Y is purchased when item X is purchased, expressed
as {X -> Y}.
•This is measured by the proportion of transactions with item X, in which item
Y also appears.
•In the table above, the confidence of {apple -> juice} is 3 out of 4, or 75%.
Confidence(A→B) = (Transactions containing both A and B)/(Transactions containing A)
Apriori algorithm
•Measure 2: Confidence
•One drawback of the confidence measure is that it might misrepresent the
importance of an association.
•This is because it only accounts for how popular apples are, but not juice. If
juices are also very popular in general, there will be a higher chance that a
transaction containing apples will also contain juices, thus inflating the
confidence measure.
•Confidence refers to the likelihood that item B (mouse) is also bought if item
A (keyboard) is bought.
•Confidence(Keyboard→Mouse) = (Transactions containing both (Keyboard
and Mouse))/(Transactions containing Keyboard)
Apriori algorithm
•Measure 3: Lift
•This says how likely item Y is purchased when item X is purchased,
while controlling for how popular item Y is.
•In the table above, the lift of {apple -> juice} is 1, which implies no association between the items.
•A lift value greater than 1 means that item Y is likely to be bought if
item X is bought, while a value less than 1 means that item Y is unlikely
to be bought if item X is bought.
Lift(A→B) = (Confidence (A→B))/(Support (B))
Apriori algorithm
Measure 1: Support
Support(A) = (Transactions containing (A))/(Total Transactions)
Support(B) = (Transactions containing (B))/(Total Transactions)
Measure 2: Confidence
Confidence(A→B) = (Transactions containing both (A and B))/(Transactions
containing A)
Apriori algorithm
•Example:
•Suppose we have a record of 1,000 customer transactions, and we want to find the Support, Confidence, and Lift for two items, e.g. burgers and ketchup.
•Out of one thousand transactions, 100 contain ketchup while 150 contain a
burger.
•Out of 150 transactions where a burger is purchased, 50 transactions
contain ketchup as well.
•Using this data, we want to find the support, confidence, and lift.
Apriori algorithm
•Example:
•Support(B) = (Transactions containing (B))/(Total Transactions)
•For instance if out of 1000 transactions, 100 transactions contain Ketchup
then the support for item Ketchup can be calculated as:
Support(Ketchup) = (Transactions containing Ketchup)/(Total Transactions)
Support(Ketchup) = 100/1000 = 10%
•For instance if out of 1000 transactions, 150 transactions contain Burger
then the support for item Burger can be calculated as:
Support(Burger) = (Transactions containing Burger)/(Total Transactions)
Support(Burger) = 150/1000 = 15%
Apriori algorithm
•Example:
•Confidence(A→B) = (Transactions containing both (A and B))/(Transactions
containing A)
•We had 50 transactions where a Burger and Ketchup were bought together, while Burgers were bought in 150 transactions.
•Then the likelihood of buying Ketchup when a Burger is bought can be represented as the confidence of Burger -> Ketchup and can be mathematically written as:
Confidence(Burger→Ketchup) = (Transactions containing both (Burger and
Ketchup))/(Transactions containing Burger)
Confidence(Burger→Ketchup) = 50/150 = 33.3%
Apriori algorithm
•Example:
•Lift(A→B) = (Confidence (A→B))/(Support (B))
•The Lift(Burger -> Ketchup) can be calculated as:
Lift(Burger→Ketchup) = (Confidence (Burger→Ketchup))/(Support(Ketchup))
Lift(Burger→Ketchup) = 33.3/10 = 3.33
•Lift basically tells us that the likelihood of buying Ketchup when a Burger is bought is 3.33 times the baseline likelihood of buying Ketchup on its own.
•A Lift of 1 means there is no association between products A and B. Lift of
greater than 1 means products A and B are more likely to be bought
together.
•Finally, Lift of less than 1 refers to the case where two products are unlikely
to be bought together.
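As a quick check, these three measures can be computed directly from the counts. Below is a minimal Python sketch of the worked example above (the variable names are ours; the numbers are those stated in the example):

```python
# Support, confidence, and lift for the Burger -> Ketchup example above.
total = 1000        # total transactions
ketchup = 100       # transactions containing Ketchup
burger = 150        # transactions containing a Burger
both = 50           # transactions containing both

support_ketchup = ketchup / total       # 0.10 -> 10%
support_burger = burger / total         # 0.15 -> 15%
confidence = both / burger              # 0.333 -> 33.3%
lift = confidence / support_ketchup     # 3.33

print(f"Support(Ketchup) = {support_ketchup:.0%}")
print(f"Support(Burger) = {support_burger:.0%}")
print(f"Confidence(Burger->Ketchup) = {confidence:.1%}")
print(f"Lift(Burger->Ketchup) = {lift:.2f}")
```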
Apriori algorithm
•Practice Examples:
•Solve: A grocery shop has 2,000 purchase transactions, out of which 200 contain Bread while 300 contain Jam. Out of the 300 transactions where Jam is purchased, 100 also contain Bread. Using the above data, calculate Support, Confidence, and Lift.
•Solve: A fruit shop has 4,000 purchase transactions, out of which 400 contain Berry while 600 contain Apple. Out of the 600 transactions where Apple is purchased, 200 also contain Berry. Using the above data, calculate Support, Confidence, and Lift.
Apriori algorithm - Working
•Step 1. Identifying Frequent Item-Sets
•The Apriori algorithm starts by looking through all the data to count how many times each single item appears. These single items are called 1-itemsets.
•Next, it uses a rule called minimum support.
•This is a number that tells us how often an item or group of items needs to appear to be important.
•If an item appears often enough, meaning its count is above this minimum support, it is called a frequent itemset.
Apriori algorithm - Working
•Step 2. Creating Possible Item Group
•After finding the single items that appear often enough (frequent 1-item groups), the algorithm combines them to create pairs of items (2-item groups).
•Then it checks which pairs are frequent by seeing if they appear enough times in the data.
•This process keeps going step by step, making groups of 3 items, then 4 items, and so on.
•The algorithm stops when it can't find any bigger groups that occur often enough.
Apriori algorithm - Working
•Step 3. Removing Infrequent Item Groups
•The Apriori algorithm uses a helpful rule to save time.
•This rule says: if a group of items does not appear often enough then any
larger group that includes these items will also not appear often.
•Because of this, the algorithm does not check those larger groups.
•This way it avoids wasting time looking at groups that won't be important, making the whole process faster.
•Step 4. Generating Association Rules
•The algorithm makes rules to show how items are related.
•It checks these rules using support, confidence and lift to find the
strongest ones.
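As a concrete illustration of these four steps, here is a minimal sketch using the open-source mlxtend library (an assumption on our part: mlxtend is not part of these slides, must be installed separately, e.g. pip install mlxtend, and its API may vary slightly between versions; the toy transactions are illustrative only):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions (illustrative only).
transactions = [
    ["Bread", "Butter", "Jam"],
    ["Bread", "Butter"],
    ["Bread", "Jam"],
    ["Butter", "Jam"],
    ["Bread", "Butter", "Jam"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Steps 1-3: find every itemset whose support meets the minimum threshold;
# infrequent itemsets (and all their supersets) are pruned automatically.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)

# Step 4: generate association rules and score them.
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```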
Market Basket Analysis
•Introduction:
•Market basket analysis is a strategic data mining technique used by retailers
to enhance sales by gaining a deeper understanding of customer purchasing
patterns.
•This method involves examining substantial datasets, such as historical
purchase records, to unveil inherent product groupings and identify items
that customers tend to buy together.
•By recognizing these patterns of co-occurrence, retailers can make informed
decisions to optimize inventory management, devise effective marketing
strategies, employ cross-selling tactics, and even refine store layout for
improved customer engagement.
•Market basket analysis mainly works with the ASSOCIATION RULE {IF} ->
{THEN}.
•IF means Antecedent: An antecedent is an item found within the data
•THEN means Consequent: A consequent is an item found in combination
with the antecedent.
Market Basket Analysis
•How Does Market Basket Analysis Work?
1.Collect data on customer transactions, such as the items purchased in each
transaction, the time and date of the transaction, and any other relevant
information.
2.Clean and preprocess the data, removing any irrelevant information,
handling missing values, and converting the data into a suitable format for
analysis.
3.Use association rules mining algorithms such as Apriori or FP-Growth to
identify frequent item sets, sets of items often appearing together in a
transaction.
4.Calculate the support and confidence for each frequent itemset, expressing
the likelihood of one item being purchased given the purchase of another
item.
Market Basket Analysis
•How Does Market Basket Analysis Work?
5.Generate association rules based on the frequent itemsets and their
corresponding support and confidence values. Association rules indicate the
likelihood of purchasing one item given the purchase of another item.
6.Interpret the results of the market basket analysis, identifying frequent
purchases, assessing the strength of the association between items, and
uncovering other relevant insights into customer behavior and preferences.
7.Use the insights from the market basket analysis to inform business
decisions such as product recommendations, store layout optimization, and
targeted marketing campaigns.
Market Basket Analysis - Types
•Descriptive market basket analysis:
•This sort of analysis looks for patterns and connections in the data that exist
between the components of a market basket.
•This kind of study is mostly used to understand consumer behavior, including which products are purchased in combination and what the most typical item combinations are.
•Descriptive market basket analysis helps retailers place products in their stores more profitably by revealing which products are frequently bought together.
Market Basket Analysis - Types
•Predictive Market Basket Analysis:
•Market basket analysis that predicts future purchases based on past
purchasing patterns is known as predictive market basket analysis.
•Large volumes of data are analyzed using machine learning algorithms in this
sort of analysis in order to create predictions about which products are most
likely to be bought together in the future.
•Predictive market basket analysis helps retailers make data-driven decisions about which products to carry, how to price them, and how to optimize shop layouts.
Market Basket Analysis - Types
•Differential Market Basket Analysis:
•Differential market basket analysis analyses two sets of market basket data
to identify variations between them.
•Comparing the behavior of various client segments or the behavior of
customers over time is a common usage for this kind of study.
•Retailers can respond to shifting consumer behavior by modifying their
marketing and sales tactics with the help of differential market basket
analysis.
Market Basket Analysis - Applications
•Retail : Identify frequently purchased product combinations and create promotions or
cross-selling strategies
•E-commerce: Suggest complementary products to customers and improve the
customer experience
•Hospitality: Identify which menu items are often ordered together and create meal
packages or menu recommendations
•Healthcare: Understand which medications are often prescribed together and identify
patterns in patient behavior or treatment outcomes
•Banking/Finance: Identify which products or services are frequently used together by
customers and create targeted marketing campaigns or bundle deals
•Telecommunications: Understand which products or services are often purchased
together and create bundled service packages that increase revenue and improve the
customer experience
Market Basket Analysis - Advantages
•It helps retailers in the following ways:
•Increases customer engagement
•Boosts sales and increases ROI
•Improves customer experience
•Optimizes marketing strategies and campaigns
•Helps in demographic data analysis
•Identifies customer behavior and patterns
Principal Component Analysis (PCA)
•Principal Component Analysis (PCA) is a powerful technique used in data
analysis, particularly for reducing the dimensionality of datasets while
preserving crucial information.
•Dimensionality reduction is the process of simplifying a high-dimensional
dataset by reducing the number of features (or variables) while retaining the
most important information and original data patterns.
•Dimensionality can be reduced by transforming the original variables into a set
of new, uncorrelated variables called principal components.
•It helps to remove redundancy, improve computational efficiency and make data
easier to visualize and analyze especially when dealing with high-dimensional
data.
•It prioritizes the directions where the data varies the most, because more variation means more useful information.
Principal Component Analysis (PCA)
•Key Aspects:
•Data Exploration and Visualization: It plays a significant role in data
exploration and visualization, aiding in uncovering hidden patterns and
insights.
•Linear Transformation: PCA performs a linear transformation of data,
seeking directions of maximum variance.
•Feature Selection: Principal components are ranked by the variance they
explain, allowing for effective feature selection.
•Data Compression: PCA can compress data while preserving most of the
original information.
•Clustering and Classification: It finds applications in clustering and
classification tasks by reducing noise and highlighting underlying structure.
Principal Component Analysis (PCA)
•Key Aspects:
•Advantages: PCA offers linearity, computational efficiency, and scalability for
large datasets.
•Limitations: It assumes data normality and linearity and may lead to
information loss.
•Matrix Requirements: PCA works with symmetric correlation or covariance
matrices and requires numeric, standardized data.
•Eigenvalues and Eigenvectors: Eigenvalues represent variance magnitude,
and eigenvectors indicate variance direction.
•Number of Components: The number of principal components chosen
determines the number of eigenvectors computed.
Principal Component Analysis
•Basic Terminology:
•Variance – measures the variation of the data distributed across the dimensions of the graph.
•Covariance – measures the dependencies and relationships between features.
•Standardizing data – scaling our dataset within a specific range for an unbiased output.
Principal Component Analysis
•Basic Terminology:
•Covariance matrix – used for calculating the interdependencies between the features or variables; it also helps in reducing these dependencies to improve performance.
Principal Component Analysis
•Basic Terminology:
•EigenValues and EigenVectors: The purpose of eigenvectors is to find the directions of largest variance in the dataset, which are used to calculate the principal components. An eigenvalue is the magnitude of its eigenvector: it indicates the amount of variance in a particular direction, whereas the eigenvector gives that direction itself, which stays fixed while the transformation expands or contracts the data (e.g. an X-Y graph in 2D).
•In this shear mapping, the blue arrow changes direction,
whereas the pink arrow does not.
•In this instance, the pink arrow is an eigenvector because of
its constant orientation.
•The length of this arrow is also unaltered, and its eigenvalue is
1.
•Technically, a PC is a straight line that captures the data’s
maximum variance (information).
•PC shows direction and magnitude. PCs are perpendicular to
each other.
Principal Component Analysis
•Basic Terminology:
•Dimensionality Reduction: take the transpose of the derived feature vector and multiply it by the transpose of the original standardized data; this reduces the features without losing information.
Principal Component Analysis – Why?
•Why Do We Need PCA in Machine Learning?
•Overfitting issues arise while working with high-dimensional data, and dimensionality reduction is used to address them.
•Increases interpretability while minimizing information loss.
•Aids in locating important characteristics.
•When to use PCA?
•Whenever we need our features to be independent (uncorrelated) of each other
•Whenever we need fewer features derived from a larger set of features
Principal Component Analysis - Working
•Step 1: Standardize the Data:
•If the features of your dataset are on different scales, it's essential to
standardize them (subtract the mean and divide by the standard deviation).
•Step 2: Compute the Covariance Matrix:
•Calculate the covariance matrix for the standardized dataset.
•Step 3: Compute Eigenvectors and Eigenvalues:
•Find the eigenvectors and eigenvalues of the covariance matrix. The
eigenvectors represent the directions of maximum variance, and the
corresponding eigenvalues indicate the magnitude of variance along those
directions.
Principal Component Analysis - Working
•Step 4: Sort Eigenvectors by Eigenvalues:
•Sort the eigenvectors based on their corresponding eigenvalues in
descending order.
•Step 5: Choose Principal Components:
•Select the top k eigenvectors (principal components) where k is the desired
dimensionality of the reduced dataset.
•Step 6: Transform the Data:
•Multiply the original standardized data by the selected principal components
to obtain the new, lower-dimensional representation of the data.
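These six steps map almost line for line onto NumPy. The following is a from-scratch sketch for illustration; in practice a library implementation such as sklearn.decomposition.PCA would normally be used, and the toy data below is random:

```python
import numpy as np

def pca(X, k):
    # Step 1: standardize each feature (zero mean, unit variance).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    # Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort eigenvectors by eigenvalue, descending.
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    # Step 5: keep the top-k eigenvectors as the principal components.
    components = eigvecs[:, :k]
    # Step 6: project the standardized data onto the components.
    return X_std @ components

# Tiny illustrative run: 6 samples, 3 features -> 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
print(pca(X, k=2).shape)  # (6, 2)
```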
Principal Component Analysis
•Advantages:
•Used for dimensionality reduction
•PCA assists in eliminating correlated features, a problem sometimes referred to as multicollinearity.
•The time required to train your model is substantially shorter thanks to PCA's reduction in the number of features.
•PCA aids in overcoming overfitting by eliminating the extraneous features from your dataset.
Principal Component Analysis
•Disadvantages:
•Useful for quantitative data, but not effective with qualitative data.
•Principal components are difficult to interpret in terms of the original variables.
•Applications:
•Computer Vision
•Bio-informatics application
•For compressed images or resizing of the image
•Discovering patterns from high-dimensional data
•Reduction of dimensions
•Multidimensional Data – Visualization
Dimensionality Reduction
•Space required to store the data is reduced as the number of dimensions comes
down
•Fewer dimensions lead to less computation/training time
•Some algorithms do not perform well when we have a large number of dimensions, so these dimensions need to be reduced for the algorithm to be useful.
•It takes care of multicollinearity by removing redundant features.
•For example, you have two variables 'time spent on treadmill in minutes' and 'calories burnt'. These variables are highly correlated, as the more time you spend running on a treadmill, the more calories you will burn. Hence, there is no point in storing both, as just one of them does what you require.
Dimensionality Reduction
•Common Dimensionality Reduction Techniques:
•Dimensionality reduction can be done in two different ways:
•By only keeping the most relevant variables from the original dataset (this
technique is called feature selection)
•By finding a smaller set of new variables, each being a combination of the input variables and containing basically the same information as the input variables (this technique is called feature extraction)
Dimensionality Reduction
•Missing Value Ratio:
•While exploring a dataset, we may find that it has some missing values.
•We will try to find out the reason for these missing values, and then impute them or drop the variables that have missing values entirely (using appropriate methods).
•What if we have too many missing values (say more than 50%)? Should we
impute the missing values or drop the variable? I would prefer to drop the
variable since it will not have much information.
•We can even set a threshold value and if the percentage of missing values in
any variable is more than that threshold, we will drop the variable.
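In pandas this threshold rule takes only a few lines. A minimal sketch (the 50% cutoff and column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan, np.nan],   # 75% missing
                   "b": [1, 2, np.nan, 4]})            # 25% missing

threshold = 0.5                       # drop variables with >50% missing
missing_ratio = df.isnull().mean()    # fraction of missing values per column
df_reduced = df.loc[:, missing_ratio <= threshold]
print(df_reduced.columns.tolist())    # ['b']
```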
Dimensionality Reduction
•Low Variance Filter:
•Consider a variable in our dataset where all the observations have the same
value, say 1.
•If we use this variable, do you think it can improve the model we will build?
The answer is no, because this variable will have zero variance.
•So, we need to calculate the variance of each variable we are given.
•Then drop the variables having low variance as compared to other variables
in our dataset.
•The reason for doing this, as I mentioned above, is that variables with a low
variance will not affect the target variable.
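scikit-learn provides this filter as VarianceThreshold. A minimal sketch (the data and threshold are illustrative; features should be on comparable scales for the comparison to be meaningful):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[1.0, 2.0, 0.5],
              [1.0, 2.1, 1.5],
              [1.0, 1.9, 0.9]])   # the first column is constant (zero variance)

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # (3, 2): the constant column has been removed
```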
Dimensionality Reduction
•High Correlation filter:
•High correlation between two variables means they have similar trends and
are likely to carry similar information.
•This can bring down the performance of some models drastically (linear and
logistic regression models, for instance).
•We can calculate the correlation between the independent variables that are numerical in nature.
•If the correlation coefficient crosses a certain threshold value, we can drop
one of the variables (dropping a variable is highly subjective and should
always be done keeping the domain in mind).
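A common way to implement this filter is to scan the upper triangle of the absolute correlation matrix and drop one variable from each highly correlated pair. A minimal pandas sketch (the 0.9 threshold and the treadmill-style data are illustrative):

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.9):
    # Absolute pairwise correlations between numerical variables.
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop one column from every pair whose correlation crosses the threshold.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# 'minutes' and 'calories' are almost perfectly correlated.
df = pd.DataFrame({"minutes": [10, 20, 30, 40],
                   "calories": [101, 198, 305, 399],
                   "age": [25, 40, 31, 55]})
print(drop_correlated(df).columns.tolist())  # ['minutes', 'age']
```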
Dimensionality Reduction
•Backward Feature Elimination:
•We first take all the n variables present in our dataset and train the model
using them
•We then calculate the performance of the model
•Now, we compute the performance of the model after eliminating each
variable (n times), i.e., we drop one variable every time and train the model
on the remaining n-1 variables
•We identify the variable whose removal has produced the smallest (or no)
change in the performance of the model, and then drop that variable
•Repeat this process until no variable can be dropped
Dimensionality Reduction
•Forward Feature Selection:
•This is the opposite process of the Backward Feature Elimination we saw above. Instead of eliminating features, we focus on finding the best features that enhance the model's performance. This technique, a form of feature selection, operates as follows (a scikit-learn sketch follows this list):
•We start with a single feature. Essentially, we train the model n number of
times using each feature separately
•The variable giving the best performance is selected as the starting variable
•Then we repeat this process and add one variable at a time. The variable
that produces the highest increase in performance is retained
•We repeat this process until no significant improvement is seen in the
model's performance
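Both procedures are available in scikit-learn as SequentialFeatureSelector (a minimal sketch assuming scikit-learn >= 0.24; the estimator, dataset, and number of selected features are illustrative choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# direction="forward" adds the best variable at each step;
# direction="backward" starts with all variables and removes the least useful.
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=5,
                                direction="forward")
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features
```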
Dimensionality Reduction
•Eigenvalues and Eigenvectors:
•Eigenvalues and eigenvectors are concepts from linear algebra that are used
to analyse and understand linear transformations, particularly those
represented by square matrices.
•They are used in many different areas of mathematics, including machine
learning and artificial intelligence.
•In machine learning, eigenvalues and eigenvectors are used to represent
data, to perform operations on data, and to train machine learning models.
•In artificial intelligence, eigenvalues and eigenvectors are used to develop
algorithms for tasks such as image recognition, natural language processing,
and robotics.
Dimensionality Reduction
•Eigenvalue (λ):
•An eigenvalue of a square matrix A is a scalar (a single number) λ such that
there exists a non-zero vector v (the eigenvector) for which the following
equation holds:
Av = λv
•In other words, when you multiply the matrix A by the eigenvector v, you get
a new vector that is just a scaled version of v (scaled by the eigenvalue λ).
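A quick numerical check of Av = λv with NumPy (the matrix is illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # the eigenvalues of A (3 and 1, in some order)

# Verify A v = lambda * v for the first eigenpair.
v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(A @ v, lam * v))  # True
```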
Dimensionality Reduction
•Eigenvector:
•The vector v mentioned above is called an eigenvector corresponding to the
eigenvalue λ. Eigenvectors only change in scale (magnitude) when multiplied
by the matrix A; their direction remains the same.
•Mathematically, to find eigenvalues and eigenvectors, you typically solve the
following equation for λ and v:
(A − λI)v = 0
Where:
A is the square matrix for which you want to find eigenvalues and eigenvectors.
λ is the eigenvalue you’re trying to find.
I is the identity matrix (a diagonal matrix with 1s on the diagonal and 0s elsewhere).
v is the eigenvector you’re trying to find.
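As a small worked example (the matrix is illustrative, the same one as in the NumPy check above): take A = [[2, 1], [1, 2]]. Then det(A − λI) = (2 − λ)² − 1 = λ² − 4λ + 3 = 0, which gives λ = 1 and λ = 3. Substituting λ = 3 into (A − λI)v = 0 gives v = (1, 1) up to scale, and indeed Av = (3, 3) = 3v.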
Dimensionality Reduction
•Use of Eigenvector and Eigenvalue:
•Dimensionality Reduction (PCA):
•In Principal Component Analysis (PCA), you calculate the eigenvectors
and eigenvalues of the covariance matrix of your data.
•The eigenvectors (principal components) with the largest eigenvalues
capture the most variance in the data and can be used to reduce the
dimensionality of the dataset while preserving important information.
•Image Compression:
•Eigenvectors and eigenvalues are used in techniques like Singular Value
Decomposition (SVD) for image compression.
•By representing images in terms of their eigenvectors and eigenvalues,
you can reduce storage requirements while retaining essential image
features.
Dimensionality Reduction
•Use of Eigenvector and Eigenvalue:
•Support vector machines:
•Support vector machines (SVMs) are a type of machine learning
algorithm that can be used for classification and regression tasks.
•SVMs work by finding a hyperplane that separates the data into two
classes.
•The eigenvalues and eigenvectors of the kernel matrix of the SVM can be
used to improve the performance of the algorithm.
•Graph Theory:
•Eigenvectors play a role in analyzing networks and graphs.
•They can be used to find important nodes or communities in social
networks or other interconnected systems.
Dimensionality Reduction
•Use of Eigenvector and Eigenvalue:
•Natural Language Processing (NLP):
•In NLP, eigenvectors can help identify the most relevant terms in a large
document-term matrix, enabling techniques like Latent Semantic Analysis
(LSA) for document retrieval and text summarization.
•Machine Learning Algorithms:
•Eigenvalues and eigenvectors can be used to analyze the stability and
convergence properties of machine learning algorithms, especially in
deep learning when dealing with weight matrices in neural networks.
References
•Ethem Alpaydin, "Introduction to Machine Learning", 4th Edition, The MIT Press, 2020, ISBN: 9780262043793.
•Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning", The MIT Press, ISBN: 9780262035613.
•Tom M. Mitchell, "Machine Learning", McGraw Hill, 1997, ISBN: 0071154671, 9780071154673.
•Peter Flach, "Machine Learning: The Art and Science of Algorithms that Make Sense of Data", Cambridge University Press India, ISBN-13: 9781107422223.
•Christopher Bishop, "Pattern Recognition and Machine Learning", Springer, 2006, ISBN-13: 978-1493938438.
•Shai Shalev-Shwartz and Shai Ben-David, "Understanding Machine Learning", Cambridge University Press, 2017, ISBN: 978-1-107-05713-5.