PCA,LCA,T-SNE(Sophia Sarah 2023UCA1863).pptx


About This Presentation

principal component analysis


Slide Content

Principal Component Analysis (PCA) By Sophia Sarah 2023UCA1863

Introduction to PCA Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction. It transforms correlated variables into a set of uncorrelated variables called principal components. PCA helps in data visualization and reducing computational complexity. Developed by Karl Pearson in 1901, PCA is widely used in data science and machine learning.

Why PCA? Reduces the number of dimensions while preserving important information. Helps in removing noise and redundancy. Useful for feature extraction in machine learning and data analysis. Commonly used in image processing, genomics, finance, and text analysis. Reduces overfitting in machine learning models.

Mathematical Foundation PCA is based on linear algebra concepts like eigenvectors and eigenvalues. The steps involve computing the covariance matrix and performing eigenvalue decomposition. Principal components are ordered by the variance they explain: the first principal component captures the highest variance, followed by the second, and so on. Formula: X' = X * W, where W is the matrix whose columns are the selected eigenvectors.
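Written out with consistent notation, a hedged reading of this slide (assuming X is the standardized n × p data matrix and W_k holds the top k eigenvectors as columns):

```latex
\begin{align*}
  \Sigma &= \tfrac{1}{n-1}\, X^{\top} X && \text{covariance of the standardized data } X \\
  \Sigma\, w_i &= \lambda_i\, w_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p && \text{eigenpairs, ordered by variance explained} \\
  X' &= X\, W_k && \text{projection onto the top } k \text{ eigenvectors (columns of } W_k)
\end{align*}
```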

Step-by-Step PCA Process Standardize the dataset (mean = 0, variance = 1) to ensure equal importance of features. Compute the covariance matrix to measure feature relationships. Compute eigenvalues and eigenvectors to determine principal components. Select top k principal components based on variance explained. Transform original data into a new subspace using principal components. Reconstruct the data if necessary by reversing the PCA transformation.
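These steps map almost line-for-line onto NumPy. The sketch below is illustrative only: it assumes a small synthetic dataset (100 samples, 5 features) rather than any dataset from the slides, and it implements the X' = X * W projection from the previous slide.

```python
import numpy as np

# Hypothetical small dataset: rows are samples, columns are features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# 1. Standardize: mean 0, variance 1 per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition (eigh is suited to the symmetric covariance matrix).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Order components by decreasing eigenvalue (variance explained).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Keep the top k eigenvectors and project: X' = X * W.
k = 2
W = eigvecs[:, :k]
X_pca = X_std @ W

# 6. Approximate reconstruction by reversing the projection.
X_reconstructed = X_pca @ W.T
print(X_pca.shape, X_reconstructed.shape)
```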

Example Dataset & Implementation
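The original slide's dataset and code are not captured in this transcript, so the following is an assumed stand-in using scikit-learn's built-in Iris dataset; the dataset choice and parameter values are illustrative, not taken from the presentation.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Assumed example dataset: Iris (150 samples, 4 numeric features).
X, y = load_iris(return_X_y=True)

# Standardize first, since PCA is sensitive to feature scales.
X_std = StandardScaler().fit_transform(X)

# Reduce to 2 principal components, e.g. for a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Shape after PCA:", X_2d.shape)
```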

PCA vs Other Dimensionality Reduction Techniques

Method | Linear/Non-linear | Use Case | Pros | Cons
PCA | Linear | Data compression | Fast, simple | Loss of interpretability
LDA | Linear | Classification | Maximizes class separability | Requires labeled data
ICA | Non-linear | Signal separation | Finds independent signals | Computationally expensive
t-SNE | Non-linear | Data visualization | Preserves local structure | Hard to interpret quantitatively

Choosing the Number of Components The explained variance ratio helps determine the optimal number of components. A scree plot (or cumulative variance curve) visualizes how much variance the components explain. Rule of thumb: select enough components to explain at least 95% of the variance. Use pca.explained_variance_ratio_ in Python to determine each component's variance contribution.
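A minimal sketch of this selection rule, assuming the 64-dimensional digits dataset purely as a stand-in: it accumulates the explained variance ratios and finds the smallest number of components reaching 95%.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Assumed dataset: the 64-dimensional digits data, for illustration only.
X, _ = load_digits(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components, then inspect the cumulative explained variance.
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components explaining at least 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"{k} components explain {cumulative[k - 1]:.1%} of the variance")

# Alternatively, scikit-learn picks k automatically when given a fraction:
pca_95 = PCA(n_components=0.95).fit(X_std)
print("n_components_ chosen by sklearn:", pca_95.n_components_)
```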

Advantages & Disadvantages of PCA
Advantages: Reduces dimensionality and computation cost. Enhances visualization of high-dimensional data. Removes multicollinearity, improving model performance. Works well with continuous numerical data.
Disadvantages: Loss of interpretability, making it hard to understand transformed features. Sensitive to the scaling of the data; requires standardization. Assumes linear relationships, which may not hold in real-world cases. Not suitable for categorical data.

Real-Life Applications Image Processing: face recognition, object detection, handwriting analysis. Finance: stock market prediction, risk management. Genomics: gene expression analysis, bioinformatics. Medical Diagnosis: disease classification, MRI scan analysis. Speech Recognition: audio signal processing.

What is LDA (Linear Discriminant Analysis)? A supervised learning technique for dimensionality reduction. Finds a linear combination of features that separates different classes. Used for classification (e.g., face recognition, speech recognition). How LDA Works: Compute the mean vectors for each class. Compute the scatter matrices: the within-class scatter matrix S_W and the between-class scatter matrix S_B. Compute the eigenvalues and eigenvectors of S_W^{-1} S_B. Select the top k eigenvectors to project the data.
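In practice the scatter-matrix computation is usually delegated to a library. A minimal scikit-learn sketch, assuming the Wine dataset (3 classes, 13 features) as an illustrative stand-in:

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumed example dataset: Wine (3 classes, 13 features).
X, y = load_wine(return_X_y=True)

# LDA is supervised: it uses the class labels y to find directions that
# maximize between-class scatter relative to within-class scatter.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print("Shape after LDA:", X_lda.shape)  # at most (n_classes - 1) components
print("Explained variance ratio:", lda.explained_variance_ratio_)
```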

What is ICA (Independent Component Analysis)? A statistical technique that separates a mixed signal into independent source signals. Unlike PCA, it assumes non-Gaussian, statistically independent components. Commonly used in blind source separation (e.g., separating mixed audio signals). How ICA Works: Maximizes the statistical independence of the components. Uses algorithms such as FastICA or Infomax. Example: separating voice signals from multiple speakers (the cocktail party problem).
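A toy version of the cocktail-party example: the two synthetic source signals and the mixing matrix below are illustrative assumptions, not taken from the slides, and scikit-learn's FastICA is used to unmix them.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic source signals sampled over time.
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)               # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))      # source 2: square wave
S = np.c_[s1, s2]

# Hand-picked mixing matrix (the "room"), producing observed mixtures.
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X = S @ A.T

# FastICA estimates statistically independent components from the mixtures.
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)

print("Estimated sources shape:", S_estimated.shape)
```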

What is t-SNE (t-Distributed Stochastic Neighbor Embedding)? A nonlinear dimensionality reduction technique. Preserves local structure in high-dimensional data. Mainly used for visualization of complex datasets (e.g., clustering in ML). How t-SNE Works: Converts high-dimensional Euclidean distances into probability distributions. Minimizes the Kullback-Leibler divergence between the high- and low-dimensional distributions. Works well for image, text, and genomic data visualization.
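A minimal visualization-oriented sketch, assuming the digits dataset and a perplexity of 30; both are illustrative choices, not values from the presentation.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Assumed example: embedding the 64-dimensional digits data into 2-D.
X, y = load_digits(return_X_y=True)

# perplexity balances local vs. global structure; 30 is a common default.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print("Embedded shape:", X_2d.shape)  # (n_samples, 2), ready for a scatter plot
```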

Comparison of LCA, ICA, and t-SNE

Technique | Type | Strengths | Common Applications
LCA | Linear | Sparse representation | Image processing, deep learning
ICA | Nonlinear | Independent components | Signal processing, EEG analysis
t-SNE | Nonlinear | Data visualization | Clustering, deep learning

References
I. T. Jolliffe, "Principal Component Analysis: A Review and Recent Developments."
Python documentation for sklearn PCA.
Laurens van der Maaten & Geoffrey Hinton, "Visualizing Data using t-SNE" (for comparison purposes).
Pearson (1901) for PCA, Fisher (1936) for LDA, Comon (1994) for ICA, and van der Maaten & Hinton (2008) for t-SNE.
McLachlan (2004) for LDA vs. PCA and Hyvärinen & Oja (2000) for ICA vs. PCA.
Shlens (2014) for PCA, van der Maaten (2014) for t-SNE, and Hyvärinen (1999) for ICA.