An introduction to t-SNE against the background of dimension reduction
Size: 4.96 MB
Language: en
Added: Jan 29, 2014
Slides: 20 pages
Slide Content
Visualization using t-SNE. Yan Xu, Jun 7, 2013
Dimension Reduction Overview. A taxonomy: dimension reduction methods are linear (PCA) or nonlinear, and parametric (LDA) or non-parametric; the non-parametric methods split into global (ISOMAP, MDS) and local (LLE, SNE) approaches. The SNE family: MDS → SNE (2002, local structure + probabilities) → symmetric SNE and UNI-SNE (2007, attacking the crowding problem) → t-SNE (t-distributed Stochastic Neighbor Embedding, 2008, a more stable and faster solution with an easier implementation) → Barnes-Hut-SNE (2013, O(N²) → O(N log N)).
MDS (Multi-Dimensional Scaling) arranges the low-dimensional points so as to minimize the discrepancy between the pairwise distances in the original space and the pairwise distances in the low-D space.
Sammon mapping, a variant of MDS, weights each pairwise error by the inverse of the high-D distance: $C = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{\left(d_{ij} - \|y_i - y_j\|\right)^2}{d_{ij}}$, where $d_{ij}$ is the high-D distance and $\|y_i - y_j\|$ the low-D distance. It puts too much emphasis on getting very small distances exactly right. It is slow to optimize and gets stuck in a different local optimum each time. Global to local?
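A minimal numpy sketch of both cost functions, for illustration only; the distance matrices D_high and D_low are assumed to be precomputed symmetric matrices with zero diagonals:

    import numpy as np

    def mds_stress(D_high, D_low):
        # Plain MDS: sum of squared discrepancies between high-D and low-D distances.
        iu = np.triu_indices_from(D_high, k=1)      # use each pair once
        return np.sum((D_high[iu] - D_low[iu]) ** 2)

    def sammon_stress(D_high, D_low):
        # Sammon mapping: each squared error is divided by the high-D distance,
        # so getting small distances right dominates the cost.
        iu = np.triu_indices_from(D_high, k=1)
        d, e = D_high[iu], D_low[iu]
        return np.sum((d - e) ** 2 / d) / np.sum(d)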
LLE (Locally Linear Embedding): the idea is to make the local configurations of points in the low-dimensional space resemble the local configurations in the high-dimensional space, i.e. to find maps that preserve local geometry. First reconstruct each point from its neighbors with weights $w_{ij}$; then, holding those weights fixed, find the $y$ that minimize the cost $\Phi(Y) = \sum_i \|y_i - \sum_j w_{ij} y_j\|^2$, subject to the constraint that the $y$ have unit variance in each dimension. A usage sketch follows below.
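A minimal usage sketch with scikit-learn's implementation; the data and parameter values are illustrative placeholders:

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding

    X = np.random.rand(500, 20)                      # placeholder high-D data
    lle = LocallyLinearEmbedding(n_neighbors=10,     # size of each local neighborhood
                                 n_components=2)     # target dimensionality
    Y = lle.fit_transform(X)                         # low-D embedding, shape (500, 2)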
A probabilistic version of local MDS: Stochastic Neighbor Embedding (SNE). It is more important to get local distances right than non-local ones, and SNE has a probabilistic way of deciding whether a pairwise distance is "local": convert each high-dimensional similarity into the probability that one data point will pick the other data point as its neighbor. The probability of picking j given i in high D is $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$; the probability of picking j given i in low D is $q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$.
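A small numpy sketch of these conditional probabilities; it assumes the per-point radii sigmas are already given (how to pick them is the subject of the next slide):

    import numpy as np

    def conditional_probs(X, sigmas):
        # X: (n, d) data; sigmas: (n,) per-point Gaussian radii.
        D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
        P = np.exp(-D / (2.0 * sigmas[:, None] ** 2))
        np.fill_diagonal(P, 0.0)                                    # a point never picks itself
        return P / P.sum(axis=1, keepdims=True)                     # row i holds p_{j|i}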
Picking the radius of the Gaussian that is used to compute the p's: we need to use different radii in different parts of the space so that we keep the effective number of neighbors about constant. A big radius leads to a high entropy for the distribution over neighbors of i; a small radius leads to a low entropy. So decide what entropy you want and then find the radius that produces that entropy. It's easier to specify the perplexity: $\mathrm{Perp}(P_i) = 2^{H(P_i)}$, where $H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$.
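A sketch of the usual per-point binary search for the radius that matches a target perplexity; the bounds, tolerance-free fixed iteration count, and helper name are illustrative choices:

    import numpy as np

    def sigma_for_perplexity(dists_i, target_perp, n_iter=50):
        # dists_i: squared distances from point i to all other points (i excluded).
        lo, hi = 1e-10, 1e10
        for _ in range(n_iter):
            sigma = 0.5 * (lo + hi)
            p = np.exp(-dists_i / (2.0 * sigma ** 2))
            p /= p.sum()
            entropy = -np.sum(p * np.log2(p + 1e-12))
            if 2.0 ** entropy > target_perp:
                hi = sigma        # too many effective neighbors: shrink the radius
            else:
                lo = sigma        # too few effective neighbors: grow the radius
        return sigma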
The cost function for a low-dimensional representation is a sum of KL divergences: $C = \sum_i \mathrm{KL}(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$. Gradient descent: $\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j)$. Gradient update with a momentum term: $\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t)\left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right)$, where $\eta$ is the learning rate and $\alpha(t)$ the momentum.
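A numpy sketch of this SNE gradient, assuming the conditional probability matrices P and Q (row i holding p_{j|i} and q_{j|i}) are already computed:

    import numpy as np

    def sne_gradient(Y, P, Q):
        # Y: (n, 2) map points; P, Q: (n, n) conditional probabilities.
        M = P - Q + P.T - Q.T                    # (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})
        diff = Y[:, None, :] - Y[None, :, :]     # pairwise differences y_i - y_j
        return 2.0 * np.sum(M[:, :, None] * diff, axis=1)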
Simpler (symmetric) version of SNE: turn the conditional probabilities into pairwise probabilities, $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$, and compare them with a single pairwise distribution $q_{ij}$ in the low-dimensional space.
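In code the symmetrization is a one-liner; P_cond is assumed to hold the conditional probabilities p_{j|i} in its rows:

    import numpy as np

    def symmetrize(P_cond):
        n = P_cond.shape[0]
        return (P_cond + P_cond.T) / (2.0 * n)   # p_ij = (p_{j|i} + p_{i|j}) / 2n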
MNIST database of handwritten digits (28×28 images). Problem?
Why SNE does not have gaps between classes. Crowding problem: the area available to accommodate moderately distant datapoints is not nearly large enough compared with the area available to accommodate nearby datapoints. A uniform background model (UNI-SNE), in which $q_{ij}$ can never fall below $\frac{2\rho}{n(n-1)}$ for background mixing proportion $\rho$, eliminates this effect and allows gaps between classes to appear.
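One way to write that background model, as a sketch of the idea rather than the exact parameterization of Cook et al. (2007): the low-dimensional similarity is mixed with a uniform distribution over all pairs,

    q_{ij} = (1-\rho)\,\frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)} + \rho\,\frac{2}{n(n-1)},

so even for very distant map points $q_{ij}$ stays above $2\rho/(n(n-1))$, which gives such pairs a slight repulsion whenever $p_{ij}$ is essentially zero.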
From UNI-SNE to t-SNE. High dimension: convert distances into probabilities using a Gaussian distribution. Low dimension: convert distances into probabilities using a distribution with much heavier tails than a Gaussian, the Student's t-distribution $f(t) = \frac{\Gamma\left(\frac{V+1}{2}\right)}{\sqrt{V\pi}\,\Gamma\left(\frac{V}{2}\right)}\left(1 + \frac{t^2}{V}\right)^{-\frac{V+1}{2}}$, where V is the number of degrees of freedom; t-SNE uses V = 1. (Figure: standard normal density vs. t-distribution with V = 1.)
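With one degree of freedom the heavy-tailed low-dimensional similarity reduces to $(1 + \|y_i - y_j\|^2)^{-1}$; a numpy sketch of the resulting q_ij:

    import numpy as np

    def tsne_low_dim_affinities(Y):
        # Heavy-tailed (Student t, V = 1) similarities between map points.
        D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        num = 1.0 / (1.0 + D)                    # (1 + ||y_i - y_j||^2)^(-1)
        np.fill_diagonal(num, 0.0)
        return num / num.sum()                   # q_ij, normalized over all pairs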
Compare t-SNE with SNE and UNI-SNE. (Figure.)
Optimization method for t-SNE: minimize the KL divergence $C = \mathrm{KL}(P \| Q)$ between the joint distributions by gradient descent; the gradient has the simple form $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1}$.
Optimization method for t-SNE. Tricks: (1) keep the momentum term small until the map points have become moderately well organized; (2) use the adaptive learning rate described by Jacobs (1988), which gradually increases the learning rate in directions where the gradient is stable; (3) early compression: force map points to stay close together at the start of the optimization; (4) early exaggeration: multiply all the p_ij's by 4 in the initial stages of the optimization. A condensed sketch of this schedule follows below.
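A condensed sketch of that optimization schedule; it omits early compression and the adaptive learning rate, and the hyperparameter values (learning rate, exaggeration length, momentum switch) are common defaults rather than anything prescribed by the slides:

    import numpy as np

    def tsne_optimize(P, n_iter=1000, eta=100.0):
        # P: (n, n) symmetrized high-dimensional affinities p_ij.
        n = P.shape[0]
        Y = 1e-4 * np.random.randn(n, 2)             # small random initial map
        update = np.zeros_like(Y)
        for t in range(n_iter):
            P_t = P * 4.0 if t < 100 else P          # early exaggeration, then plain p_ij
            momentum = 0.5 if t < 250 else 0.8       # keep momentum small at first
            diff = Y[:, None, :] - Y[None, :, :]     # pairwise y_i - y_j
            num = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
            np.fill_diagonal(num, 0.0)
            Q = num / num.sum()                      # heavy-tailed low-D affinities q_ij
            grad = 4.0 * np.sum(((P_t - Q) * num)[:, :, None] * diff, axis=1)
            update = momentum * update - eta * grad  # gradient step with momentum
            Y += update
        return Y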
6000 MNIST digits embedded by Isomap, Locally Linear Embedding, Sammon mapping, and t-SNE. (Figure: four embeddings.)
t-SNE vs. diffusion maps. Diffusion distance: the distance between two points measured through the transition probabilities of a t-step random walk over the data graph, so points connected by many short, high-probability paths are considered close. Diffusion maps: embed the data using the leading non-trivial eigenvectors of the random-walk transition matrix, scaled by their eigenvalues, so that Euclidean distance in the embedding approximates the diffusion distance.
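A compact numpy sketch of the diffusion-maps construction; the Gaussian kernel width eps and the number of diffusion steps t are illustrative choices:

    import numpy as np

    def diffusion_map(X, n_components=2, eps=1.0, t=1):
        # Gaussian kernel on pairwise squared distances.
        D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-D / eps)
        P = K / K.sum(axis=1, keepdims=True)         # row-stochastic transition matrix
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)               # sort eigenpairs by eigenvalue
        vals, vecs = vals.real[order], vecs.real[:, order]
        # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
        return vecs[:, 1:n_components + 1] * (vals[1:n_components + 1] ** t)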
Weaknesses: it is unclear how t-SNE performs on general dimensionality reduction tasks; the relatively local nature of t-SNE makes it sensitive to the curse of the intrinsic dimensionality of the data; and it is not guaranteed to converge to a global optimum of its cost function.