Review of the t-SNE algorithm which helps visualizing the high dimensional data on manifold by projecting them onto 2D or 3D space with metric preserving.
Size: 2.43 MB
Language: en
Added: Oct 30, 2014
Slides: 33 pages
Slide Content
Visualizing Data using t-SNE
Laurens van der Maaten and Georey Hinton, JMLR 2008
Kevin Zhao [email protected]
October 30, 2014
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 1 / 33
Overview
1
Overview
2
t-Distributed Stochastic Neighbor Embedding
3
Experiment Setup and Results
4
Code and Web Resources
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 2 / 33
Introduction
Overview
We are given a collection ofNhigh-dimensional objectsx1; :::xN
How can we get a feel for how these objects are arranged in the data
space?
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 3 / 33
Introduction
Principal Components Analysis
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 4 / 33
Introduction
Principal Components Analysis
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 5 / 33
Introduction
Swiss Roll
PCA is mainly concerned dimensionality, with preserving when large
pairwise distances in the map
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 6 / 33
t-Distributed Stochastic Neighbor Embedding
Introduction
Distance Perservation
Neighbor Perservation
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 7 / 33
t-Distributed Stochastic Neighbor Embedding
Introduction
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 8 / 33
t-Distributed Stochastic Neighbor Embedding
Introduction
Preserve the neighborhood
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 9 / 33
t-Distributed Stochastic Neighbor Embedding
Introduction
Measure pairwise similarities between high-dimensional and
low-dimensonal objects
p
jji=
exp(jjxixjjj
2
=2
2
i
)
P
k6=i
exp(jjxixkjj
2
=2
2
i
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 10 / 33
t-Distributed Stochastic Neighbor Embedding
Stochastic Neighbor Embedding
Converting the high-dimensional Euclidean distances into conditional
probabilities that represent similarities
Similarity of datapoints in High Dimension
p
jji=
exp(jjxixjjj
2
=2
2
i
)
P
k6=i
exp(jjxixkjj
2
=2
2
i
)
Similarity of datapoints in Low Dimension
q
jji=
exp(jjyiyjjj
2
)
P
k6=i
exp(jjyiykjj
2
)
Cost function
C=
X
i
KL(PijjQi) =
X
i
X
j
p
jjilog
p
jji
q
jji
Minimize the cost function using gradient descent
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 11 / 33
t-Distributed Stochastic Neighbor Embedding
Stochastic Neighbor Embedding
Gradient has a surprisingly simple form
@C
@yi
=
X
j6=i
(p
jjiq
jji+p
ijjq
ijj)(yiyj)
The gradient update with momentum term is given by
Y
(t)
=Y
(t1)
+
@C
@yi
+(t)(Y
(t1)
Y
(t2)
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 12 / 33
t-Distributed Stochastic Neighbor Embedding
Symmetric SNE
Minimize the sum of the KL divergences between the conditional
probabilities
C=
X
i
KL(PijjQi) =
X
i
X
j
p
jjilog
p
jji
q
jji
Minimize a single KL divergence between a joint probability
distribution
C=KL(PjjQ) =
X
i
X
j6=i
pijlog
pij
qij
The obvious way to redene the pairwise similarities is
pij=
exp(jjxixjjj
2
=2
2
)
P
k6=l
exp(jjxlxkjj
2
=2
2
)
qij=
exp(jjyiyjjj
2
)
P
k6=l
exp(jjylykjj
2
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 13 / 33
t-Distributed Stochastic Neighbor Embedding
Symmetric SNE
Such thatpij=pji;qij=qji, the main advantage is simpling the gradient
@C
@yi
= 2
X
j
(pijqij)(yiyj)
However, in practice we symmetrize (or average) the conditionals
pij=
p
jji+p
ijj
2N
Set the bandwidthisuch that the conditional has a xed perplexity
(eective number of neighbors)Perp(Pi) = 2
H(Pi)
, typical value is about 5
to 50
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 14 / 33
t-Distributed Stochastic Neighbor Embedding
t-Distribution
Use heavier tail distribution than Gaussian in low-dim space, we choose
qij/(1 +jjyiyjjj
2
)
1
Then the gradient could be
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 15 / 33
t-Distributed Stochastic Neighbor Embedding
t-Distributed Stochastic Neighbor Embedding
Similarity of datapoints in High Dimension
pij=
exp(jjxixjjj
2
=2
2
)
P
k6=l
exp(jjxlxkjj
2
=2
2
)
Similarity of datapoints in Low Dimension
qij=
(1 +jjyiyjjj
2
)
1
P
k6=l
(1 +jjykyljj
2
)
1
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 16 / 33
t-Distributed Stochastic Neighbor Embedding
t-Distributed Stochastic Neighbor Embedding
Cost function
C=KL(PjjQ) =
X
i
X
j
pijlog
pij
qij
Largepijmodeled by smallqij: Large penalty
Smallpijmodeled by largeqij: Small penalty
t-SNE mainly preserves local similarity structure of the data
Gradient
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 17 / 33
t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
Pairwise Euclidean distance between two points in the high-dim and in
low-dim data representation
Figure :
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 18 / 33
t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 19 / 33
t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
Displacement
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 20 / 33
t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
Exertion / Compression
(pijqij)(1 +jjyiyjjj
2
)
1
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 21 / 33
t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
N-Body, summation
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Reduce Complexity fromO(N
2
) toO(NlogN) via Barnes Hut
(tree-based) algorithm
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 22 / 33
Experiment Setup and Results
Experiment & Results
MNIST
Randomly selected 6,000 images
2828 = 784 pixels
Olivetti faces
400 images (10 per individual)
92112 = 10;304 pixels
COIL-20
20 dierent objects and 72 equally spaced orientations, yielding a
total of 1,440 images
3232 = 1024 pixels
Start by using PCA to reduce the dimensionality of the data to 30
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 23 / 33
Experiment Setup and Results
Experiment & Results
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 24 / 33
Experiment Setup and Results
MNIST t-SNE
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 25 / 33
Experiment Setup and Results
MNIST Sammon
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 26 / 33
Experiment Setup and Results
MNIST Isomap
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 27 / 33
Experiment Setup and Results
MNIST LLE
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 28 / 33
Experiment Setup and Results
Olivetti faces
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 29 / 33
Experiment Setup and Results
COIL-20
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 30 / 33
Code and Web Resources
Web Resources
Google: t-sne
Link: http://homepage.tudelft.nl/19j49/t-SNE.html
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 31 / 33
Code and Web Resources
Source Codes
t-SNE (Matlab, CUDA, Binary, Python, Torch, Julia, R and
JavaScript)
Parametric t-SNE (Matlab)
Barnes-Hut-SNE (with C++, Matlab, Python, Torch, and R
wrappers)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 32 / 33
Code and Web Resources
Thanks for your patience
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 33 / 33