High Dimensional Data Visualization using t-SNE

ssuserb667a8 7,284 views 33 slides Oct 30, 2014
Slide 1
Slide 1 of 33
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33

About This Presentation

Review of the t-SNE algorithm which helps visualizing the high dimensional data on manifold by projecting them onto 2D or 3D space with metric preserving.


Slide Content

Visualizing Data using t-SNE
Laurens van der Maaten and Georey Hinton, JMLR 2008
Kevin Zhao
[email protected]
October 30, 2014
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 1 / 33

Overview
1
Overview
2
t-Distributed Stochastic Neighbor Embedding
3
Experiment Setup and Results
4
Code and Web Resources
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 2 / 33

Introduction
Overview
We are given a collection ofNhigh-dimensional objectsx1; :::xN
How can we get a feel for how these objects are arranged in the data
space?
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 3 / 33

Introduction
Principal Components Analysis
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 4 / 33

Introduction
Principal Components Analysis
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 5 / 33

Introduction
Swiss Roll
PCA is mainly concerned dimensionality, with preserving when large
pairwise distances in the map
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 6 / 33

t-Distributed Stochastic Neighbor Embedding
Introduction
Distance Perservation
Neighbor Perservation
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 7 / 33

t-Distributed Stochastic Neighbor Embedding
Introduction
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 8 / 33

t-Distributed Stochastic Neighbor Embedding
Introduction
Preserve the neighborhood
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 9 / 33

t-Distributed Stochastic Neighbor Embedding
Introduction
Measure pairwise similarities between high-dimensional and
low-dimensonal objects
p
jji=
exp(jjxixjjj
2
=2
2
i
)
P
k6=i
exp(jjxixkjj
2
=2
2
i
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 10 / 33

t-Distributed Stochastic Neighbor Embedding
Stochastic Neighbor Embedding
Converting the high-dimensional Euclidean distances into conditional
probabilities that represent similarities
Similarity of datapoints in High Dimension
p
jji=
exp(jjxixjjj
2
=2
2
i
)
P
k6=i
exp(jjxixkjj
2
=2
2
i
)
Similarity of datapoints in Low Dimension
q
jji=
exp(jjyiyjjj
2
)
P
k6=i
exp(jjyiykjj
2
)
Cost function
C=
X
i
KL(PijjQi) =
X
i
X
j
p
jjilog
p
jji
q
jji
Minimize the cost function using gradient descent
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 11 / 33

t-Distributed Stochastic Neighbor Embedding
Stochastic Neighbor Embedding
Gradient has a surprisingly simple form
@C
@yi
=
X
j6=i
(p
jjiq
jji+p
ijjq
ijj)(yiyj)
The gradient update with momentum term is given by
Y
(t)
=Y
(t1)
+
@C
@yi
+(t)(Y
(t1)
Y
(t2)
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 12 / 33

t-Distributed Stochastic Neighbor Embedding
Symmetric SNE
Minimize the sum of the KL divergences between the conditional
probabilities
C=
X
i
KL(PijjQi) =
X
i
X
j
p
jjilog
p
jji
q
jji
Minimize a single KL divergence between a joint probability
distribution
C=KL(PjjQ) =
X
i
X
j6=i
pijlog
pij
qij
The obvious way to redene the pairwise similarities is
pij=
exp(jjxixjjj
2
=2
2
)
P
k6=l
exp(jjxlxkjj
2
=2
2
)
qij=
exp(jjyiyjjj
2
)
P
k6=l
exp(jjylykjj
2
)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 13 / 33

t-Distributed Stochastic Neighbor Embedding
Symmetric SNE
Such thatpij=pji;qij=qji, the main advantage is simpling the gradient
@C
@yi
= 2
X
j
(pijqij)(yiyj)
However, in practice we symmetrize (or average) the conditionals
pij=
p
jji+p
ijj
2N
Set the bandwidthisuch that the conditional has a xed perplexity
(eective number of neighbors)Perp(Pi) = 2
H(Pi)
, typical value is about 5
to 50
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 14 / 33

t-Distributed Stochastic Neighbor Embedding
t-Distribution
Use heavier tail distribution than Gaussian in low-dim space, we choose
qij/(1 +jjyiyjjj
2
)
1
Then the gradient could be
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 15 / 33

t-Distributed Stochastic Neighbor Embedding
t-Distributed Stochastic Neighbor Embedding
Similarity of datapoints in High Dimension
pij=
exp(jjxixjjj
2
=2
2
)
P
k6=l
exp(jjxlxkjj
2
=2
2
)
Similarity of datapoints in Low Dimension
qij=
(1 +jjyiyjjj
2
)
1
P
k6=l
(1 +jjykyljj
2
)
1
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 16 / 33

t-Distributed Stochastic Neighbor Embedding
t-Distributed Stochastic Neighbor Embedding
Cost function
C=KL(PjjQ) =
X
i
X
j
pijlog
pij
qij
Largepijmodeled by smallqij: Large penalty
Smallpijmodeled by largeqij: Small penalty
t-SNE mainly preserves local similarity structure of the data
Gradient
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 17 / 33

t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
Pairwise Euclidean distance between two points in the high-dim and in
low-dim data representation
Figure :
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 18 / 33

t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 19 / 33

t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
Displacement
(yiyj)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 20 / 33

t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
Exertion / Compression
(pijqij)(1 +jjyiyjjj
2
)
1
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 21 / 33

t-Distributed Stochastic Neighbor Embedding
Gradient Interpretation
We can interpret the t-SNE gradient as a simulation of an N-body system
N-Body, summation
@C
@yi
= 4
X
j6=i
(pijqij)(1 +jjyiyjjj
2
)
1
(yiyj)
Reduce Complexity fromO(N
2
) toO(NlogN) via Barnes Hut
(tree-based) algorithm
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 22 / 33

Experiment Setup and Results
Experiment & Results
MNIST
Randomly selected 6,000 images
2828 = 784 pixels
Olivetti faces
400 images (10 per individual)
92112 = 10;304 pixels
COIL-20
20 dierent objects and 72 equally spaced orientations, yielding a
total of 1,440 images
3232 = 1024 pixels
Start by using PCA to reduce the dimensionality of the data to 30
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 23 / 33

Experiment Setup and Results
Experiment & Results
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 24 / 33

Experiment Setup and Results
MNIST t-SNE
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 25 / 33

Experiment Setup and Results
MNIST Sammon
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 26 / 33

Experiment Setup and Results
MNIST Isomap
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 27 / 33

Experiment Setup and Results
MNIST LLE
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 28 / 33

Experiment Setup and Results
Olivetti faces
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 29 / 33

Experiment Setup and Results
COIL-20
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 30 / 33

Code and Web Resources
Web Resources
Google: t-sne
Link: http://homepage.tudelft.nl/19j49/t-SNE.html
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 31 / 33

Code and Web Resources
Source Codes
t-SNE (Matlab, CUDA, Binary, Python, Torch, Julia, R and
JavaScript)
Parametric t-SNE (Matlab)
Barnes-Hut-SNE (with C++, Matlab, Python, Torch, and R
wrappers)
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 32 / 33

Code and Web Resources
Thanks for your patience
Laurens van der Maaten and Georey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 33 / 33