PR 103: t-SNE

Visualizing Data using t-SNE

Credits
▷Hyeongmin Lee, MVPLAB, Yonsei Univ
▷https://www.slideshare.net/ssuser06e0c5/visualizing-data-using-tsne-73621033

t-SNE:
Student t-Distributed Stochastic Neighbor Embedding
▷Nonlinear Dimension Reduction for Visualization (2-D or 3-D)
▷Advanced Version of SNE (G. Hinton, NIPS 2003)
▷Gradient-based Machine Learning Algorithm

Dimension Reduction

Real-World Data = Very High Dimension
= 3,145,728 Dimensions per Sample (ProGAN, 1024 × 1024 × 3)

Manifold Hypothesis – Dimension Reduction
Ref) PR-010, PR-101
Slide from H.Lee (MVPLAB)

History of Dimension Reduction
Slide from H.Lee (MVPLAB)
Linear
▷Principal Component Analysis (1901)
Non-Linear
▷Multidimensional Scaling (1964)
▷Sammon Mapping (1969)
▷IsoMap (2000)
▷Locally Linear Embedding (2000)
▷Stochastic Neighbor Embedding (2002)

Swiss Roll Data
Slide from H.Lee (MVPLAB)

IsoMap
Slide from H.Lee (MVPLAB)

Locally Linear Embedding
Slide from H.Lee (MVPLAB)

Problem?
Good at Local Representation = Poor at Global Representation
Good at Swiss Roll = Poor at Real Data

Stochastic Neighbor Embedding (SNE)

Update Low-Dimensional Mapping
by Considering Pairwise Relations in High-Dimension
Iterative Update
Cost Function
Label
Prediction

Distance → Similarity (High-D)
$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$
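The high-dimensional similarity above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: the function name `conditional_p`, the toy points `X`, and the fixed bandwidths in `sigmas` are all assumptions made for the example.

```python
import math

def conditional_p(points, sigmas):
    """Return P with P[i][j] = p_{j|i}, using a Gaussian centered on point i."""
    n = len(points)
    P = []
    for i in range(n):
        # Unnormalized Gaussian similarity from point i to every other point k != i.
        weights = []
        for k in range(n):
            if k == i:
                weights.append(0.0)  # a point is never its own neighbor
            else:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
                weights.append(math.exp(-sq_dist / (2.0 * sigmas[i] ** 2)))
        total = sum(weights)
        P.append([w / total for w in weights])  # normalize over k != i
    return P

# Hypothetical toy data: two nearby points and one distant point.
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
P = conditional_p(X, sigmas=[1.0, 1.0, 1.0])
for row in P:
    print([round(v, 5) for v in row])
```

Each row sums to 1, and the matrix is generally not symmetric because every row is normalized separately.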

Distance → Similarity (Low-D)
$$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}$$

[Diagram: $p_{j|i}$ (High-D) compared against $q_{j|i}$ (Low-D)]

$$C = KL(P \| Q) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
for Every Data
$$\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j)$$
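The iterative update can be sketched as plain gradient descent on this cost. The example below is only a sketch under stated assumptions: the target matrix `P`, the 1-D starting map `Y`, the learning rate, and the step count are hypothetical, and there is no momentum or other optimization trick.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_q(Y):
    """Q[i][j] = q_{j|i}: Gaussian similarity in the map (no sigma)."""
    n = len(Y)
    Q = []
    for i in range(n):
        w = [0.0 if k == i else math.exp(-sq_dist(Y[i], Y[k])) for k in range(n)]
        total = sum(w)
        Q.append([v / total for v in w])
    return Q

def sne_cost(P, Q):
    """C = sum_i sum_j p_{j|i} log(p_{j|i} / q_{j|i}), skipping zero entries."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)

def sne_gradient(P, Q, Y):
    """dC/dy_i = 2 sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            coeff = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

# Hypothetical targets (each row sums to 1) and a hypothetical 1-D starting map.
P = [[0.0, 0.9, 0.1],
     [0.8, 0.0, 0.2],
     [0.3, 0.7, 0.0]]
Y = [[0.0], [0.5], [2.0]]
learning_rate = 0.01

cost_before = sne_cost(P, conditional_q(Y))
for _ in range(100):
    G = sne_gradient(P, conditional_q(Y), Y)
    Y = [[y - learning_rate * g for y, g in zip(Y[i], G[i])] for i in range(len(Y))]
cost_after = sne_cost(P, conditional_q(Y))
print(cost_before, cost_after)
```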

$$C = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
KL-Divergence is Asymmetric
If High-D Becomes Smaller, Low-D Should Become Smaller for Equal Cost
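A small numeric sketch of the asymmetry, using made-up probabilities. The helper names `kl_term` and `kl` are hypothetical. The per-pair term is large when a similar pair (large p) is modeled as dissimilar (small q), and small in magnitude the other way around, and swapping the two distributions changes the total divergence.

```python
import math

def kl_term(p, q):
    """Contribution p * log(p / q) of a single pair to the cost."""
    return p * math.log(p / q)

def kl(P, Q):
    """KL(P || Q) for two discrete distributions over the same outcomes."""
    return sum(kl_term(p, q) for p, q in zip(P, Q) if p > 0.0)

# Similar in High-D (large p) but modeled as dissimilar in Low-D (small q).
near_modeled_far = kl_term(0.8, 0.2)
# Dissimilar in High-D (small p) but modeled as similar in Low-D (large q).
far_modeled_near = kl_term(0.2, 0.8)
print(near_modeled_far, far_modeled_near)

# Swapping the arguments changes the divergence.
P = [0.8, 0.2]
Q = [0.5, 0.5]
print(kl(P, Q), kl(Q, P))
```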

Appendix A: Gradient of SNE
$$C = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}, \qquad q_{j|i} = \frac{e^{-\|y_i - y_j\|^2}}{\sum_{k \neq i} e^{-\|y_i - y_k\|^2}}$$
[Diagram: N × N matrix with rows and columns labeled 1, 2, …, i, …, N and a 0 at entry (i, i)]
Write $q_{j|i} = \tilde{q}_{j|i} / Z_i$ with $\tilde{q}_{j|i} = e^{-\|y_i - y_j\|^2}$ and $Z_i = \sum_{k \neq i} e^{-\|y_i - y_k\|^2}$, so that
$$\frac{\partial \tilde{q}_{j|i}}{\partial y_i} = -2 (y_i - y_j)\, \tilde{q}_{j|i}$$
Only the $\log q$ terms of $C$ depend on the map. The terms of row $i$ give
$$\begin{aligned}
\frac{\partial}{\partial y_i} \sum_j p_{j|i} \log q_{j|i}
&= \sum_j p_{j|i} \frac{\partial \log q_{j|i}}{\partial y_i}
 = \sum_j p_{j|i} \left( \frac{\partial \log \tilde{q}_{j|i}}{\partial y_i} - \frac{\partial \log Z_i}{\partial y_i} \right) \\
&= \sum_j p_{j|i} \left( \frac{1}{\tilde{q}_{j|i}} \frac{\partial \tilde{q}_{j|i}}{\partial y_i} - \frac{1}{Z_i} \frac{\partial Z_i}{\partial y_i} \right) \\
&= -2 \sum_j p_{j|i} (y_i - y_j) + 2 \Big( \sum_j p_{j|i} \Big) \sum_{k \neq i} \frac{\tilde{q}_{k|i}}{Z_i} (y_i - y_k) \\
&= -2 \sum_j (y_i - y_j) \left( p_{j|i} - q_{j|i} \right)
\end{aligned}$$
where the last step uses $\sum_j p_{j|i} = 1$ and $\tilde{q}_{k|i} / Z_i = q_{k|i}$.
The terms of the other rows $j \neq i$, which depend on $y_i$ through $q_{i|j}$ and $Z_j$, are handled the same way and give $-2 \sum_j (y_i - y_j)(p_{i|j} - q_{i|j})$. With the minus sign in front of the $\log q$ terms of $C$:
$$\frac{\partial C}{\partial y_i} = 2 \sum_j \left( p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j} \right)(y_i - y_j)$$
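The row-i result of the appendix, $\frac{\partial}{\partial y_i} \sum_j p_{j|i} \log q_{j|i} = -2 \sum_j (y_i - y_j)(p_{j|i} - q_{j|i})$, can be checked against a central finite difference. This is a sanity-check sketch on a 1-D map. The function names, the row `p_row`, and the map `Y` are hypothetical values chosen for the example.

```python
import math

def row_q(Y, i):
    """q_{j|i} for all j on a 1-D map Y, with q_{i|i} = 0."""
    weights = [0.0 if k == i else math.exp(-(Y[i] - Y[k]) ** 2) for k in range(len(Y))]
    z = sum(weights)
    return [w / z for w in weights]

def row_objective(p_row, Y, i):
    """sum_j p_{j|i} log q_{j|i}: the row-i part of the cost, without the sign."""
    q = row_q(Y, i)
    return sum(p_row[j] * math.log(q[j]) for j in range(len(Y)) if j != i)

def analytic_derivative(p_row, Y, i):
    """-2 sum_j (y_i - y_j)(p_{j|i} - q_{j|i}), valid when p_row sums to 1."""
    q = row_q(Y, i)
    return -2.0 * sum((Y[i] - Y[j]) * (p_row[j] - q[j]) for j in range(len(Y)) if j != i)

def numeric_derivative(p_row, Y, i, eps=1e-6):
    """Central finite difference of row_objective with respect to y_i."""
    plus, minus = list(Y), list(Y)
    plus[i] += eps
    minus[i] -= eps
    return (row_objective(p_row, plus, i) - row_objective(p_row, minus, i)) / (2.0 * eps)

p_row = [0.0, 0.6, 0.4]   # p_{j|0}, sums to 1
Y = [0.0, 1.0, -0.5]      # 1-D map points
print(analytic_derivative(p_row, Y, 0), numeric_derivative(p_row, Y, 0))
```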

t-Distributed SNE

Problem of SNE → t-SNE
▷Hard to Optimize → Symmetric Probability
▷Crowding Problem → Student t-Distribution

SNE / Symmetric SNE / t-SNE
Prob. in High-D
▷SNE: $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$
▷Symmetric SNE: $p_{ij} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_k - x_l\|^2 / 2\sigma^2)}$
▷t-SNE: $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$
Prob. in Low-D
▷SNE: $q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$
▷Symmetric SNE: $q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)}$
▷t-SNE: $q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$
Cost Function
▷SNE: $C = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$
▷Symmetric SNE, t-SNE: $C = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$
Gradient of Cost Function
▷SNE: $\frac{\partial C}{\partial y_i} = 2 \sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$
▷Symmetric SNE: $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$
▷t-SNE: $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$
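The t-SNE column can be sketched directly from its three formulas: the Student t kernel for $q_{ij}$, the KL cost, and the gradient. This is a minimal sketch in plain Python, assuming a hypothetical symmetric joint matrix `P` that sums to 1 and a hypothetical 2-D map `Y`. It omits everything a practical implementation adds (momentum, early exaggeration, and so on).

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def tsne_q(Y):
    """Return (Q, kernel): kernel[i][j] = (1 + ||y_i - y_j||^2)^-1, Q normalized over all pairs k != l."""
    n = len(Y)
    kernel = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
              for i in range(n)]
    total = sum(sum(row) for row in kernel)
    Q = [[v / total for v in row] for row in kernel]
    return Q, kernel

def tsne_cost(P, Y):
    """C = sum_i sum_j p_ij log(p_ij / q_ij), skipping zero entries."""
    Q, _ = tsne_q(Y)
    n = len(Y)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)

def tsne_gradient(P, Y):
    """dC/dy_i = 4 sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    Q, kernel = tsne_q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            coeff = 4.0 * (P[i][j] - Q[i][j]) * kernel[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

# Hypothetical symmetric joint probabilities (all entries sum to 1) and a 2-D map.
P = [[0.00, 0.30, 0.05],
     [0.30, 0.00, 0.15],
     [0.05, 0.15, 0.00]]
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
grad = tsne_gradient(P, Y)
print(tsne_cost(P, Y), grad)
```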

SNE t-SNE
▷Hard to Optimize Symmetric Probability (Simpler Gradient)
&#3627408477;
&#3627408470;&#3627408471;=
??????
−
&#3627408485;
&#3627408470;−&#3627408485;
&#3627408471;
2
2??????
2
σ
&#3627408472;≠&#3627408473;
??????
−
&#3627408485;&#3627408472;−&#3627408485;&#3627408473;
2
2??????
2
&#3627408478;
&#3627408470;&#3627408471;=
??????
−&#3627408486;&#3627408470;−&#3627408486;&#3627408471;
2
σ
&#3627408472;≠&#3627408473;
??????
−&#3627408486;&#3627408472;−&#3627408486;&#3627408473;
2
Single Scale
All Other Pairs

SNE t-SNE
▷Hard to Optimize Symmetric Probability (Simpler Gradient)
&#3627408477;
&#3627408470;&#3627408471;=
??????
−
&#3627408485;
&#3627408470;−&#3627408485;
&#3627408471;
2
2??????
2
σ
&#3627408472;≠&#3627408473;
??????
−
&#3627408485;&#3627408472;−&#3627408485;&#3627408473;
2
2??????
2
&#3627408478;
&#3627408470;&#3627408471;=
??????
−&#3627408486;&#3627408470;−&#3627408486;&#3627408471;
2
σ
&#3627408472;≠&#3627408473;
??????
−&#3627408486;&#3627408472;−&#3627408486;&#3627408473;
2
Single Scale
All Other Pairs
Outlier = Very Small &#3627408477;
&#3627408470;&#3627408471;= No Contribution to the Cost

SNE t-SNE
▷Hard to Optimize Symmetric Probability (Simpler Gradient)
&#3627408477;
&#3627408470;&#3627408471;=
??????
−
&#3627408485;
&#3627408470;−&#3627408485;
&#3627408471;
2
2??????
2
σ
&#3627408472;≠&#3627408473;
??????
−
&#3627408485;&#3627408472;−&#3627408485;&#3627408473;
2
2??????
2
&#3627408478;
&#3627408470;&#3627408471;=
??????
−&#3627408486;&#3627408470;−&#3627408486;&#3627408471;
2
σ
&#3627408472;≠&#3627408473;
??????
−&#3627408486;&#3627408472;−&#3627408486;&#3627408473;
2
All Other Pairs
&#3627408477;
&#3627408470;&#3627408471;=
&#3627408477;
&#3627408471;|&#3627408470;+&#3627408477;
&#3627408470;|&#3627408471;
2??????
Ensures that σ
&#3627408471;&#3627408477;
&#3627408470;&#3627408471;>
1
2??????
for all data, contributes to the cost
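A quick sketch of why the averaged definition protects outliers. The conditional matrix `P_cond` below is hypothetical: point 2 is an outlier that the other points almost never pick as a neighbor, yet its own row still sums to 1, so its joint row sum stays above $1/(2N)$.

```python
def symmetrize(P_cond):
    """p_ij = (p_{j|i} + p_{i|j}) / (2N), where P_cond[i][j] = p_{j|i}."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

# Hypothetical conditional probabilities; each row sums to 1.
P_cond = [[0.0,   0.999, 0.001],
          [0.999, 0.0,   0.001],
          [0.5,   0.5,   0.0]]

P_joint = symmetrize(P_cond)
n = len(P_joint)
row_sums = [sum(row) for row in P_joint]
print(row_sums, 1.0 / (2.0 * n))
```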

SNE t-SNE
▷Crowding Problem Student t-Distribution
Slide from H.Lee (MVPLAB)
Solution?
▷Close Points Closer
▷Moderate Points More Far Away

SNE t-SNE
▷Crowding Problem Student t-Distribution
Student t-Distribution in Low-Dimension

SNE t-SNE
▷Crowding Problem Student t-Distribution
Student t-Distribution in Low-Dimension
This High-Dimension Data

SNE t-SNE
▷Crowding Problem Student t-Distribution
Student t-Distribution in Low-Dimension
This High-Dimension Data
Loses its Probability
Closer

SNE t-SNE
▷Crowding Problem Student t-Distribution
Student t-Distribution in Low-Dimension
This High-Dimension Data

SNE t-SNE
▷Crowding Problem Student t-Distribution
Student t-Distribution in Low-Dimension
This High-Dimension Data
Gains its Probability
Morefaraway
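The two cases can be illustrated by comparing a Gaussian density with the heavier-tailed Student t density with one degree of freedom. This is only an illustrative sketch: it compares the two standard densities, whereas t-SNE normalizes its kernels over all pairs, and the function names and sample distances are assumptions.

```python
import math

def normal_pdf(d):
    """Standard normal density at distance d."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def student_t1_pdf(d):
    """Student t density with one degree of freedom (the Cauchy density)."""
    return 1.0 / (math.pi * (1.0 + d * d))

for d in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0):
    gauss, heavy = normal_pdf(d), student_t1_pdf(d)
    # Where the t density is lower, a pair must move closer to keep its probability;
    # where it is higher, the pair can sit farther away.
    trend = "closer" if heavy < gauss else "farther away"
    print(f"d={d:.1f}  gaussian={gauss:.4f}  student_t={heavy:.4f}  -> {trend}")
```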

| High-D Distance | Low-D Distance | $p_{ij}$ | $q_{ij}$ | $(p_{ij} - q_{ij})$ | $(y_i - y_j)$ | Gradient |
|---|---|---|---|---|---|---|
| Large | Large | 0 | 0 | 0 | Large | 0 |
| Small | Small | 1 | 1 | 0 | Small | 0 |
| Small | Large | 1 | 0 | 1 | Large | Large Attraction |
| Large | Small | 0 | 1 | -1 | Small | Small Repulsion |
Small Repulsion

Adding Slight Repulsion (Uniform Dist. in $q_{ij}$)
Often Not the Case
Low-D Initialized by Gaussian

| High-D Distance | Low-D Distance | $p_{ij}$ | $q_{ij}$ | $(p_{ij} - q_{ij})$ | $(y_i - y_j)$ | $(1 + \|y_i - y_j\|^2)^{-1}$ | Gradient |
|---|---|---|---|---|---|---|---|
| Large | Large | 0 | 0 | 0 | Large | Small | 0 |
| Small | Small | 1 | 1 | 0 | Small | Large | 0 |
| Small | Large | 1 | 0 | 1 | Large | Small | Attraction |
| Large | Small | 0 | 1 | -1 | Small | Large | Repulsion |
Strong Repulsion

Effects of t-Distribution
Close Points Closer

Results & Add-On

Slide from H.Lee (MVPLAB)

Set $\sigma_i^2$
Calculate $Perp(P_i)$
$Perp(P_i)$ = Perplexity?
$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$
$$Perp(P_i) = 2^{-\sum_j p_{j|i} \log_2 p_{j|i}}$$
Hyper-parameter: Perplexity
Paper Suggests 5~50
Perplexity = Smoothed Measure of the Effective Number of Neighbors
Perplexity = Balancing between Local and Global Aspects of Data
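The set, calculate, compare loop can be sketched as a binary search over $\sigma_i$, since the perplexity of $P_i$ grows as $\sigma_i$ grows. This is a minimal sketch for a single point. The search bounds, the iteration count, the geometric midpoint, and the toy distances are assumptions made for the example, not values from the slides.

```python
import math

def row_probs(sq_dists, sigma):
    """p_{j|i} for one point i, given squared distances to all other points."""
    d_min = min(sq_dists)  # shifting by the minimum avoids underflow; the ratio is unchanged
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perp(P_i) = 2 ** H(P_i), with the entropy H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, iters=100):
    """Binary search for the sigma_i whose perplexity matches the target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: sigma spans orders of magnitude
        if perplexity(row_probs(sq_dists, mid)) > target:
            hi = mid  # too many effective neighbors: shrink sigma
        else:
            lo = mid  # too few effective neighbors: grow sigma
    return math.sqrt(lo * hi)

# Hypothetical squared distances from point i to its five other points.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target=3.0)
print(sigma, perplexity(row_probs(sq_dists, sigma)))
```

The target has to lie between 1 and the number of other points, which is why perplexity reads as an effective neighbor count.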

Reference
▷How to use t-SNE Effectively, https://distill.pub/2016/misread-tsne/
▷Automatic Selection of t-SNE Perplexity, ICML17 AutoML Workshop
$$S(Perp) = 2\,KL(P \| Q) + \log(n)\,\frac{Perp}{n}$$

Thank You
