Speed-accuracy trade-off for the diffusion models

sosukeito 536 views 35 slides Jul 19, 2024

About This Presentation

Presentation given at Frontiers in Nonequilibrium Physics (YITP) on the preprint https://arxiv.org/abs/2407.04495.
Thermodynamic trade-off between the accuracy of the data generation and the diffusion speed for the diffusion models. We show thermodynamically that the optimal transport provides the...


Slide Content

Optimal transport and thermodynamics for the learning: Application to the diffusion model
Sosuke Ito
Frontiers in Nonequilibrium Physics, YITP, July 2024

Reference and collaborators
Main topic (Diffusion model):
K. Ikeda, T. Uda, D. Okanohara and SI, arXiv:2407.04495.
Related topic (Thermodynamics and optimal transport):
SI, Information geometry, Information Geometry 7 (Suppl 1), 441-483 (2024).
M. Nakazato and SI, Phys. Rev. Res. 3, 043093 (2021).
A. Dechant, S-I Sasa and SI, Phys. Rev. Res. 4, L012034 (2022).
A. Dechant, S-I Sasa and SI, Phys. Rev. E 106, 024125 (2022).
K. Yoshimura, A. Kolchinsky, A. Dechant and SI, Phys. Rev. Res. 5, 013017 (2023).
Y. Fujimoto and SI, Phys. Rev. Res. 6, 013023 (2024).
K. Yoshimura and SI, Phys. Rev. Res. 6, L022057 (2024).
A. Kolchinsky, A. Dechant, K. Yoshimura and SI, arXiv:2206.14599.
R. Nagayama, K. Yoshimura, A. Kolchinsky and SI, arXiv:2311.16569.
D. Sekizawa, SI and M. Oizumi, arXiv:2312.03489.
Collaborators:
Kotaro Ikeda (UTokyo), Tomoya Uda (UTokyo), Daisuke Okanohara (Preferred Networks Inc.)
• Lab members (+alumni): Muka Nakazato, Kohei Yoshimura, Yuma Fujimoto, Artemy Kolchinsky, Ryan Nagayama
• Andreas Dechant (KyotoU), Shin-ichi Sasa (KyotoU), Daiki Sekizawa (UTokyo), Masafumi Oizumi (UTokyo)

Outline
• Introduction: Generative models and diffusion models
• Stochastic thermodynamics based on optimal transport
• Main results: Speed-accuracy trade-off for the diffusion models
K. Ikeda, T. Uda, D. Okanohara and SI, arXiv:2407.04495.

Generative model
Stable diffusion (2022)
• Text-to-image model
• Generative artificial intelligence
• The diffusion models
Dreamstudio by stability.ai: https://beta.dreamstudio.ai/generate

Diffusion model: Original paper
Forward diffusion process [learning]: training data
Reverse diffusion process [data generation]: generated data
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, PMLR, pp. 2256–2265 (2015).

Essential idea
Forward diffusion process [learning]:
$\mathcal{P}_F(\{x\}) = q(x_0) \prod_i T_i(x_{i+1}|x_i)$
Training data $q$: $P_0(x_0) = q(x_0)$. The forward process ends at $P_\tau(x_N)$.
Reverse diffusion process [data generation]:
$\mathcal{P}_E(\{x\}) = P^\dagger_0(x_N) \prod_i T^\dagger_i(x_i|x_{i+1})$
Generated data $p$: $p(x_0) \ (\simeq q(x_0))$. The reverse process starts from $P^\dagger_0(x_N) \ (\simeq P_\tau(x_N))$.
Estimating the reverse process: $\hat{T}^\dagger_i = T^\dagger_i$
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, PMLR, pp. 2256–2265 (2015).

Variants of the diffusion models
Score-based generative model
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. In International Conference on Learning Representations. (2021).
Fokker-Planck equation (forward diffusion process):
$\partial_t P_t(x) = -\nabla \cdot (\nu_t(x) P_t(x))$, $\nu_t(x) = F_t(x) - T_t \nabla \ln P_t(x)$
Reversed time: $\tilde{t} = \tau - t$
Data generation by the reverse stochastic differential equation:
$\dot{x}_{\tilde{t}} = F_{\tau-\tilde{t}}(x_{\tilde{t}}) - 2\hat{\nu}_{\tau-\tilde{t}}(x_{\tilde{t}}) + \sqrt{2T_{\tau-\tilde{t}}}\, \xi_{\tau-\tilde{t}}$
Data generation by the ordinary differential equation (probability flow ODE):
$\dot{x}_{\tilde{t}} = -\hat{\nu}_{\tau-\tilde{t}}(x_{\tilde{t}})$
Estimating $\hat{\nu}_t$ via the score function:
$\hat{\nu}_t(x) = F_t(x) - T_t s_t(x)$, $s_t = \nabla \ln P_t$
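A minimal sketch of data generation by the probability flow ODE, under assumptions that are not on the slide: one dimension, no force ($F_t = 0$), a constant temperature, and Gaussian data $P_0 = \mathcal{N}(0, \mathrm{var}_0)$. In that case $P_t = \mathcal{N}(0, \mathrm{var}_0 + 2Tt)$ and the exact velocity field is $\nu_t(x) = Tx/(\mathrm{var}_0 + 2Tt)$, so no learning is needed and the Euler result can be compared with the closed-form answer. The function names and parameter values are illustrative only.

```python
import math

def velocity(x, t, temperature, var0):
    """Exact nu_t(x) = -T d/dx ln P_t(x) for F_t = 0 and P_t = N(0, var0 + 2*T*t)."""
    return temperature * x / (var0 + 2.0 * temperature * t)

def reverse_flow(x_start, tau, temperature, var0, n_steps=10000):
    """Euler integration of dx/dt~ = -nu_{tau - t~}(x) from t~ = 0 to t~ = tau."""
    dt = tau / n_steps
    x = x_start
    for k in range(n_steps):
        t = tau - k * dt  # forward time that corresponds to reversed time t~ = k*dt
        x -= velocity(x, t, temperature, var0) * dt
    return x

# Assumed illustrative parameters.
tau, temperature, var0 = 1.0, 0.5, 0.04
x_noise = 1.3  # a sample drawn at the noisy end of the forward process

x_data = reverse_flow(x_noise, tau, temperature, var0)
# Closed form for this Gaussian case: the flow rescales x by sqrt(var_0 / var_tau).
x_exact = x_noise * math.sqrt(var0 / (var0 + 2.0 * temperature * tau))
print(x_data, x_exact)
```

In a real diffusion model the exact `velocity` would be replaced by the estimate $\hat{\nu}_t$ built from a learned score function.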

Variants of the diffusion models
Flow-based generative model
Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, International Conference on Learning Representations (2022)
Continuity equation (forward process):
$\partial_t P_t(x) = -\nabla \cdot (\nu_t(x) P_t(x))$
Reversed time: $\tilde{t} = \tau - t$
Data generation by the ordinary differential equation:
$\dot{x}_{\tilde{t}} = -\hat{\nu}_{\tau-\tilde{t}}(x_{\tilde{t}})$
Estimating the velocity field: $\hat{\nu}_t(x) = \nu_t(x)$

Examples: Forward diffusion process for accurate data generation
Linear force: $F_t(x) = \mathsf{A}_t x + b_t$
Gaussian transition probability:
$P_t(x) = \int dy\, P^c_t(x|y) P_0(y)$, $P^c_t(x|y) = \mathcal{N}(x|\mu_t(y), \Sigma_t)$
Initial condition: $P^c_0(x|y) = \delta(x-y)$, i.e. $\mu_0(y) = y$ and $\Sigma_0 = \mathsf{O}$
Cosine schedule ($t \in [0,\tau]$):
$\mu_t(y) = m_t y$, $\Sigma_t = \sigma_t^2 \mathsf{I}$, $m_t = \cos\left(\frac{\pi}{2}\frac{t}{\tau}\right)$, $\sigma_t = \sin\left(\frac{\pi}{2}\frac{t}{\tau}\right)$, so that $m_t^2 + \sigma_t^2 = 1$
A. Q. Nichol, & P. Dhariwal, In International conference on machine learning (pp. 8162-8171). PMLR (2021)
Conditional optimal transport schedule (Approximate optimal transport) ($t \in [0,\tau]$):
$\mu_t(y) = m_t y$, $\Sigma_t = \sigma_t^2 \mathsf{I}$, $m_t = 1 - \frac{t}{\tau}$, $\sigma_t = \frac{t}{\tau}$
Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, International Conference on Learning Representations (2022)
(Figure from) K. Ikeda, T. Uda, D. Okanohara and SI, arXiv:2407.04495.

Motivation
The diffusion models are inspired by nonequilibrium thermodynamics.
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, PMLR, pp. 2256–2265 (2015).
Question: Is stochastic thermodynamics still useful for understanding the current technique (e.g., optimal transport) in the diffusion models?
Our results: In terms of stochastic thermodynamics based on optimal transport, the accuracy of data generation in the diffusion models can be discussed thermodynamically.

Optimal transport: p-Wasserstein distance
$\mathcal{W}_p(P,Q) = \left( \inf_{\pi \in \Pi(P,Q)} \int dx \int dy\, \pi(x,y) \|x-y\|^p \right)^{1/p}$
$\Pi(P,Q) = \left\{ \pi(x,y) \,\middle|\, \int dy\, \pi(x,y) = P(x),\ \int dx\, \pi(x,y) = Q(y),\ \pi(x,y) \geq 0 \right\}$
Metric:
(1) $\mathcal{W}_p(P,Q) \geq 0$
(2) $\mathcal{W}_p(P,Q) = 0 \Leftrightarrow P = Q$
(3) $\mathcal{W}_p(P,Q) = \mathcal{W}_p(Q,P)$
(4) $\mathcal{W}_p(P,R) + \mathcal{W}_p(R,Q) \geq \mathcal{W}_p(P,Q)$
Inequality: $p \geq q \geq 1 \Rightarrow \mathcal{W}_p(P,Q) \geq \mathcal{W}_q(P,Q)$
Textbook: Villani, C. (2009). Optimal transport: old and new (Vol. 338, p. 23). Berlin: Springer.
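As a small illustration of the definition and of the inequality $\mathcal{W}_2 \geq \mathcal{W}_1$, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical samples of equal size. It relies on the standard one-dimensional fact that the optimal coupling pairs the sorted samples. The sample distributions and the function name `wasserstein_1d` are assumptions chosen for illustration.

```python
import random

def wasserstein_1d(xs, ys, p):
    """p-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches the sorted samples,
    so no optimization over couplings is required.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)

rng = random.Random(0)
P = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # samples from N(0, 1)
Q = [rng.gauss(2.0, 0.5) for _ in range(2000)]  # samples from N(2, 0.25)

w1 = wasserstein_1d(P, Q, 1)
w2 = wasserstein_1d(P, Q, 2)
print(w1, w2)  # w2 >= w1, as stated by the inequality for p >= q >= 1
```

In higher dimensions the sorting trick no longer applies and a linear program or an approximate solver is needed.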

Optimal transport: Expressions based on dual problems
$\langle f \rangle_P = \int dx\, f(x) P(x)$
1-Wasserstein distance (Kantorovich-Rubinstein duality):
$\mathcal{W}_1(P,Q) = \sup_{f \in {\rm Lip}^1} [\langle f \rangle_P - \langle f \rangle_Q]$, ${\rm Lip}^1 = \{ f(x) \,|\, \|\nabla f(x)\| \leq 1 \}$
2-Wasserstein distance (Benamou-Brenier formula):
$\mathcal{W}_2(P,Q) = \inf_{\{u_t, Q_t\}_{0 \leq t \leq \tau}} \sqrt{\tau \int_0^\tau dt \int dx\, \|u_t(x)\|^2 Q_t(x)}$
subject to $\partial_t Q_t(x) = -\nabla \cdot (u_t(x) Q_t(x))$, $Q_0(x) = P(x)$, $Q_\tau(x) = Q(x)$
Textbook: Villani, C. (2009). Optimal transport: old and new (Vol. 338, p. 23). Berlin: Springer.
J-D. Benamou & Y. Brenier. Numerische Mathematik 84, 375-393 (2000).

Stochastic thermodynamics for the diffusion systems
Fokker-Planck equation:
$\partial_t P_t(x) = -\nabla \cdot (\nu_t(x) P_t(x))$, $\nu_t(x) = F_t(x) - T_t \nabla \ln P_t(x)$
The entropy production rate:
$\dot{S}^{\mathrm{tot}}_t = \frac{1}{T_t} \int dx\, \|\nu_t(x)\|^2 P_t(x)$
The entropy production:
$S^{\mathrm{tot}}_\tau = \int_0^\tau dt\, \dot{S}^{\mathrm{tot}}_t$
Review: U. Seifert, Reports on progress in physics, 75, 126001 (2012).

Lower bound on the entropy production rate
Speed in the space of the 2-Wasserstein distance:
$v_2(t) = \lim_{\Delta t \to +0} \frac{\mathcal{W}_2(P_t, P_{t+\Delta t})}{\Delta t} = \sqrt{\int dx\, \|\nu^{\mathrm{ex}}_t(x)\|^2 P_t(x)}$
Lower bound on the entropy production rate (excess entropy production rate*):
$\dot{S}^{\mathrm{tot}}_t \geq \frac{[v_2(t)]^2}{T_t} = \frac{1}{T_t} \int dx\, \|\nu^{\mathrm{ex}}_t(x)\|^2 P_t(x)$
Decomposition of the velocity field (cf. Benamou-Brenier formula):
$\partial_t P_t(x) = -\nabla \cdot (\nu_t(x) P_t(x)) = -\nabla \cdot (\nu^{\mathrm{ex}}_t(x) P_t(x))$
Conservative part (gradient flow): $\nu^{\mathrm{ex}}_t(x) = \nabla \phi_t(x)$
Nonconservative part (cyclic): $\nu^{\mathrm{hk}}_t(x) = \nu_t(x) - \nu^{\mathrm{ex}}_t(x)$, $\nabla \cdot (\nu^{\mathrm{hk}}_t(x) P_t(x)) = 0$
(Figure from) D. Sekizawa, SI and M. Oizumi, arXiv:2312.03489.
M. Nakazato and SI. Phys. Rev. Res. 3, 043093 (2021).
A. Dechant, S-I Sasa and SI. Phys. Rev. Res. 4, L012034 (2022).
* Maes, C., & Netočný, K. Journal of Statistical Physics, 154, 188-203 (2014).

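Minimum entropy production and geodesic (optimal transport)
Time-independent temperature: $T_t = T = {\rm const.}$
Thermodynamic speed limit:
$S^{\mathrm{tot}}_\tau \geq \frac{\left[\int_0^\tau dt\, v_2(t)\right]^2}{\tau T} \geq \frac{[\mathcal{W}_2(P_0,P_\tau)]^2}{\tau T}$
Equality conditions:
Conservative ($\nu_t(x) = \nabla \phi_t(x)$ or $F_t(x) = -\nabla U_t(x)$): $\dot{S}^{\mathrm{tot}}_t = \frac{[v_2(t)]^2}{T}$
Geodesic (optimal transport): $v_2(t) = \frac{\mathcal{W}_2(P_0,P_\tau)}{\tau} = {\rm const.}$
Minimum entropy production (Geodesic + Conservative):
$S^{\mathrm{tot}}_\tau = \frac{[\mathcal{W}_2(P_0,P_\tau)]^2}{\tau T}$
[Figure: a path from $P_0$ to $P_\tau$ of length $\int_0^\tau dt\, v_2(t)$ compared with the geodesic of length $\mathcal{W}_2(P_0,P_\tau)$, along which $v_2(t) = {\rm const.}$]
E. Aurell, K. Gawȩdzki, C. Mejía-Monasterio, R. Mohayaee, & P. Muratore-Ginanneschi, Journal of statistical physics, 147, 487-505 (2012).
M. Nakazato and SI. Phys. Rev. Res. 3, 043093 (2021).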
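The chain of inequalities behind the thermodynamic speed limit can be spelled out step by step. The derivation below is a sketch that uses only the lower bound on the entropy production rate, the Cauchy-Schwarz inequality, and the triangle inequality of the 2-Wasserstein distance, for a time-independent temperature $T$.

```latex
\begin{align}
S^{\mathrm{tot}}_\tau
  &= \int_0^\tau dt\, \dot{S}^{\mathrm{tot}}_t
   \geq \frac{1}{T} \int_0^\tau dt\, [v_2(t)]^2
   && \text{(lower bound on the rate; equality if conservative)} \\
  &\geq \frac{1}{\tau T} \left[ \int_0^\tau dt\, v_2(t) \right]^2
   && \text{(Cauchy-Schwarz; equality if } v_2(t) = \mathrm{const.}) \\
  &\geq \frac{[\mathcal{W}_2(P_0, P_\tau)]^2}{\tau T}
   && \text{(path length} \geq \text{distance; equality on a geodesic)}
\end{align}
```

The Cauchy-Schwarz step reads $\left[\int_0^\tau dt\, 1 \cdot v_2(t)\right]^2 \leq \left(\int_0^\tau dt\, 1^2\right) \left(\int_0^\tau dt\, [v_2(t)]^2\right) = \tau \int_0^\tau dt\, [v_2(t)]^2$.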

Thermodynamic uncertainty relation for the excess entropy production rate
Thermodynamic uncertainty relation:
$\dot{S}^{\mathrm{tot}}_t \geq \frac{[v_2(t)]^2}{T_t} \geq \frac{|\partial_t \langle r \rangle_{P_t}|^2}{T_t \langle \|\nabla r\|^2 \rangle_{P_t}}$
(Normalized) speed of a time-independent observable $r(x)$:
$v_r(t) = \frac{|\partial_t \langle r \rangle_{P_t}|}{\sqrt{\langle \|\nabla r\|^2 \rangle_{P_t}}}$
$v_2(t) \geq v_r(t)$
Speed in the space of the 2-Wasserstein distance is the upper bound on the speed of any observable.
cf. $\mathcal{W}_2(P,Q) \geq \mathcal{W}_1(P,Q)$, $r(x) \in {\rm Lip}^1$
A. Dechant, S-I Sasa and SI. Phys. Rev. Res. 4, L012034 (2022).
A. Dechant, S-I Sasa and SI, Phys. Rev. E. 106, 024125 (2022).
R. Nagayama, K. Yoshimura, A. Kolchinsky and SI. arXiv: 2311.16569.
cf.) Cramér–Rao bound: SI and A. Dechant, Physical Review X, 10, 021056 (2020).
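A worked example may make the bound $v_2(t) \geq v_r(t)$ concrete. The setting is an assumption chosen for illustration and is not taken from the slides: a one-dimensional Gaussian that translates rigidly, $P_t(x) = \mathcal{N}(x|ct, s^2)$ with constants $c$ and $s > 0$.

```latex
\begin{align}
&\partial_t P_t(x) = -c\, \partial_x P_t(x)
  \;\Rightarrow\; \nu^{\mathrm{ex}}_t(x) = c = \partial_x (c x),
  \qquad v_2(t) = \sqrt{\int dx\, c^2 P_t(x)} = |c| . \\
&r(x) = x: \quad
  \partial_t \langle r \rangle_{P_t} = c, \quad
  \langle \|\nabla r\|^2 \rangle_{P_t} = 1, \quad
  v_r(t) = |c| = v_2(t) . \\
&r(x) = x^2: \quad
  \partial_t \langle r \rangle_{P_t} = 2 c^2 t, \quad
  \langle \|\nabla r\|^2 \rangle_{P_t} = 4 (c^2 t^2 + s^2), \quad
  v_r(t) = \frac{c^2 |t|}{\sqrt{c^2 t^2 + s^2}} < |c| = v_2(t) .
\end{align}
```

The linear observable saturates the bound because its gradient is parallel to the constant velocity field, while the quadratic observable stays strictly below it.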

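Trade-offs: Our results
Question: Is stochastic thermodynamics still useful for understanding the current technique (e.g., optimal transport) in the diffusion models?
Analogy between stochastic thermodynamics and diffusion models:
Stochastic thermodynamics: Optimal transport = Minimum entropy production
Diffusion models: (Approximate) optimal transport = Accurate data generation (empirical finding)
Trade-offs in stochastic thermodynamics:
$S^{\mathrm{tot}}_\tau \geq \frac{\left[\int_0^\tau dt\, v_2(t)\right]^2}{\tau T}$, $v_2(t) \geq v_r(t)$, $\dot{S}^{\mathrm{tot}}_t \geq \frac{[v_2(t)]^2}{T_t}$
Trade-offs in diffusion models: Our results, analogous to the thermodynamic speed limit and the thermodynamic uncertainty relation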

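Estimation error in the diffusion models
Forward process: $P_t(x)$, from $P_0(x)$ to $P_\tau(x)$
Reverse process: $P^\dagger_{\tilde{t}}(x)$, from $P^\dagger_0(x)$ to $P^\dagger_\tau(x)$, with $P_t(x) = P^\dagger_{\tau-t}(x)$
Estimated process: $\tilde{P}^\dagger_{\tilde{t}}(x)$, from $\tilde{P}^\dagger_0(x)$ to $\tilde{P}^\dagger_\tau(x)$, with $P^\dagger_0(x) \neq \tilde{P}^\dagger_0(x)$
Estimation error (measured by the 1-Wasserstein distance): $\mathcal{W}_1(p,q)$
e.g.,) K. Oko, S. Akiyama & T. Suzuki, In International Conference on Machine Learning (pp. 26517-26582). PMLR (2023).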

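Perturbation and response
Initial perturbation ($\chi^2$-divergence):
$D_0 = \int dx\, \frac{(P^\dagger_0(x) - \tilde{P}^\dagger_0(x))^2}{P^\dagger_0(x)}$
Response function:
$\frac{(\Delta \mathcal{W}_1)^2}{D_0} = \frac{[\mathcal{W}_1(p,q) - \mathcal{W}_1(P^\dagger_0, \tilde{P}^\dagger_0)]^2}{D_0}$
Forward process: $\partial_t P_t(x) = -\nabla \cdot (\nu_t(x) P_t(x))$
Reverse process: $\partial_{\tilde{t}} P^\dagger_{\tilde{t}}(x) = \nabla \cdot (\nu_{\tau-\tilde{t}}(x) P^\dagger_{\tilde{t}}(x))$, with $P_t(x) = P^\dagger_{\tau-t}(x)$ and $\tilde{t} = \tau - t$
Estimated process [Probability flow ODE, Flow-based generative modeling]: $\partial_{\tilde{t}} \tilde{P}^\dagger_{\tilde{t}}(x) = \nabla \cdot (\nu_{\tau-\tilde{t}}(x) \tilde{P}^\dagger_{\tilde{t}}(x))$
[Figure: the perturbation between $P^\dagger_0(x)$ and $\tilde{P}^\dagger_0(x)$ at the start of the reverse process, and the resulting estimation error at its end.]
$\frac{(\Delta \mathcal{W}_1)^2}{D_0}$ is small $\Rightarrow$ Data generation is robust to the initial perturbation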
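To make the initial perturbation $D_0$ concrete, here is a minimal sketch of the $\chi^2$-divergence for distributions on a finite set of bins, where the integral becomes a sum. The two probability vectors are made-up numbers, and the function name `chi2_divergence` is hypothetical.

```python
import math

def chi2_divergence(p_ref, p_pert):
    """Discrete analogue of D_0 = sum_x (p_ref(x) - p_pert(x))**2 / p_ref(x)."""
    if len(p_ref) != len(p_pert):
        raise ValueError("distributions must share the same support")
    total = 0.0
    for a, b in zip(p_ref, p_pert):
        if a > 0.0:
            total += (a - b) ** 2 / a
        elif b > 0.0:
            # Mass where the reference has none makes the divergence infinite.
            return math.inf
    return total

p_ref = [0.5, 0.3, 0.2]   # stands in for the reference initial distribution
p_pert = [0.4, 0.4, 0.2]  # stands in for the perturbed initial distribution
d0 = chi2_divergence(p_ref, p_pert)
print(d0)  # 0.01/0.5 + 0.01/0.3, approximately 0.0533
```

Unlike the Wasserstein distance, this quantity is not symmetric in its two arguments: the reference distribution appears in the denominator.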

Main results: Speed-accuracy trade-off for the diffusion models
Speed-accuracy trade-off:
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, T_t \dot{S}^{\mathrm{tot}}_t$
Conservative case ($\nu_t(x) = \nabla \phi_t(x)$ or $F_t(x) = -\nabla U_t(x)$):
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, [v_2(t)]^2$
The robustness of data generation is generally limited by the diffusion speed $v_2(t)$ or the entropy production rate $\dot{S}^{\mathrm{tot}}_t$ in the forward process.

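Main results: Speed-accuracy trade-off for the diffusion models (Instantaneous)
Instantaneous speed-accuracy trade-off:
$\frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|^2}{D_0} \leq T_t \dot{S}^{\mathrm{tot}}_t$
Conservative case ($\nu_t(x) = \nabla \phi_t(x)$ or $F_t(x) = -\nabla U_t(x)$):
$v_{\mathrm{loss}}(t) \leq v_2(t)$, $v_{\mathrm{loss}}(t) = \frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|}{\sqrt{D_0}}$
cf.) Thermodynamic uncertainty relation: $v_r(t) \leq v_2(t)$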

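Sketch of proof: Instantaneous trade-off
Reverse process and estimated process:
$\partial_{\tilde{t}} P^\dagger_{\tilde{t}}(x) = \nabla \cdot (\nu_{\tau-\tilde{t}}(x) P^\dagger_{\tilde{t}}(x))$, $\partial_{\tilde{t}} \tilde{P}^\dagger_{\tilde{t}}(x) = \nabla \cdot (\nu_{\tau-\tilde{t}}(x) \tilde{P}^\dagger_{\tilde{t}}(x))$
With $\tilde{t} = \tau - t$, the difference obeys a continuity equation:
$\partial_t [P^\dagger_{\tau-t}(x) - \tilde{P}^\dagger_{\tau-t}(x)] = -\nabla \cdot (\nu_t(x) [P^\dagger_{\tau-t}(x) - \tilde{P}^\dagger_{\tau-t}(x)])$
For $f \in {\rm Lip}^1$:
$|\partial_t (\langle f \rangle_{P^\dagger_{\tau-t}} - \langle f \rangle_{\tilde{P}^\dagger_{\tau-t}})|^2 = \left( \int dx\, f(x) \partial_t [P^\dagger_{\tau-t}(x) - \tilde{P}^\dagger_{\tau-t}(x)] \right)^2$
$= \left( \int dx\, \nabla f(x) \cdot \nu_t(x) [P^\dagger_{\tau-t}(x) - \tilde{P}^\dagger_{\tau-t}(x)] \right)^2$
$\leq \left( \int dx\, \|\nu_t(x)\|^2 P_t(x) \right) \left( \int dx\, \frac{[P^\dagger_{\tau-t}(x) - \tilde{P}^\dagger_{\tau-t}(x)]^2}{P^\dagger_{\tau-t}(x)} \right)$ (Cauchy-Schwarz inequality + 1-Lipschitz, $\|\nabla f(x)\| \leq 1$)
$= T_t \dot{S}^{\mathrm{tot}}_t D_0$ (the $\chi^2$-divergence is time-independent)
Continuity equation + Kantorovich-Rubinstein duality: $\exists f \in {\rm Lip}^1$ such that
$|\partial_t \mathcal{W}_1(P^\dagger_{\tau-t}, \tilde{P}^\dagger_{\tau-t})|^2 \leq |\partial_t (\langle f \rangle_{P^\dagger_{\tau-t}} - \langle f \rangle_{\tilde{P}^\dagger_{\tau-t}})|^2$
Instantaneous speed-accuracy trade-off:
$\frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|^2}{D_0} \leq T_t \dot{S}^{\mathrm{tot}}_t$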

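Sketch of proof: speed-accuracy trade-off
Instantaneous speed-accuracy trade-off:
$\frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|^2}{D_0} \leq T_t \dot{S}^{\mathrm{tot}}_t$
Integrating over time and applying the Cauchy-Schwarz inequality:
$\int_0^\tau dt\, T_t \dot{S}^{\mathrm{tot}}_t \geq \int_0^\tau dt\, \frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|^2}{D_0} \geq \frac{(\Delta \mathcal{W}_1)^2}{\tau D_0}$
Speed-accuracy trade-off:
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, T_t \dot{S}^{\mathrm{tot}}_t$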

"Optimal" forward process for accurate data generation
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, [v_2(t)]^2$
Minimizing the upper bound:
$\int_0^\tau dt\, [v_2(t)]^2 \geq \frac{[\mathcal{W}_2(P_0,P_\tau)]^2}{\tau}$
Geodesic (optimal transport): $v_2(t) = \frac{\mathcal{W}_2(P_0,P_\tau)}{\tau} = {\rm const.}$
Minimum value: $\int_0^\tau dt\, [v_2(t)]^2 = \frac{[\mathcal{W}_2(P_0,P_\tau)]^2}{\tau}$
cf. Minimum entropy production
The optimal forward process is a dynamics driven by optimal transport (i.e., a geodesic in the space of the Wasserstein distance).

"Suboptimal" forward process for accurate data generation
$P_t(x) = \int dy\, P^c_t(x|y) P_0(y)$, $P^c_t(x|y) = \mathcal{N}(x|m_t y, \sigma_t^2 \mathsf{I})$
Theorem: If the number of data $N_D$ is small enough compared to the dimension of the data $n_d$ ($N_D/n_d \to 0$),
$\int_0^\tau dt\, [v_2(t)]^2 \simeq n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2]$
N. Shaul, R. T. Chen, M. Nickel, M. Le, and Y. Lipman, in International Conference on Machine Learning, PMLR, pp. 30883–30907 (2023)
Minimizing the approximate upper bound (suboptimal):
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, [v_2(t)]^2 \simeq n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2]$

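"Suboptimal" forward process: Conditional optimal transport schedule
$n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2] \geq n_d \frac{(\sigma_0 - \sigma_\tau)^2 + (m_0 - m_\tau)^2}{\tau}$
Minimum value (Conditional optimal transport schedule): $m_t = 1 - \frac{t}{\tau}$, $\sigma_t = \frac{t}{\tau}$
$n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2] = n_d \frac{(\sigma_0 - \sigma_\tau)^2 + (m_0 - m_\tau)^2}{\tau}$
The "suboptimal" forward process is a dynamics driven by the conditional optimal transport schedule.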

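"Suboptimal" forward process: Cosine schedule
Constraint: $m_t^2 + \sigma_t^2 = 1 \Rightarrow (m_t, \sigma_t) = (\cos \theta_t, \sin \theta_t)$
$n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2] \geq n_d \frac{(\theta_0 - \theta_\tau)^2}{\tau}$
Minimum value (Cosine schedule): $m_t = \cos\left(\frac{\pi}{2}\frac{t}{\tau}\right)$, $\sigma_t = \sin\left(\frac{\pi}{2}\frac{t}{\tau}\right)$
$n_d \int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2] = n_d \frac{(\theta_0 - \theta_\tau)^2}{\tau}$
The "suboptimal" forward process under the constraint is a dynamics driven by the cosine schedule.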
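The two minimum values can be checked numerically. The sketch below estimates $\int_0^\tau dt\, [(\partial_t \sigma_t)^2 + (\partial_t m_t)^2]$ for the conditional optimal transport schedule and for the cosine schedule with finite differences. The prefactor $n_d$ is omitted, $\tau = 1$ is an arbitrary choice, and the function name `schedule_cost` is hypothetical.

```python
import math

def schedule_cost(m, sigma, tau, n_steps=20000):
    """Riemann-sum estimate of int_0^tau [(d sigma/dt)^2 + (d m/dt)^2] dt,
    with the derivatives replaced by finite differences over each step."""
    dt = tau / n_steps
    total = 0.0
    for k in range(n_steps):
        t0, t1 = k * dt, (k + 1) * dt
        dm = (m(t1) - m(t0)) / dt
        ds = (sigma(t1) - sigma(t0)) / dt
        total += (dm * dm + ds * ds) * dt
    return total

tau = 1.0
cond_ot = schedule_cost(lambda t: 1.0 - t / tau,
                        lambda t: t / tau, tau)
cosine = schedule_cost(lambda t: math.cos(0.5 * math.pi * t / tau),
                       lambda t: math.sin(0.5 * math.pi * t / tau), tau)

# Expected from the bounds: [(sigma_0 - sigma_tau)^2 + (m_0 - m_tau)^2] / tau = 2 / tau
# for conditional OT, and (theta_0 - theta_tau)^2 / tau = (pi / 2)^2 / tau for cosine.
print(cond_ot, 2.0 / tau)
print(cosine, (math.pi / 2.0) ** 2 / tau)
```

The cosine value is larger than the conditional optimal transport value, which is consistent with the cosine schedule being the minimizer only under the additional constraint $m_t^2 + \sigma_t^2 = 1$.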

Examples of optimal and suboptimal dynamics for the diffusion models: Swiss roll
[Figure; panel label: Most robust]

Examples of optimal and suboptimal dynamics for the diffusion models: Gaussian mixture
[Figure: the forward process $P_t(x)$ and the estimated process $\tilde{P}^\dagger_{\tau-t}(x)$, which starts from the initial perturbation $\tilde{P}^\dagger_0(x)$.]
The data structure (the two peaks) is well recovered even during the dynamics of the estimated process in the case of optimal transport.

Speed-accuracy trade-off for the diffusion models (Instantaneous)
$\frac{|\partial_t \mathcal{W}_1(\tilde{P}^\dagger_{\tau-t}, P^\dagger_{\tau-t})|^2}{D_0} = [v_{\mathrm{loss}}(t)]^2 \leq [v_2(t)]^2$
In the case of optimal transport, the data structure does not change rapidly during the dynamics of the estimated process.

Speed-accuracy trade-off for the diffusion models
$\frac{(\Delta \mathcal{W}_1)^2}{\tau D_0} \leq \int_0^\tau dt\, [v_{\mathrm{loss}}(t)]^2 \leq \int_0^\tau dt\, [v_2(t)]^2$
[Figure: $(\Delta \mathcal{W}_1)^2/(\tau D_0)$, $\int_0^\tau dt\, [v_{\mathrm{loss}}(t)]^2$ and $\int_0^\tau dt\, [v_2(t)]^2$ for each schedule.]
The bounds are tighter in the case of optimal transport than for the cosine and conditional optimal transport schedules.
The value of $(\Delta \mathcal{W}_1)^2/(\tau D_0)$ for optimal transport is the smallest among all schedules.
Interestingly, the cosine and conditional optimal transport schedules work well in the data generation for this simple case because the response function $(\Delta \mathcal{W}_1)^2/(\tau D_0)$ is small enough.

Summary
• We used the techniques of stochastic thermodynamics and optimal transport to discuss accurate data generation in the diffusion models.
• We derived the trade-off relationship between the robustness of data generation to the initial perturbation and the diffusion speed cost given by the Wasserstein distance or the entropy production rate.
• We discussed the optimality and suboptimality of the forward diffusion process in terms of the trade-off, and we found the theoretical validity of the well-used methods (i.e., the cosine and the conditional optimal transport schedules).
For more information and examples, see K. Ikeda, T. Uda, D. Okanohara and SI, arXiv:2407.04495.
Take-home message
Stochastic thermodynamics (based on optimal transport) is useful for generative AI