Tomasi and Kanade - Structure from Motion

juampamuc 35 views 116 slides Jun 22, 2024
Slide 1
Slide 1 of 116
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96
Slide 97
97
Slide 98
98
Slide 99
99
Slide 100
100
Slide 101
101
Slide 102
102
Slide 103
103
Slide 104
104
Slide 105
105
Slide 106
106
Slide 107
107
Slide 108
108
Slide 109
109
Slide 110
110
Slide 111
111
Slide 112
112
Slide 113
113
Slide 114
114
Slide 115
115
Slide 116
116

About This Presentation

Slides describing the "structure from motion" problem


Slide Content

Structure from Motion
factorization approach

Structure from Motion
factorization approach

International Journal of Computer Vision, 9:2, 137-154 (1992)
© 1992 Kluwer Academic Publishers, Manufactured in The Netherlands.
Shape and Motion from Image Streams under Orthography:
a Factorization Method
CARLO TOMASI
Department of Computer Science, Cornell University, Ithaca, NY 14850
TAKEO KANADE
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Received
Abstract
Inferring scene geometry and camera motion from a stream of images is possible in principle, but is an ill-conditioned
problem when the objects are distant with respect to their size. We have developed a factorization method that
can overcome this difficulty by recovering shape and motion under orthography without computing depth as an
intermediate step.
An image stream can be represented by the 2FxP measurement matrix of the image coordinates of P points
tracked through F frames. We show that under orthographic projection this matrix is of rank 3.
Based on this observation, the factorization method uses the singular-value decomposition technique to factor
the measurement matrix into two matrices which represent object shape and camera rotation respectively. Two
of the three translation components are computed in a preprocessing stage. The method can also handle and obtain
a full solution from a partially filled-in measurement matrix that may result from occlusions or tracking failures.
The method gives accurate results, and does not introduce smoothing in either shape or motion. We demonstrate
this with a series of experiments on laboratory and outdoor image streams, with and without occlusions.
1 Introduction
The structure-from-motion problem--recovering scene
geometry and camera motion from a sequence of
images--has attracted much of the attention of the vi-
sion community over the last decade. Yet it is common
knowledge that existing solutions work well for perfect
images, but are very sensitive to noise. We present a
new method called thefactorization method which can
robustly recover shape and motion from a sequence of
images under orthographic projection. The effects of
camera translation along the optical axis are not ac-
counted for by orthography. Consequently, this com-
ponent of motion cannot be recovered by our method
and must be small relative to the scene distance.
However, this restriction to shallow motion improves
dramatically the quality of the computed shape and of
the remaining five motion parameters. We demonstrate
this with a series of experiments on laboratory and out-
door sequences, with and without occlusions.
In the factorization method, we represent an image
sequence as a 2FxP measurement matrix W, which is
made up of the horizontal and vertical coordinates of
P points tracked through F frames. If image coordinates
are measured with respect to their centroid, we prove
the rank theorem: under orthography, the measurement
matrix is of rank 3. As a consequence of this theorem,
we show that the measurement matrix can be factored
into the product of two matrixes R and S. Here, R is
a 2Fx3 matrix that represents camera rotation, and S
is a 3 xP matrix that represents shape in a coordinate
system attached to the object centroid. The two compon-
ents of the camera translation along the image plane are
computed as averages of the rows of W. When features
appear and disappear in the image sequence because of
occlusions or tracking failures, the resulting measure-
ment matrix W is only partially filled in. The factoriza-
tion method can handle this situation by growing a par-
tial solution obtained from an initial full submatrix into
a complete solution with an iterative procedure.
International Journal of Computer Vision, 1992

Assumptions

Assumptions

Assumptions

given point correspondences across multiple views Assumptions

recover the cameras’ relative 3D rotations Assumptions

recover the cameras’ relative 3D rotations
recover the 3D scene structure Assumptions

Assumptions

Assumptions
all points are seen in all cameras

Assumptions
all points are seen in all cameras
scene is rigid

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x
00
,y
00
,z
00
)
>
Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
assumes points expressed in camera coordinates Orthographic

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
eliminates third row

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
â—†
0
@
X
Y
Z
1
A+t

3
major steps
Factorization
approach

Step 1
Remove translation

xi,j=AjXi+tj

xi,j=AjXi+tj
point i captured by camera j

xi,j=AjXi+tjxi,j=AjXi+tj

xi,j=AjXi+tjxi,j=AjXi+tj
eliminate translation by subtracting mean

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
eliminate translation by subtracting mean

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand
factor

x
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj
1
n
Aj
n
X
k=1
Xk tj
=Aj

Xi
1
n
n
X
k=1
Xk
!
expand
factor

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!
2D normalized point
(observed)

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!
2D normalized point
(observed)
3D normalized point
(unobserved)

Step 2
Factor measurement matrix

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
measurement matrix of 2D normalized points

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A 0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
n points captured by one camera

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A 0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
point correspondence across m cameras

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n
measurement
matrix

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
measurement
matrix

R
2m⇥3
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
measurement
matrix

R
2m⇥3
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
motion matrix
measurement
matrix

R
2m⇥3
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
motion matrix
measurement
matrix

R
2m⇥3
M
2m⇥n
=
S
3⇥n
motion matrix
measurement
matrix

R
2m⇥3
M
2m⇥n
=
S
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
M
2m⇥n
R
2m⇥3
S
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
What is the rank of ideal measurement matrix?
M
2m⇥n
R
2m⇥3
S
3⇥n

Theorem:
Let A be an ____matrix. Then
____
m⇥n
rank(A)min(m, n)
.

Theorem:
Let A and B be ____ and____matrices, respectively.
Then _____________________

m⇥n
n⇥k
rank(AB)min{rank(A),rank(B)}.

=
M R S
What is the rank of ideal measurement matrix?
2m⇥n 2m⇥3
3⇥n

=
M R S
2m⇥n 2m⇥3
3⇥n
rank of ideal measurement matrix is at most three

M
2m⇥n

m⇥n
A U
m⇥n
=
D V
>
n⇥n n⇥n
SVD

m⇥n
A U
m⇥n
=
D V
>
n⇥n n⇥n
SVD
rank equals the number of non-zero singular values

M
2m⇥n

=
V
>
n⇥n
2m⇥n
M
2m⇥n
DU
n⇥nn⇥nn⇥n

=
V
>
n⇥n
2m⇥n
M
2m⇥n
DU
ideally there are at most three non-zero singular values
n⇥nn⇥nn⇥n

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
ideally there are at most three non-zero singular values
n⇥nn⇥n

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
ideally there are at most three non-zero singular values
3⇥3 3⇥3

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
3⇥3
rank may exceed three due to noise
3⇥3
3⇥3

Enforce
rank 3
ˆ
M
⇤
= argmin
ˆ
M
k
ˆ
M"Mk
2
subject to rank(
ˆ
M)=3

Enforce
rank 3
ˆ
M
⇤
= argmin
ˆ
M
k
ˆ
M"Mk
2
subject to rank(
ˆ
M)=3
take SVD and set singular values past third to zero

3⇥3
=
V
>
M
2m⇥n
DU V
>
rank may exceed three due to noise
2m⇥n2m⇥n
n⇥nn⇥n

=
2m⇥n
M
2m⇥n
DU V
>
3⇥3
2m⇥n
n⇥nn⇥n

=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
n⇥nn⇥n

n⇥n
=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
n⇥n

=
M
2m⇥n
DU V
>
3⇥3 3⇥n
2m⇥32m⇥n
n⇥n

=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
3⇥nn⇥n

=
M R S
2m⇥n 2m⇥3
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
M
2m⇥n
DU
3⇥3
2m⇥3
3⇥n
V
>

=
M
2m⇥n
U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
> D 3⇥3

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>
motion matrix

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>
motion matrix
shape matrix

Step 3
Enforce Euclidean constraints

ˆ
Sˆ
R

ˆ
Sˆ
R
motion-shape decomposition is not unique

C C
!1
ˆ
R
ˆ
S
motion-shape decomposition is not unique

Cˆ
R

Cˆ
R
enforce Euclidean constraints on the motion matrix

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
â—†
0
@
X
Y
Z
1
A+t

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
✓
x
y
â—†
=
✓
100
010
â—†
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
â—†
0
@
X
Y
Z
1
A+t
rows of matrix are unit vectors and orthogonal

Cˆ
R
enforce Euclidean constraints on the motion matrix

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
unit vector constraint for each camera row

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0
orthogonality constraint between a camera’s rows

three constraints per camera
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

L=CC
>
solve linear equations for
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

recover C from L using Cholesky decomposition
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

R=
ˆ
RC S=C
!1ˆ
S
Euclidean
upgrade

tl;dl

M

M
take SVD of measurement matrix

=
M DU V
>

=
M DU V
>
extract rank three related matrix components

=
M DU V
>
extract rank three related matrix components

D
1
2D
1
2
=
M DU V
>

D
1
2D
1
2
=
M DU V
>
create motion and shape matrices

U
D
1
2 D
1
2
V
>

U
D
1
2 D
1
2
V
>
motion matrix

U
D
1
2 D
1
2
V
>
motion matrix
shape matrix

U
D
1
2 D
1
2
V
>
motion matrix
shape matrix
remove affine ambiguity with Euclidean constraints

Reconstruction result

Tomasi and Kanade, 1992

Tomasi and Kanade, 1992

tracked points

tracked points

reconstruction by factorization

reconstruction by factorization