Tomasi and Kanade - Structure from Motion

Structure from Motion
factorization approach

International Journal of Computer Vision, 9:2, 137-154 (1992)
© 1992 Kluwer Academic Publishers, Manufactured in The Netherlands.
Shape and Motion from Image Streams under Orthography:
a Factorization Method
CARLO TOMASI
Department of Computer Science, Cornell University, Ithaca, NY 14850
TAKEO KANADE
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Received
Abstract
Inferring scene geometry and camera motion from a stream of images is possible in principle, but is an ill-conditioned
problem when the objects are distant with respect to their size. We have developed a factorization method that
can overcome this difficulty by recovering shape and motion under orthography without computing depth as an
intermediate step.
An image stream can be represented by the 2FxP measurement matrix of the image coordinates of P points
tracked through F frames. We show that under orthographic projection this matrix is of rank 3.
Based on this observation, the factorization method uses the singular-value decomposition technique to factor
the measurement matrix into two matrices which represent object shape and camera rotation respectively. Two
of the three translation components are computed in a preprocessing stage. The method can also handle and obtain
a full solution from a partially filled-in measurement matrix that may result from occlusions or tracking failures.
The method gives accurate results, and does not introduce smoothing in either shape or motion. We demonstrate
this with a series of experiments on laboratory and outdoor image streams, with and without occlusions.
1 Introduction
The structure-from-motion problem--recovering scene
geometry and camera motion from a sequence of
images--has attracted much of the attention of the vi-
sion community over the last decade. Yet it is common
knowledge that existing solutions work well for perfect
images, but are very sensitive to noise. We present a
new method called thefactorization method which can
robustly recover shape and motion from a sequence of
images under orthographic projection. The effects of
camera translation along the optical axis are not ac-
counted for by orthography. Consequently, this com-
ponent of motion cannot be recovered by our method
and must be small relative to the scene distance.
However, this restriction to shallow motion improves
dramatically the quality of the computed shape and of
the remaining five motion parameters. We demonstrate
this with a series of experiments on laboratory and out-
door sequences, with and without occlusions.
In the factorization method, we represent an image
sequence as a 2FxP measurement matrix W, which is
made up of the horizontal and vertical coordinates of
P points tracked through F frames. If image coordinates
are measured with respect to their centroid, we prove
the rank theorem: under orthography, the measurement
matrix is of rank 3. As a consequence of this theorem,
we show that the measurement matrix can be factored
into the product of two matrixes R and S. Here, R is
a 2Fx3 matrix that represents camera rotation, and S
is a 3 xP matrix that represents shape in a coordinate
system attached to the object centroid. The two compon-
ents of the camera translation along the image plane are
computed as averages of the rows of W. When features
appear and disappear in the image sequence because of
occlusions or tracking failures, the resulting measure-
ment matrix W is only partially filled in. The factoriza-
tion method can handle this situation by growing a par-
tial solution obtained from an initial full submatrix into
a complete solution with an iterative procedure.
International Journal of Computer Vision, 1992

Assumptions

given point correspondences across multiple views Assumptions

recover the cameras’ relative 3D rotations Assumptions

recover the cameras’ relative 3D rotations
recover the 3D scene structure Assumptions

Assumptions

Assumptions
all points are seen in all cameras

Assumptions
all points are seen in all cameras
scene is rigid

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x
00
,y
00
,z
00
)
>
Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A Orthographic

(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
assumes points expressed in camera coordinates Orthographic

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
eliminates third row

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
◆
0
@
X
Y
Z
1
A+t

3
major steps
Factorization
approach

Step 1
Remove translation

xi,j=AjXi+tj

xi,j=AjXi+tj
point i captured by camera j

xi,j=AjXi+tjxi,j=AjXi+tj

xi,j=AjXi+tjxi,j=AjXi+tj
eliminate translation by subtracting mean

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
eliminate translation by subtracting mean

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x!
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj!
1
n
Aj
n
X
k=1
Xk!tj
expand
factor

x
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj xi,j=AjXi+tj
x
1
n
n
X
k=1
(AjXk+tj)
=AjXi+tj
1
n
Aj
n
X
k=1
Xk tj
=Aj

Xi
1
n
n
X
k=1
Xk
!
expand
factor

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!
2D normalized point
(observed)

x!
1
n
n
X
k=1
xi,k
xi,j=AjXi+tj
=Aj

Xi!
1
n
n
X
k=1
Xk
!
2D normalized point
(observed)
3D normalized point
(unobserved)

Step 2
Factor measurement matrix

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
measurement matrix of 2D normalized points

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A 0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
n points captured by one camera

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A 0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
point correspondence across m cameras

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A

0
B
B
B
@
x11x12...x1n
x21x22...x2n
.
.
.
xm1xm2...xmn
1
C
C
C
A
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n
measurement
matrix

=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
measurement
matrix

R
2m⇥3
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
measurement
matrix

R
2m⇥3
M
2m⇥n
=
=
0
B
B
B
@
A1
A2
.
.
.
Am
1
C
C
C
A
'
X1···Xn
(
motion matrix
measurement
matrix

R
2m⇥3
M
2m⇥n
=
S
3⇥n
motion matrix
measurement
matrix

R
2m⇥3
M
2m⇥n
=
S
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
M
2m⇥n
R
2m⇥3
S
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
What is the rank of ideal measurement matrix?
M
2m⇥n
R
2m⇥3
S
3⇥n

Theorem:
Let A be an ____matrix. Then
____
m⇥n
rank(A)min(m, n)
.

Theorem:
Let A and B be ____ and____matrices, respectively.
Then _____________________

m⇥n
n⇥k
rank(AB)min{rank(A),rank(B)}.

=
M R S
What is the rank of ideal measurement matrix?
2m⇥n 2m⇥3
3⇥n

=
M R S
2m⇥n 2m⇥3
3⇥n
rank of ideal measurement matrix is at most three

M
2m⇥n

m⇥n
A U
m⇥n
=
D V
>
n⇥n n⇥n
SVD

m⇥n
A U
m⇥n
=
D V
>
n⇥n n⇥n
SVD
rank equals the number of non-zero singular values

M
2m⇥n

=
V
>
n⇥n
2m⇥n
M
2m⇥n
DU
n⇥nn⇥nn⇥n

=
V
>
n⇥n
2m⇥n
M
2m⇥n
DU
ideally there are at most three non-zero singular values
n⇥nn⇥nn⇥n

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
ideally there are at most three non-zero singular values
n⇥nn⇥n

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
ideally there are at most three non-zero singular values
3⇥3 3⇥3

=
V
>
n⇥n n⇥n
2m⇥n
M
2m⇥n
DU V
>
3⇥3
rank may exceed three due to noise
3⇥3
3⇥3

Enforce
rank 3
ˆ
M
⇤
= argmin
ˆ
M
k
ˆ
M"Mk
2
subject to rank(
ˆ
M)=3

Enforce
rank 3
ˆ
M
⇤
= argmin
ˆ
M
k
ˆ
M"Mk
2
subject to rank(
ˆ
M)=3
take SVD and set singular values past third to zero

3⇥3
=
V
>
M
2m⇥n
DU V
>
rank may exceed three due to noise
2m⇥n2m⇥n
n⇥nn⇥n

=
2m⇥n
M
2m⇥n
DU V
>
3⇥3
2m⇥n
n⇥nn⇥n

=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
n⇥nn⇥n

n⇥n
=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
n⇥n

=
M
2m⇥n
DU V
>
3⇥3 3⇥n
2m⇥32m⇥n
n⇥n

=
M
2m⇥n
DU V
>
3⇥3
2m⇥32m⇥n
3⇥nn⇥n

=
M R S
2m⇥n 2m⇥3
3⇥n
motion matrix
shape matrixmeasurement
matrix

=
M
2m⇥n
DU
3⇥3
2m⇥3
3⇥n
V
>

=
M
2m⇥n
U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
> D 3⇥3

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>
motion matrix

U
2m⇥3
3⇥n
D
1
2
3⇥3
D
1
2
3⇥3
V
>
motion matrix
shape matrix

Step 3
Enforce Euclidean constraints

ˆ
Sˆ
R

ˆ
Sˆ
R
motion-shape decomposition is not unique

C C
!1
ˆ
R
ˆ
S
motion-shape decomposition is not unique

Cˆ
R

Cˆ
R
enforce Euclidean constraints on the motion matrix

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
◆
0
@
X
Y
Z
1
A+t

0
@R
0
@
X
Y
Z
1
A+t
1
A
(x, y, z)
>
(x
0
,y
0
,z
0
)
>
(x, y)
>
(x
0
,y
0
)
>
(x
00
,y
00
)
>
(x
00
,y
00
,z
00
)
>
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
✓
x
y
◆
=
✓
100
010
◆
0
@
X
Y
Z
1
A
=
✓
a11a12a13
a21a22a23
◆
0
@
X
Y
Z
1
A+t
rows of matrix are unit vectors and orthogonal

Cˆ
R
enforce Euclidean constraints on the motion matrix

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
unit vector constraint for each camera row

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0
orthogonality constraint between a camera’s rows

three constraints per camera
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

L=CC
>
solve linear equations for
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

recover C from L using Cholesky decomposition
a
>
i,1CC
>
ai,1=1
a
>
i,2CC
>
ai,2=1
a
>
i,1CC
>
ai,2=0

R=
ˆ
RC S=C
!1ˆ
S
Euclidean
upgrade

tl;dl

M

M
take SVD of measurement matrix

=
M DU V
>

=
M DU V
>
extract rank three related matrix components

D
1
2D
1
2
=
M DU V
>

D
1
2D
1
2
=
M DU V
>
create motion and shape matrices

U
D
1
2 D
1
2
V
>

U
D
1
2 D
1
2
V
>
motion matrix

U
D
1
2 D
1
2
V
>
motion matrix
shape matrix

U
D
1
2 D
1
2
V
>
motion matrix
shape matrix
remove affine ambiguity with Euclidean constraints

Reconstruction result

Tomasi and Kanade, 1992

tracked points

reconstruction by factorization

Tomasi and Kanade - Structure from Motion

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Tomasi and Kanade - Structure from Motion

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 56

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Slide 75

Slide 76

Slide 77