CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields

Slide Content

MULTI-VIEW HAIR CAPTURE
USING ORIENTATION FIELDS
LINJIE LUO, HAO LI, SYLVAIN PARIS, THIBAUT WEISE, MARK PAULY, SZYMON RUSINKIEWICZ

PROCEEDINGS OF THE 25TH IEEE CONFERENCE ON COMPUTER VISION AND
PATTERN RECOGNITION – CVPR 2012
CVPR 2012 Review Seminar
2012/06/23
Jun Saito @dukecyto

Objective
Build 3D hair geometry from a small number
of photographs

Related Work
Paris, S. et al. “Hair photobooth: geometric and photometric
acquisition of real hairstyles.” SIGGRAPH 2008.

Requires thousands of images for a single reconstruction

Related Work
Jakob, W., Moon, J. T., & Marschner, S. "Capturing hair assemblies fiber by
fiber." ACM SIGGRAPH Asia 2009.

Captures individual hair strands using a focal sweep
(figure: photographed vs. rendered)

Contributions
Passive multi-view stereo approach capable of
reconstructing finely detailed hair geometry
Robust matching criterion based on the local
orientation of hair
Aggregation scheme to gather local evidence while
taking hair structure into account
Progressive template fitting procedure to fuse
multiple depth maps
Quantitative evaluation of our acquisition system

System Overview
•  Robotic camera gantry w/ Canon EOS 5D Mark II

System Overview
• Every 4 views are grouped into a cluster to construct partial
depth maps

System Overview
• Multi-resolution orientation fields computed

System Overview
• Partial depth map constructed by minimizing MRF
framework with graph cuts, along with structure-aware
aggregation and depth map refinement to improve quality

System Overview
• Partial depth maps from different views are merged
together

2. Local Hair Orientation
• Filter bank of many (e.g. 180) oriented Difference-of-Gaussians
(DoG) filters
DoG graph from http://fourier.eng.hmc.edu/e161/lectures/gradient/node11.html
Oriented DoG

2. Local Hair Orientation
• Orientation: the orientation of the filter with the maximum response
  θ(x,y) = argmax_θ |(K_θ ∗ I)(x,y)|,   K_θ: oriented filters
• Maximum response
  R(x,y) = max_θ |(K_θ ∗ I)(x,y)|
• Orientation field: map the orientation to the complex domain
  (angle doubling resolves the π-ambiguity of line orientations)
  O(x,y) = exp(2iθ(x,y))

2. Local Hair Orientation
• Multi-resolution pyramid of orientation fields (from coarse to fine)
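The following is a minimal Python sketch of these two slides (not the authors' code): it builds oriented, elongated DoG-style kernels, keeps the per-pixel argmax orientation and maximum response, maps the orientation into the complex domain as exp(2iθ), and stacks the results into a small multi-resolution pyramid. The kernel shape, filter count, and pyramid depth are illustrative assumptions.

import numpy as np
from scipy.ndimage import convolve, zoom

def oriented_dog_kernels(n_orientations=180, size=21, sigma=2.0, elongation=4.0):
    """Elongated, zero-mean DoG-style kernels rotated to n_orientations angles."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    angles = np.linspace(0.0, np.pi, n_orientations, endpoint=False)
    kernels = []
    for theta in angles:
        u = xs * np.cos(theta) + ys * np.sin(theta)    # along-strand axis
        v = -xs * np.sin(theta) + ys * np.cos(theta)   # across-strand axis
        g_narrow = np.exp(-(u / (elongation * sigma)) ** 2 - (v / sigma) ** 2)
        g_wide = np.exp(-(u / (elongation * sigma)) ** 2 - (v / (2.0 * sigma)) ** 2)
        # Difference of Gaussians across the strand direction, normalized to zero mean.
        kernels.append(g_narrow - g_wide * g_narrow.sum() / g_wide.sum())
    return kernels, angles

def orientation_field(image, kernels, angles):
    """theta = argmax_theta |K_theta * I|, R = max_theta |K_theta * I|, O = exp(2i*theta)."""
    responses = np.stack([np.abs(convolve(image, k)) for k in kernels])
    theta = angles[responses.argmax(axis=0)]
    return np.exp(2j * theta), responses.max(axis=0)

def orientation_pyramid(image, kernels, angles, levels=3):
    """Multi-resolution pyramid of orientation fields, ordered coarse to fine."""
    pyramid = []
    for level in range(levels):
        factor = 0.5 ** (levels - 1 - level)
        scaled = image if factor == 1.0 else zoom(image, factor)
        pyramid.append(orientation_field(scaled, kernels, angles))
    return pyramid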

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
  E(D) = E_d(D) + λ E_s(D)
• Illustration (image denoising MRF): y_i: noisy (observed) image,
  x_i: denoised (latent) image
• Global minimization is approximated by graph cuts with α-expansion
Image from Pattern Recognition and Machine Learning

3. Partial Geometry Construction
Approximate global minimization using graph cuts
• Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast
approximate energy minimization via graph cuts. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 23(11), 1222–1239.

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Smoothness term
E(D) = E_d(D) + λ E_s(D)

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Smoothness term
E(D) = E_d(D) + λ E_s(D)

E_s(D) = Σ_{p ∈ pixels} Σ_{p' ∈ N(p)} w_{p,p'} (D_p − D_{p'})²

Depth continuity constraint between adjacent pixels p and p'

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Smoothness term
E(D) = E_d(D) + λ E_s(D)

w_{p,p'} = exp( −|θ_ref(p) − θ_ref(p')|² / (2σ²) )

Enforce strong depth continuity via a Gaussian of the orientation distance
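A small sketch of how this smoothness term could be evaluated for a candidate depth map; theta_ref (the reference view's per-pixel orientation in radians) and sigma are illustrative names and values, not taken from the paper.

import numpy as np

def smoothness_energy(depth, theta_ref, sigma=0.2):
    """E_s(D) = sum over adjacent pixel pairs of w_{p,p'} * (D_p - D_p')^2,
    with w_{p,p'} a Gaussian of the orientation distance (a sketch)."""
    energy = 0.0
    for axis in (0, 1):  # horizontal and vertical 4-neighbor pairs
        d_diff = np.diff(depth, axis=axis)
        t_diff = np.abs(np.diff(theta_ref, axis=axis))
        t_diff = np.minimum(t_diff, np.pi - t_diff)  # orientations are pi-periodic
        w = np.exp(-t_diff ** 2 / (2.0 * sigma ** 2))
        energy += np.sum(w * d_diff ** 2)
    return energy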

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Data term
E(D) = E_d(D) + λ E_s(D)

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Data term
E(D) = E_d(D) + λ E_s(D)

E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} C^(l)(p, D_p)

C^(l)(p, D_p) = Σ_{v ∈ views} m( O_ref^(l)(p), O_v^(l)( P_v(p, D_p) ) )

O_ref^(l): orientation field at level l of the reference view
O_v^(l): orientation field at level l of adjacent view v
P_v(p, D_p): projection of the 3D point at pixel p with depth D_p onto view v

3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
• Data term
E(D) = E_d(D) + λ E_s(D)

E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} C^(l)(p, D_p)

C^(l)(p, D_p) = Σ_{v ∈ views} m( O_ref^(l)(p), O_v^(l)( P_v(p, D_p) ) )

m(·, ·): cost function measuring the deviation between orientation fields;
its exp(..) factor accounts for the camera pair's different tilting angles
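As a sketch of the per-pixel data cost at one pyramid level (the full method sums over all levels): the pinhole projection and the simple orientation-deviation measure below are illustrative stand-ins, and the exp(..) tilt-compensation factor of the paper's actual cost m is omitted here.

import numpy as np

def project(point_3d, K, R, t):
    """Pinhole projection of a 3D point into a view with intrinsics K and pose (R, t)."""
    p_cam = R @ point_3d + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]

def orientation_deviation(o_ref, o_adj):
    """Stand-in deviation between complex orientations exp(2i*theta):
    0 when parallel, 2 when perpendicular (not the paper's exact cost m)."""
    return np.abs(o_ref - o_adj)

def data_cost(p, point_3d, o_ref, adj_views):
    """Data cost for pixel p = (x, y) whose depth hypothesis backprojects to point_3d.
    adj_views is a list of (orientation_field, K, R, t) for the adjacent views."""
    cost = 0.0
    for o_adj, K, R, t in adj_views:
        x, y = project(point_3d, K, R, t)
        xi = int(np.clip(np.round(x), 0, o_adj.shape[1] - 1))
        yi = int(np.clip(np.round(y), 0, o_adj.shape[0] - 1))
        cost += orientation_deviation(o_ref[p[1], p[0]], o_adj[yi, xi])
    return cost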

3. Partial Geometry Construction
• Structure-Aware Aggregation
• Before summing up the data term, guided filtering is applied at each
level l, using the orientation field as guidance

W_{p,q}(O) = (1 / |ω|²) Σ_{k : (p,q) ∈ ω_k} ( 1 + ℜ{ (O_p − µ_k)(O_q − µ_k)* } / (σ_k² + ε) )

|ω| : number of pixels in a window
ε : structure awareness
µ_k : average orientation in window ω_k
σ_k : standard deviation of orientation in window ω_k
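A brute-force sketch of this aggregation weight between two pixels of a complex orientation field O, following the formula above; the window radius and ε are illustrative, and a practical implementation would use the standard box-filter recurrences of the guided filter rather than enumerating windows.

import numpy as np

def aggregation_weight(O, p, q, radius=4, eps=0.01):
    """W_{p,q} = 1/|w|^2 * sum over windows w_k containing both p and q of
    (1 + Re{(O_p - mu_k) * conj(O_q - mu_k)} / (sigma_k^2 + eps)).
    p and q are (row, col) pixel coordinates; each window is a square of side 2*radius+1."""
    h, w = O.shape
    n_pix = (2 * radius + 1) ** 2
    total = 0.0
    # Window centers k whose (fully inside-image) window contains both p and q.
    r_lo = max(max(p[0], q[0]) - radius, radius)
    r_hi = min(min(p[0], q[0]) + radius, h - 1 - radius)
    c_lo = max(max(p[1], q[1]) - radius, radius)
    c_hi = min(min(p[1], q[1]) + radius, w - 1 - radius)
    for rk in range(r_lo, r_hi + 1):
        for ck in range(c_lo, c_hi + 1):
            patch = O[rk - radius: rk + radius + 1, ck - radius: ck + radius + 1]
            mu_k = patch.mean()
            var_k = np.mean(np.abs(patch - mu_k) ** 2)
            corr = np.real((O[p] - mu_k) * np.conj(O[q] - mu_k))
            total += 1.0 + corr / (var_k + eps)
    return total / (n_pix ** 2)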

3. Partial Geometry Construction
• Sub-pixel depth map refinement
• Similar to T. Beeler et al. "High-quality single-shot capture of facial
geometry." ACM TOG 29(4), 2010.

3. Partial Geometry Construction
•  Aggregation and refinement results
(No refinement | With refinement | With refinement and aggregation)

4. Final Geometry Reconstruction
• Merge partial depth maps from multiple views using
• Kazhdan, M. et al. "Poisson surface reconstruction." SGP 2006.
• Li, H. et al. "Robust single-view geometry and motion reconstruction."
SIGGRAPH Asia 2009.

5. Evaluation
• Quantitative evaluation
using synthetic data
• (a), (f): Synthetic data
• (b): This method
• (c): (a) overlaid on (b)
• (d): Difference between (a) and (b) is on the order of millimeters
• (g): PMVS + Poisson
• (h): T. Beeler et al. "High-quality single-shot capture of facial
geometry." ACM TOG 29(4)
• (i): This method
Figure 7: Reconstruction results on different levels. From left to right the
resolution of the depth map increases from 0.4M to 1.5M and 6M pixels,
respectively.

Each group consists of four camera positions in a T-pose: center, left, right
and bottom. Each position is 10 degrees apart from the neighboring position
in terms of gantry arm rotation. The left and right cameras in the T-pose
provide balanced coverage with respect to the center reference camera. Since
our system employs orientation-based stereo, matching will fail for
horizontal hair strands (more specifically, strands parallel to epipolar
lines). To address this problem, a bottom camera is added to extend the
stereo baselines and prevent the "orientation blindness" for horizontal
strands.

We use 8 groups of views (32 views in total) for all examples in this paper.
Three of these groups are in the upper hemisphere, while a further five are
positioned in a ring configuration on the middle horizontal plane, as shown
in Figure 2. We calibrate the camera positions with a checkerboard pattern
[19], then perform foreground-background segmentation by background color
thresholding combined with a small amount of additional manual keying. A
large area light source was used for these datasets.

Qualitative Evaluation: The top two rows of Figure 11 show reconstructions
for two different hairstyles, demonstrating that our method can accommodate a
variety of hairstyles, from straight to curly, and handle various hair
colors. We also compare our results on these datasets with [4] and [7] in
Figure 6. Note the significant details present in our reconstructions: though
we do not claim to perform reconstruction at the level of individual hair
strands, small groups of hair are clearly visible thanks to our
structure-aware aggregation and detail-preserving merging algorithms.

In Figure 7 and Figure 8, we show how our reconstruction algorithm scales
with higher resolution input and more camera views. Higher resolution and
more views greatly increase the detail revealed in the reconstructed results.

Quantitative Evaluation: To evaluate our reconstruction accuracy
quantitatively, we hired a 3D artist to manually create a highly detailed
hair model as our ground truth model. We then rendered 8 groups of images
(32 images in total) of this model with the same camera configuration as in
the real capture session. We ran our algorithm on the images and compared the
depth maps of our reconstruction and the ground truth model from the same
viewpoints. The results are shown in Figure 9. On average, the distance
between our result and the ground truth model is 5 mm, and the median
distance is 3 mm. We also ran a state-of-the-art multi-view algorithm
[4, 7, 1] on the synthetic dataset, and the statistics of its numerical
accuracy are similar to ours. However, as shown in Figure 9, their visual
appearance is a lot worse with the presence of blobs and spurious
discontinuities.

Figure 8: Comparison between the depth maps reconstructed with 2, 3, and 4
cameras.

Timings: Our algorithm performs favorably in terms of efficiency. On a single
thread of a Core i7 2.3 GHz CPU, each …

Figure 9: We evaluate the accuracy of our approach by running it on synthetic
data (a), (f). The result is shown in (b), and is overlaid on the synthetic
3D model in (c). The difference between our reconstruction and the
ground-truth 3D model is on the order of a few millimeters (d). We show a
horizontal slice of the depth map in (e): the ground-truth strands are shown
in red and our reconstruction result in blue. Compared to PMVS + Poisson
[4, 7] (g) and [1] (h), our reconstruction result (i) is more stable and
accurate.
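The depth-map comparison described above reduces to per-pixel statistics of the absolute depth difference between the reconstruction and the ground truth rendered from the same viewpoint; a minimal sketch, with the valid-hair-pixel mask as an assumed input:

import numpy as np

def depth_error_stats(depth_recon, depth_gt, mask=None):
    """Mean and median absolute depth difference (units follow the depth maps, e.g. mm)."""
    diff = np.abs(depth_recon - depth_gt)
    if mask is not None:
        diff = diff[mask]   # restrict to valid hair pixels
    return float(np.mean(diff)), float(np.median(diff))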

Dynamic Hair Capture
• Being completely passive,
this method is applicable to
dynamic hair capture
• Capture setup:
• 4 high-speed cameras
• 640 × 480 at 100 fps
• Lower quality due to the low resolution of the high-speed cameras

Conclusions
• Quantitative evaluation shows that passive, multi-view reconstruction of
hair geometry based on multi-resolution orientation fields achieves accurate
measurements (errors on the order of millimeters)
• Combined with structure-aware aggregation, this method
achieves superior quality compared to other methods
• This method can be applied to capturing hair in motion

Latest Related Work
• Chai, M. et al. "Single-View Hair Modeling for Portrait Manipulation."
To appear in ACM TOG 31(4), to be presented at SIGGRAPH 2012.