Machine Learning for Computer Vision.pdf


About This Presentation

Machine Learning for Computer Vision


Slide Content

Introduction to Computer Vision
Quantitative Biomedical Imaging Group
Institute of Biomedical Engineering
Big Data Institute
University of Oxford
Jens Rittscher

Mathematics (with Computer Science)
University of Bonn, Germany

Mathematics – a universal language plays a role in many disciplines
questions / opportunities
Economics, Biology, Computer Vision

Machine Learning for Computer Vision
What is Computer Vision?
• train machines to interpret the visual world
• analyse what objects are in an image
• detect specific objects of interest
Edge Detection
Image Segmentation
Classification
Visual Motion
Course Themes
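As a first taste of the edge detection theme, here is a minimal sketch using scikit-image and matplotlib (both ship with the Anaconda distribution used later in the course). The choice of the Sobel filter and the built-in test image are illustrative assumptions, not course material.

```python
# A tiny preview of the edge detection theme; purely illustrative.
import matplotlib.pyplot as plt
from skimage import data, filters

image = data.camera()          # built-in grayscale test image
edges = filters.sobel(image)   # gradient-magnitude edge map

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(image, cmap="gray"); axes[0].set_title("input")
axes[1].imshow(edges, cmap="gray"); axes[1].set_title("Sobel edges")
for ax in axes:
    ax.axis("off")
plt.show()
```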

Jens Rittscher
Institute of Biomedical Engineering & Big Data Institute
University of Oxford

Career timeline (milestones at 2000, 2005, 2013):
University of Oxford – DPhil, Engineering Science (Computer Vision); thesis title: Recognising Human Motion
GE Global Research, Niskayuna, NY – Senior Scientist, Computer Vision and Visualisation; Project Leader, Biomedical Imaging; Manager, Computer Vision
University of Oxford – Senior Research Fellow (IBME), Group Leader (TDI), Adjunct Member (LICR); Professor of Engineering Science
Research areas: Cell Tracking, Zebrafish Imaging, Computational Pathology, Re-identification, Group Segmentation, Tissue Imaging, Endoscopy
[Slide background: a page from Sharma et al., Gastroenterology Vol. 131, No. 5, describing a grading system for endoscopically recognized Barrett's esophagus (BE) based on its circumferential (C) and maximal (M) extent relative to the gastroesophageal junction. The excerpt reports the internal validation (5 assessors, 50 video clips; reliability coefficients 0.91 for C and 0.66 for M, rising to 0.94 and 0.88 when one assessor who misread M was excluded) and the external validation (22 assessors, 29 video clips; reliability coefficients 0.94 for C and 0.93 for M), together with Figure 4 (a segment classified as C2M5), Table 2 (reliability coefficients for the internal study) and Table 3 (video clips by estimated BE segment length).]

• Learn image processing & machine learning techniques in the context of a concrete application setting
• Gain experience in working with images and the application of machine learning models
Machine Learning for Computer Vision
Lectures
Exercises
Course Components

Data Science
Theory: You have a strong background in mathematics and statistics and like to apply the methods to real-world problems.
Practice: You have the necessary practical programming skills to implement your ideas and work on large data sets.
Context: You have a strong interest or background knowledge in a particular scientific field that excites you.

Structure of the course
Feature Extraction, Image Segmentation, Object Detection
Traditional Computer Vision
Revisiting Computer Vision with Deep Learning
Object Detection, Semantic Segmentation
Machine Learning
Deep Learning
Motion & Tracking

Course structure
Unit | Core Topics | Lectures & Exercises
Day 1 | Introduction, representation of digital images, filtering, feature extraction | Lectures 1, 2
Day 2 | Image segmentation | Lectures 3, 4; Exercises 1, 2, (3)
Day 3 | Machine learning (part 1); discussion of exercise sheet 1 | Lecture 5
Day 4 | Machine learning (part 2); object detection | Lectures 6, 7; Exercises 3, 4, (5)
Day 5 | Deep learning elements; discussion of exercise sheet 2 | Lecture 8

Course structure
Unit | Core Topics | Lectures & Exercises
Day 6 | Deep learning detection; deep learning segmentation | Lectures 9, 10; Exercises 6, 7, (8)
Day 7 | Autoencoders; discussion of exercise sheet 3 | Lecture 11
Day 8 | Video processing; visual tracking | Lectures 12, 13; Exercises 9, 10, (11)
Day 9 | Application and translation of AI; discussion of exercise sheet 4 | Lecture 14
Day 10 | Research Talk |

The exercises are a fundamental part of the course. They are important because they help you to understand the course material in more depth.
They will cover the following aspects:
• Understanding of the core methods
• Help to apply the concepts in practice
• Provide direction for additional study
The points from the exercises account for 30% of the final grade.
Exercises 3, 6, 9, 12, 18 are optional.
Exercises

Programming and software

Python libraries
We advise working with the Anaconda distribution, which is based on Python 3.x. Using the conda installer it is possible to install missing packages:
• NumPy
• scikit-image (http://scikit-image.org/)
• scikit-learn (http://scikit-learn.org/)
• OpenCV (http://opencv.org/) – not required for the exercises
• pyDICOM
For medical image processing:
• SimpleITK (http://www.simpleitk.org/)
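As a quick way to confirm the setup, the following sketch simply imports the core packages and prints their versions; the example `conda install` command in the comment is an assumption about how you installed them, not a required step.

```python
# Sanity check, assuming the packages were installed into the active
# conda environment (e.g. "conda install numpy scikit-image scikit-learn").
# Every import should succeed and print a version number.
import numpy
import skimage
import sklearn

for module in (numpy, skimage, sklearn):
    print(module.__name__, module.__version__)
```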

You will find a set of Python notebooks on GitHub.
You can copy these onto your local computer or run them online.
Your Python setup

Python example
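The slide's original code is not reproduced in this text dump; as a hedged stand-in, the sketch below shows the kind of minimal example the exercises build on: loading a built-in scikit-image test image, computing an Otsu threshold, and displaying the resulting binary segmentation. The specific image and threshold method are illustrative assumptions.

```python
# Illustrative example: global thresholding of a built-in test image.
import matplotlib.pyplot as plt
from skimage import data, filters

image = data.coins()                        # grayscale image of coins
threshold = filters.threshold_otsu(image)   # Otsu's global threshold
mask = image > threshold                    # binary segmentation

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(image, cmap="gray"); axes[0].set_title("input")
axes[1].imshow(mask, cmap="gray"); axes[1].set_title(f"threshold = {threshold}")
for ax in axes:
    ax.axis("off")
plt.show()
```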

Literature

Some history …

• Computer vision started in the late 1960s in groups that pioneered artificial intelligence
• The goal was to build machines and systems that could 'see', i.e. interpret the visual world
• As such the field has very close links with robotics
Computer Vision

• A seminal book that describes a general framework for understanding visual perception
• Reconstructing the scene from a set of primitives (lines, simple geometric structures) is a central theme
David Marr - Vision

Takeo Kanade
Contributions to computer vision and robotics spanning more than 50 years

PhD Thesis 1974
Kyoto, Japan
Neural Network Based
Face Detection
H. A. Rowley, S. Baluja, T. Kanade
CVPR 1996
[Figure from the paper: the detection pipeline – an input image pyramid, extracted 20 by 20 pixel windows, preprocessing (lighting correction and histogram equalization), and a neural network whose hidden units have localized receptive fields, producing the output.]
[Excerpt from the paper describing the detector: a linear function is fit to the intensity of each window and subtracted to compensate for a variety of lighting conditions, after which histogram equalization (computed over an oval region inside the window) non-linearly maps the intensity values to expand their range and compensate for differences in camera input gains. The preprocessed window is passed through a neural network with retinal connections to its input layer and three types of hidden units: 4 looking at 10x10 pixel subregions, 16 looking at 5x5 pixel subregions, and 6 looking at overlapping 20x5 pixel horizontal stripes. The network has a single, real-valued output indicating whether or not the window contains a face. Training used nearly 1050 face examples gathered from databases at CMU and Harvard, each normalized to a common scale, orientation and position using manually located eye and upper-lip points, with 15 variants generated per face by small random rotations (up to 10 degrees), scalings (90% to 110%), sub-pixel translations and mirroring. Larger changes in translation and scale are handled by applying the filter at every pixel position of an image pyramid in which the images are scaled by factors of 1.2. Non-face examples are collected during training, adapted from Sung and Poggio, 1994: starting from 1000 images with random pixel intensities, the network is trained with standard error backpropagation to output 1 for the face examples and -1 for the non-face examples, with the weights from the previous iteration used as the starting point for later iterations.]
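To make the pipeline concrete, here is a hedged sketch of the image-pyramid / sliding-window idea with histogram-equalization preprocessing, written with scikit-image. `classify_window` is a hypothetical placeholder for the trained face/non-face network, and the window stride and pyramid depth are arbitrary choices, not values from the paper.

```python
# A hedged sketch of the sliding-window / image-pyramid scheme behind the
# detector; classify_window is a hypothetical stand-in for the network.
from skimage import color, data, exposure, transform

def classify_window(window):
    # Hypothetical placeholder: a real detector would run the neural network.
    return 1.0 if window.mean() > 0.5 else -1.0

image = color.rgb2gray(data.astronaut())
window_size = 20
detections = []

# Image pyramid: each level is downscaled by a factor of 1.2, as in the paper.
for level, scaled in enumerate(
        transform.pyramid_gaussian(image, downscale=1.2, max_layer=8)):
    if min(scaled.shape) < window_size:
        break
    # Slide a 20x20 window over the current level (stride of 4 pixels here).
    for r in range(0, scaled.shape[0] - window_size, 4):
        for c in range(0, scaled.shape[1] - window_size, 4):
            window = scaled[r:r + window_size, c:c + window_size]
            window = exposure.equalize_hist(window)   # preprocessing step
            if classify_window(window) > 0:
                detections.append((level, r, c))

print(len(detections), "candidate windows")
```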
A Statistical Approach to 3D Object Detection
Applied to Faces and Cars
H. Schneiderman and T. Kanade
2000
[Excerpt from the paper: each subsequent level of the wavelet representation covers a higher octave of frequencies. In terms of spatial extent, a coefficient in level 1 describes four times the area of a coefficient in level 2, which in turn describes four times the area of a coefficient in level 3. In terms of orientation, the LH bands are the result of low-pass filtering in the horizontal direction and high-pass filtering in the vertical direction, giving horizontal features; similarly, HL represents vertical features.]
Figure 15. Wavelet representation of an image (LL, LH, HL and HH subbands at levels 1–3).
Figure 16. Images and their wavelet transforms. Note: the wavelet coefficients are each quantized to five values.
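For readers who want to see the subband structure directly, here is a small sketch of a three-level 2D wavelet decomposition. It assumes the PyWavelets package (pywt), which is not among the course's required libraries, and uses the Haar wavelet purely for illustration.

```python
# Three-level 2D wavelet decomposition of a built-in test image.
import pywt
from skimage import data

image = data.camera().astype(float)   # built-in grayscale test image

coeffs = pywt.wavedec2(image, 'haar', level=3)

approx = coeffs[0]                    # coarsest approximation (LL) band
print("LL band:", approx.shape)

# Detail bands are returned from the coarsest to the finest level.
for detail_set, (cH, cV, cD) in enumerate(coeffs[1:]):
    # cH: horizontal detail, cV: vertical detail, cD: diagonal detail.
    print("detail set", detail_set, cH.shape, cV.shape, cD.shape)
```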

Structure from motion
C. Tomasi and T. Kanade, 1991
Feature tracking
B. D. Lucas and T. Kanade, 1981
C. Tomasi and T. Kanade, 1991
If we now partition the matrices L, Σ, and R as follows:

    L = [ L'  L'' ]    (L' is 2F x 3, L'' is 2F x (P - 3))

    Σ = [ Σ'   0  ]
        [ 0   Σ'' ]    (Σ' is 3 x 3)

    R = [ R'  ]
        [ R'' ]        (R' is 3 x P)

we have

    L Σ R = L' Σ' R' + L'' Σ'' R''.

Let W* be the ideal measurement matrix, that is, the matrix we would obtain in the absence of noise. Because of the rank principle, the non-zero singular values of W* are at most three. Since the singular values in Σ are sorted in non-increasing order, Σ' must contain all the singular values of W* that exceed the noise level. As a consequence, the term L'' Σ'' R'' must be due entirely to noise, and the product L' Σ' R' is the best possible rank-3 approximation to W*.
We can now restate our key point.
The Rank Principle for Noisy Measurements
All the shape and motion information in W is contained in its three greatest singular values, together with the corresponding left and right eigenvectors.
Thus, the best possible approximation to the ideal measurement matrix W* is the product

    Ŵ = L' Σ' R'.
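A minimal numpy sketch of the rank principle, assuming only that W is an arbitrary (here randomly generated) 2F x P measurement matrix: the best rank-3 approximation is obtained by keeping the three greatest singular values and the corresponding singular vectors.

```python
# Rank-3 truncation of a measurement matrix via the SVD.
import numpy as np

F, P = 30, 50                        # F frames (2F rows), P tracked points
W = np.random.randn(2 * F, P)        # stand-in for the measurement matrix

L, s, Rt = np.linalg.svd(W, full_matrices=False)

# Keep the three greatest singular values and the corresponding vectors.
W_hat = L[:, :3] @ np.diag(s[:3]) @ Rt[:3, :]

print(np.linalg.matrix_rank(W_hat))  # 3
```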
• Registration can be achieved through a local search using gradients
• Tracking is improved by selecting which features should be tracked (see the sketch below)
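The sketch below illustrates the two bullet points with OpenCV (listed earlier as optional for the course): Shi-Tomasi feature selection followed by pyramidal Lucas-Kanade tracking. The file names `frame1.png` and `frame2.png` are hypothetical, and the parameter values are illustrative assumptions.

```python
# Feature selection and tracking in the spirit of Lucas-Kanade / Tomasi-Kanade.
import cv2

prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Select features that are good to track (Shi-Tomasi corners).
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=10)

# Track them into the next frame with pyramidal Lucas-Kanade optical flow.
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

good_new = p1[status.flatten() == 1]
good_old = p0[status.flatten() == 1]
print(f"tracked {len(good_new)} of {len(p0)} features")
```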

• The robotic system controls 30 cameras based on the operator-controlled master camera
• The feeds from 30 cameras are blended into one dynamic panorama
In collaboration with CBS & Princeton Video Imaging

Image Guided Navigation System to Measure Acetabular Implant Alignment Intraoperatively
1998
[Excerpt from the paper: a pelvic surface model is constructed from CT data, and discrete intra-operative points are collected with an Optotrak digitizing probe physically touched to the indicated points; the goal is a "registration transformation" that best aligns these points with the surface model. An initial estimate is obtained by corresponding point registration of manually specified anatomical landmarks and is then refined with a surface-based registration algorithm using the pre- and intra-operative data. Once the pelvis is registered, navigational feedback on a television monitor guides the surgeon in positioning the acetabular implant: the implant is in the pre-operatively planned orientation when the cross-hairs representing its tip and the top of the handle are aligned with the fixed cross-hair at the centre of the image.
Fig. 6. Surface-based registration. Fig. 7. Navigational feedback. Fig. 8. Real-time tracking of the pelvis.]
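As a concrete illustration of the corresponding point registration used for the initial estimate, here is a hedged numpy sketch of rigid alignment of paired 3D landmarks (a Kabsch/Procrustes solve). It is not the surface-based refinement algorithm referenced in the excerpt, and the synthetic points are purely illustrative.

```python
# Rigid alignment of paired 3D points (corresponding point registration).
import numpy as np

def rigid_register(P, Q):
    """Return R, t minimising sum_i ||R @ P[i] + t - Q[i]||^2."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # Correction term to avoid reflections.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

# Synthetic example: landmark points and their rigidly transformed counterparts.
rng = np.random.default_rng(0)
P = rng.normal(size=(6, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])

R, t = rigid_register(P, Q)
print(np.allclose(P @ R.T + t, Q))  # True
```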
Foundation of Quality of Life Technology Center
CMU, 2008
Understanding the Phase Contrast Optics to Restore Artifact-free Microscopy Images for Segmentation
MICCAI 2012

Link to the YouTube lecture.
Takeo Kanade’s Kyoto Prize lecture

• Lectures and attendance
  • 20% attendance
  • 20% exercises
• Examination
  • 30% mid-term exam
  • 30% final exam
Course evaluation

• Form a study group of four students – in the second half of the course we will have a small challenge and you will have to work as a team
• Every week we will devote 15 minutes to answering questions. Over the entire course we expect each group to prepare 3 questions. Please submit these questions to the coordinator.
• In week 5 we will have a revision class. Each group should submit one question/topic they would like to revise.
Group working & participation