Module 1 Chapter1_Computer vision VTU_syllabus.pdf

rswbec · 68 slides · Mar 12, 2025

About This Presentation

Computer Vision Basics


Slide Content

COMPUTER VISION
(BCS613B)
Dr. Ramesh Wadawadagi
Associate Professor
Department of CSE
SVIT, Bengaluru
[email protected]

TEXT BOOKS
1. Computer Vision: Algorithms and Applications, Richard Szeliski. (Modules 1 & 2)
http://szeliski.org/Book/
2. Digital Image Processing, Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins, Pearson, 4th edition, 2019. (Modules 3, 4 & 5)

PRE-REQUISITES
1) Linear Algebra
2) Vector Calculus
3) Computer Graphics
4) Algorithms & Programming

COURSE OBJECTIVES
CLO1: To understand the fundamentals of computer vision and digital image processing.
CLO2: To introduce the processes involved in image enhancement and restoration.
CLO3: To help students gain an understanding of color image processing and morphology.
CLO5: To impart knowledge of image segmentation and object recognition techniques.

COURSE OUTCOMES
At the end of the course, the student will be able to:
1. Explain the fundamentals of computer vision and its applications.
2. Apply image enhancement techniques for smoothing and sharpening of images.
3. Compare different image restoration and segmentation techniques.
4. Demonstrate smoothing and sharpening techniques for color images.
5. Explain morphological, feature extraction, and pattern classification techniques for object recognition.

A PICTURE IS WORTH A THOUSAND WORDS

EVERY IMAGE TELLS A STORY

Traffic scene

Number of vehicles

Type of vehicles

Location of closest
obstacle

Assessment of
congestion

Location of the scene
captured

Traffic rules assessment

Critical scene
identification

Emotion recognition

GOAL OF COMPUTER VISION

Is to perceive the “story” behind the picture

Compute properties of the world

Understand 3D shape

Recognize people or objects

Analyze patterns

Count objects and assign labels

Improve and enhance images

WHAT IS COMPUTER VISION?
1. Computer Graphics:
INPUT: Scene Representation  OUTPUT: Image
2. Image Processing:
INPUT: Image  OUTPUT: Image
Image enhancement, reconstruction, filtering, compression
3. Computer Vision:
INPUT: Image  OUTPUT: Interpretation
Image analysis, image interpretation, scene understanding
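The input/output distinction above can be made concrete with a small sketch (a hypothetical toy example, not from the textbook): a box blur is image processing (image in, image out), while counting the objects in the scene is computer vision (image in, interpretation out).

```python
import numpy as np

# Toy 8x8 grayscale "image" with two bright square objects.
img = np.zeros((8, 8))
img[1:3, 1:3] = 1.0
img[5:7, 4:6] = 1.0

# Image processing (image in, image out): a 3x3 box blur.
def box_blur(im):
    out = np.zeros_like(im)
    padded = np.pad(im, 1, mode="edge")
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + im.shape[0], dx:dx + im.shape[1]]
    return out / 9.0

blurred = box_blur(img)          # output is still an image

# Computer vision (image in, interpretation out): count the bright blobs
# with a simple 4-connected flood fill.
def count_blobs(im, thresh=0.5):
    mask = im > thresh
    seen = np.zeros_like(mask)
    count = 0
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if mask[y, x] and not seen[y, x]:
                count += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if (0 <= cy < mask.shape[0] and 0 <= cx < mask.shape[1]
                            and mask[cy, cx] and not seen[cy, cx]):
                        seen[cy, cx] = True
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return count

print(count_blobs(img))  # interpretation: the scene contains 2 objects
```

The blur maps an image to another image of the same size, whereas the blob count collapses the image into a symbolic statement about the scene.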

WHAT IS COMPUTER VISION?

Computer Vision is a field of computer science that works on enabling computers to see, identify, and process images in the same way that human vision does, and then provide appropriate output.

Building machines that see

Want machines to interact with the real world

Modeling the biological perception of the human eye

VISION
Vision is the process of discovering what is
present in the world and where it is by seeing.

COMPUTER VISION
Computer Vision is the study of the analysis of pictures and videos in order to interpret the real world in the way human beings do.


COMPUTER VISION VS GRAPHICS
CV reconstructs image properties, such as shape, illumination, and color distributions, from images; computer graphics works in the opposite direction, rendering images from scene representations.


WHY COMPUTER VISION?

An image is worth 1000 words

Many biological systems rely on vision

The world is 3D and dynamic

Cameras and computers are cheap



COMPUTER VISION EXAMPLE 1
Finding people in images
Problem 1: Given an image ‘I’
Question: Does image ‘I’ contain an image of a person?

“YES” INSTANCES

“NO” INSTANCES

COMPUTER VISION EXAMPLE 2
Recognizing objects in images
Problem 2: Given an image ‘I’
Question: Recognize the different objects in ‘I’.

(Figure: a street scene with the recognized objects labeled: sky, building, flag, wall, banner, bus, cars, face, street lamp.)

HUMAN PERCEPTION HAS ITS SHORTCOMINGS

BUT HUMANS CAN TELL A LOT ABOUT A SCENE
FROM A LITTLE INFORMATION

APPLICATIONS OF COMPUTER VISION
A good number of computer vision
techniques are being used today in a
wide variety of real-world
applications, which include:
1. Real-life & Industry applications
2. Consumer-level applications

REAL-WORLD APPLICATIONS
Computer Vision is used in
many real-world applications,
including healthcare,
transportation, security, and
manufacturing.

OPTICAL CHARACTER RECOGNITION (OCR)
Reading handwritten
postal codes on letters
and automatic number
plate recognition
(ANPR) of vehicles.

MACHINE INSPECTION
Rapid parts inspection
for quality assurance
using stereo vision with
specialized illumination
to measure tolerances on
aircraft wings or auto
body parts or looking for
defects in steel castings
using X-ray vision.

RETAIL:
Object recognition for
automated checkout lanes and
fully automated stores.
Analyze visual data and
improve customer experience.

Inventory management

Theft prevention

Personalized marketing

Cashierless checkout

Virtual mirroring

Behavioral analytics

Heat maps

WAREHOUSE LOGISTICS:
Autonomous package delivery and pallet-carrying “drives”, and parts picking by robotic manipulators.

MEDICAL IMAGING:
Registering pre-operative and intra-operative imagery, or performing long-term studies of people’s brain morphology as they age. Helps doctors diagnose, monitor, and treat medical conditions. Modalities include X-rays, CT scans, MRIs, ultrasounds, and nuclear medicine.

3D MODEL BUILDING (PHOTOGRAMMETRY):
Fully automated
construction of 3D
models from aerial and
drone photographs.
Transforming a 2D image
into a 3D space.

MATCH MOVE:
Merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source image or video to estimate the 3D camera motion and the shape of the environment.

MOTION CAPTURE (MOCAP):
Using retro-reflective
markers viewed from
multiple cameras or
other vision-based
techniques to capture
actors for computer
animation.

SURVEILLANCE:
Monitoring for intruders,
analyzing highway traffic
and monitoring pools
for drowning victims.
Computer vision systems
interpret and analyze
video feeds in real-time,
significantly enhancing
security system
capabilities.

FINGERPRINT RECOGNITION AND BIOMETRICS:
For automatic access
authentication as well as
forensic applications.


CONSUMER-LEVEL APPLICATIONS
In addition to all of these real-life and
industrial applications, there exist numerous
consumer-level applications, such as things
you can do with your own personal
photographs and video. These include:

IMAGE STITCHING (MOSAIC):
Turning overlapping photos into a single
seamlessly stitched panoramic view.
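As a rough sketch of the idea (a toy illustration on synthetic data, not a production stitcher): the overlap between two crops of the same scene can be found by minimizing the sum-of-squared-differences over candidate horizontal offsets, after which the two images are pasted into one panorama.

```python
import numpy as np

# A toy "panorama": a wide synthetic scene photographed as two
# overlapping crops (sizes here are illustrative assumptions).
rng = np.random.default_rng(0)
scene = rng.random((40, 100))
left = scene[:, 0:60]
right = scene[:, 40:100]          # overlaps `left` by 20 columns

def find_offset(a, b, max_shift=50):
    """Return the column of `a` where `b` starts, by minimizing the
    mean squared difference over the overlapping region."""
    best, best_err = 0, np.inf
    for s in range(1, min(max_shift, a.shape[1])):
        overlap = a.shape[1] - s
        err = np.mean((a[:, s:] - b[:, :overlap]) ** 2)
        if err < best_err:
            best, best_err = s, err
    return best

offset = find_offset(left, right)     # expect 40 for this scene
pano = np.zeros((left.shape[0], offset + right.shape[1]))
pano[:, :left.shape[1]] = left
pano[:, offset:] = right              # right image overwrites the overlap
```

Real stitchers match sparse features, estimate full homographies, and blend the seam rather than overwriting it, but the overlap-search-then-composite structure is the same.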


EXPOSURE BRACKETING:
Taking a sequence of photographs at different exposure levels and blending them together to create a photograph with a much higher dynamic range, merging multiple exposures taken under challenging lighting conditions (strong sunlight and shadows) into a single well-exposed image.
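The blending step might be sketched as follows (a simplified, assumed weighting scheme in the spirit of exposure fusion, not the exact algorithm used by any particular camera): each pixel is weighted by how close its value is to mid-grey, so the well-exposed pixels of each bracket dominate the result.

```python
import numpy as np

def fuse_exposures(images, sigma=0.2):
    """Blend a bracket of aligned images (values in [0, 1]) using a
    per-pixel "well-exposedness" weight centred on mid-grey."""
    stack = np.stack(images).astype(float)          # shape (N, H, W)
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=0, keepdims=True)   # normalise per pixel
    return (weights * stack).sum(axis=0)

# Toy bracket: under- and over-exposed versions of the same brightness ramp.
scene = np.linspace(0.0, 1.0, 64).reshape(8, 8)
under = np.clip(scene * 0.5, 0, 1)        # shadows keep detail
over = np.clip(scene * 0.5 + 0.5, 0, 1)   # highlights keep detail
fused = fuse_exposures([under, over])
```

Production pipelines additionally align the frames and blend across an image pyramid to avoid seams, and may instead recover a true HDR radiance map followed by tone mapping.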


IMAGE MORPHING:
Turning a picture of one object into another,
using a seamless morph transition.
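A minimal sketch of the blending half of a morph (the geometric warp that a real morph also applies to both images is omitted here, so this reduces to a linear cross-dissolve):

```python
import numpy as np

def cross_dissolve(a, b, t):
    """Blend image `a` into image `b`; t=0 gives a, t=1 gives b."""
    return (1.0 - t) * a + t * b

# Toy endpoints: an all-black and an all-white 4x4 image.
a = np.zeros((4, 4))
b = np.ones((4, 4))
halfway = cross_dissolve(a, b, 0.5)   # every pixel is 0.5
```

A full morph first warps `a` forward and `b` backward toward a common intermediate geometry (e.g., via corresponding feature lines or a mesh) and only then cross-dissolves the warped pair.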

3D MODELING:
Converting one or more snapshots into a 3D
model of the object or person you are
photographing.


VIDEO MATCH MOVE AND STABILIZATION:
Inserting 2D pictures or 3D models into your
videos by automatically tracking nearby
reference points or using motion estimates to
remove shake from your videos.

PHOTO-BASED WALKTHROUGHS:
Navigating a large collection of photographs,
such as the interior of your house, by flying
between different photos in 3D.

FACE DETECTION & TAGGING:
For improved camera focusing as well as more
relevant image searching.

VISUAL AUTHENTICATION:
Automatically logging family members onto
your home computer as they sit down in front
of the webcam.

A BRIEF HISTORY (TIMELINE OF CV)

A BRIEF HISTORY: 1970’S.

Early vision systems were designed to model the visual perception of robots with intelligent behavior, based on higher-level reasoning and planning.

The first program was written in 1966, when MIT undergraduate Gerald Jay Sussman was asked to connect a camera to a computer and get the computer to describe what it saw.

In 1976, digital image processing techniques came into existence to extract the 3D structure of the world from images and to use this as a stepping stone towards full scene understanding.

Early attempts at scene understanding involved extracting
edges and then inferring the 3D structure of an object or a
“blocks world” from the topological structure of the 2D
lines (Fig. a).


3D modeling of non-polyhedral objects was also being studied.

One popular approach used generalized cylinders, i.e., solids
of revolution and swept closed curves, often arranged into
related parts (Figure c).

Such elastic arrangements of parts are also called pictorial structures (Figure b).

A qualitative approach studied intensities and shading variations, explaining them by image formation phenomena such as surface orientation and shadows, yielding intrinsic images (Figure d).

More quantitative approaches to computer vision were also
developed at the time, including the first of many feature-based
stereo correspondence algorithms (Figure e).

Intensity-based optical flow algorithms were also studied (Figure f).

A BRIEF HISTORY: 1980’S.

In the 1980s, a lot of attention was focused on more
sophisticated mathematical techniques for performing
quantitative image and scene analysis.

Image pyramids started being widely used to perform tasks
such as image blending (Figure a) and coarse-to-fine
correspondence search.
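A coarse sketch of pyramid construction (2x2 block averaging stands in here for the Gaussian smoothing plus subsampling used in practice): each level halves the resolution, so a coarse-to-fine search can start at the smallest level and refine downward.

```python
import numpy as np

def build_pyramid(img, levels):
    """Return a list of images, each level half the resolution of the
    previous one, produced by averaging 2x2 blocks."""
    pyramid = [img]
    for _ in range(levels - 1):
        im = pyramid[-1]
        h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2   # crop to even size
        im = im[:h, :w]
        coarser = 0.25 * (im[0::2, 0::2] + im[1::2, 0::2]
                          + im[0::2, 1::2] + im[1::2, 1::2])
        pyramid.append(coarser)
    return pyramid

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_pyramid(img, 3)   # shapes: (8, 8), (4, 4), (2, 2)
```

True Gaussian pyramids convolve with a small smoothing kernel before subsampling, which avoids the aliasing that plain block averaging can introduce; the multi-resolution structure is the same.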

Continuous versions of pyramids using the concept of scale-space processing were also developed.

In the late 1980s, wavelets started displacing or augmenting
regular image pyramids in some applications.

The use of stereo as a quantitative shape cue was extended by a
wide variety of shape-from-X techniques, including shape
from shading.


Research into better edge and contour detection (Figure 1.8c)
was also active during this period.

Dynamically evolving contour trackers such as snakes, as well as three-dimensional physically based models, were introduced (Figure 1.8d).

Researchers noticed that a lot of the stereo, flow, shape-from-X,
and edge detection algorithms could be unified, or at least
described, using the same mathematical framework.

Online variants of MRF (Markov random field) algorithms that modeled and updated uncertainties using the Kalman filter were introduced.

3D range data processing (acquisition, merging, modeling,
and recognition; see Figure 1.8f) continued being actively
explored during this decade.

A BRIEF HISTORY: 1990’S.

Projective invariants for recognition, and projective reconstructions of structure and motion.

Factorization techniques for structure from motion, developed simultaneously.

Bundle adjustment techniques for photogrammetry.

Global optimization techniques.

Graph cuts

Multi-view stereo algorithms

Image tracking and reconstruction techniques.

Image segmentation

Statistical learning models

Computer Graphics


A BRIEF HISTORY: 2000’S.

Image-based rendering.

Image stitching.

Light-field capture and rendering.

Exposure bracketing.

Tone mapping algorithms

Texture synthesis.

Feature-based techniques.

Machine Learning.

Multilayer perceptron.

Semantic segmentation.


A BRIEF HISTORY: 2010’S.

Deep learning.

ImageNet, AlexNet etc.

GPU computing.

Simultaneous localization and mapping
(SLAM).

Augmented Reality/ Virtual Reality.

Visual inertial odometry.

Vision Transformers.

Vision Language models.

Generative AI.


A TAXONOMY OF TOPICS.