COMPUTER VISION (Module-1 Chapter-2)
Dr. Ramesh Wadawadagi
Associate Professor
Department of CSE
SVIT, Bengaluru-560064 [email protected]
Image formation
Photometric Image Formation
Image formation is the process in which 3D scene
points are projected into 2D image plane locations, both
geometrically and optically.
It involves two parts:
1) The first part is the geometry that determines where in
the image plane the projection of a scene point will be
located (spatial properties).
2) The other part of image formation, related to
radiometry, measures the brightness of a point in the
image plane as a function of illumination and surface
properties.
Photometric Image Formation
Components of the image formation process:
(a) Perspective projection
(b) Light scattering when hitting a surface
(c) Lens optics
(d) Bayer color filter array
Photometric image formation
●In modeling the image formation process, 3D
geometric features in the world are projected into 2D
features in an image.
●However, projected images are not composed of 2D
features.
●Images are made up of discrete color or intensity
values.
●How do images relate to the lighting in the
environment, surface properties and geometry, camera
optics, and sensor properties?
●In this section, we develop a set of models to describe
these interactions and formulate a generative process
of image formation.
Photometric image formation
1. Lighting
●To produce an image, the scene must be illuminated with
one or more light sources.
●Light sources can generally be divided into point light and
area light sources.
●A point light source originates at a single location in
space (e.g., a small light bulb), potentially at infinity (e.g.,
the Sun).
●In addition to its location, a point light source has an
intensity and a color spectrum, i.e., a distribution over
wavelengths L(λ).
●The intensity of a light source falls off with the square of
the distance between the source and the object being lit,
because the same light is being spread over a larger
(spherical) area.
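As a compact statement of this fall-off (a standard radiometry relation included here for reference, not copied from the slide), the irradiance E received at distance r from a point source of intensity I is:

$$E \propto \frac{I}{r^{2}}$$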
1. Lighting cont...
●Area light sources are more complicated.
●A simple area light source such as a fluorescent ceiling
light fixture with a diffuser can be modeled as a finite
rectangular area emitting light equally in all directions.
●An Environment map (or Reflection map) is used to map
incident light directions v̂ to color values (or
wavelengths, λ), written as L(v̂; λ).
2. Reflectance and shading
●When light hits an object’s surface, it is scattered and
reflected.
●Many different models have been developed to describe
this interaction.
●The most general model of light scattering is the
bidirectional reflectance distribution function (BRDF).
●The BRDF is a four-dimensional (4D) function that describes
how much of each wavelength arriving at an incident
direction v̂i is emitted in a reflected direction v̂r.
●This function can be written in terms of the angles of the
incident and reflected directions relative to the surface
frame as fr(θi , φi , θr , φr ; λ).
2. Reflectance and shading cont...
2. Reflectance and shading
●The BRDF is reciprocal, i.e., because of the physics of
light transport, you can interchange the roles of v̂i and v̂r
and still get the same answer.
●To calculate the amount of light exiting a surface point p in
a direction v̂r under a given lighting condition, we integrate
the product of the incoming light Li(v̂i; λ) with the BRDF.
●Taking into account the foreshortening factor cos⁺θi, we
obtain:
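A standard way to write this integral (reconstructed from the surrounding definitions in common BRDF notation rather than copied from the slide), where cos⁺θi = max(0, cos θi):

$$L_r(\hat{v}_r;\lambda) \;=\; \int L_i(\hat{v}_i;\lambda)\, f_r(\hat{v}_i,\hat{v}_r,\hat{n};\lambda)\, \cos^{+}\theta_i \, d\hat{v}_i$$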
2. Reflectance and shading cont...
●If the light sources are discrete (a finite number of point
light sources), we can replace the integral with a
summation,
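A corresponding discrete form (again a reconstruction in standard notation, assuming point light sources indexed by i):

$$L_r(\hat{v}_r;\lambda) \;=\; \sum_i L_i(\lambda)\, f_r(\hat{v}_i,\hat{v}_r,\hat{n};\lambda)\, \cos^{+}\theta_i$$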
2.1. Diffuse reflection
●Diffuse reflection is a fundamental process in illuminating
the 3D objects in a scene. It plays a significant role in
objects' perceived depth and texture.
●The diffuse component (also known as Lambertian or
Matte reflection) scatters light uniformly in all directions.
●The angle at which the light reflects off the surface is
determined by the surface’s normal, which is a vector
perpendicular to the surface.
2.1. Diffuse reflection cont...
●Light is scattered uniformly in all directions, i.e., the
BRDF is constant.
●However, the amount of light reflected depends on the angle
between the incident light direction and the surface normal, θi.
●This is because the surface area exposed to a given amount
of light becomes larger at oblique angles.
●Hence, a surface becomes completely self-shadowed as its
outgoing surface normal points away from the light.
2.1. Diffuse reflection cont...
●The shading equation for diffuse reflection can thus be
written as:
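With a constant (Lambertian) BRDF fd(λ), the shading equation takes the following standard form (a reconstruction consistent with the reflectance summation above; [·]⁺ clamps negative values to zero):

$$L_d(\hat{v}_r;\lambda) \;=\; \sum_i L_i(\lambda)\, [\hat{v}_i \cdot \hat{n}]^{+}\, f_d(\lambda)$$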
2.2. Specular reflection
●The second major component of a typical BRDF is
specular (gloss or highlight) reflection, which depends
strongly on the direction of the outgoing light.
●Incident light rays are reflected in a direction that is rotated
by 180° around the surface normal n̂.
●We can compute the specular reflection direction ŝi as:
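One standard expression for this mirror direction (a reconstruction assuming v̂i points from the surface towards the light and n̂ is the unit surface normal):

$$\hat{s}_i \;=\; (2\,\hat{n}\hat{n}^{T} - I)\,\hat{v}_i$$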
The amount of light reflected in a given direction v̂r thus
depends on the angle θs = cos⁻¹(v̂r · ŝi) between the view
direction v̂r and the specular direction ŝi .
2.2. Specular reflection
3. Lens Optics (lens model)
●Once the light from a 3D scene reaches the camera, it must
pass through the lens before reaching the digital sensor.
●The study of optics includes issues such as focus, exposure,
vignetting, and aberration.
●The figure below shows a diagram of the most basic lens
model.
●A thin lens is composed of a single piece of glass with very
low, equal curvature on both sides.
From the diagram
●zo = Distance between the lens and the object
●zi = Distance between the lens and the focused image
●d = Lens aperture diameter
●f = Focal length
●W = Sensor width
●c = Circle of confusion
●∆zi = Displacement of the focal (image) plane from its in-focus position
●The relationship between the distance to an object zo and
the distance behind the lens at which a focused image is
formed zi can be expressed as:
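The relation being referred to is the classical thin-lens law:

$$\frac{1}{z_o} + \frac{1}{z_i} = \frac{1}{f}$$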
Understanding focal length
Lens laws
●If we let zo →∞, i.e., we adjust the lens (move the image
plane) so that objects at infinity are in focus, we get zi = f.
●If the focal plane is moved away from its proper in-focus
setting of zi, objects at zo are no longer in focus.
●The amount of misfocus (blur) is measured by the
circle of confusion c.
●It measures how much a point of light is blurred in an
optical system, affecting depth of field and image
sharpness.
●The equation for the circle of confusion can be derived
using similar triangles.
●It depends on the distance of travel in the focal plane ∆zi
relative to the original focus distance zi and the diameter of
the aperture d.
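From similar triangles, the blur-circle diameter is approximately (a reconstruction of the standard result, with symbols as defined above):

$$c \;=\; \frac{d}{z_i}\,|\Delta z_i|$$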
Circle of Confusion (CoC)
Lens law
●Depth of field: The allowable depth variation in the scene
that limits the circle of confusion to an acceptable number
is commonly called the depth of field and is a function of
both the focus distance and the aperture.
●Points whose blur circle has a diameter less than the
acceptable circle of confusion will appear to be in focus.
●Since depth of field depends on the aperture diameter d, we
also need to know how this varies with the commonly displayed
f-number, which is usually denoted as f/# or N and is
defined as:
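The f-number is the ratio of the focal length to the aperture diameter:

$$N = \frac{f}{d}$$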
●where the focal length f and the aperture diameter d are
measured in the same unit.
Depth of field indicators in camera
●The usual way to write the f-number is to replace the # in
f /# with the actual number, i.e., f /1.4, f /2, f /2.8, . . . , f /22.
●Dividing the focal length by the f-number gives us the
diameter d of the aperture.
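A quick numeric illustration of this relation, d = f / N (the 50 mm focal length is an assumed example value, not from the slides):

```python
focal_length_mm = 50.0  # assumed example focal length
for f_number in [1.4, 2.0, 2.8, 4.0, 22.0]:
    d = focal_length_mm / f_number  # aperture diameter d = f / N
    print(f"f/{f_number:g}: aperture diameter = {d:.1f} mm")
```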
Depth of field indicators example
Chromatic aberration
●Chromatic aberration, also known as color fringing, is a
color distortion that creates an outline of unwanted color
along the edges of objects in a photograph.
Chromatic aberration
●It is the tendency for light of different colors to focus at
slightly different distances.
●Because the refractive index of glass varies slightly as
a function of wavelength (color), simple lenses suffer from
chromatic aberration.
●To reduce chromatic aberrations, most photographic lenses
today are compound lenses made of different glass
elements.
Vignetting
●Another property of real-world lenses is vignetting, which
is the tendency for the brightness of the image to fall off
towards the edges of the image.
Normal Vignetting
Two kinds of Vignetting
●The first kind is called Natural vignetting and is due to
foreshortening of the object surface, projected pixel, and
lens aperture, as shown in the figure below.
●It is primarily caused by light reaching different locations
on the camera sensor at different angles.
Natural Vignetting
●Consider the light leaving the object surface patch of size
δo located at an off-axis angle α.
●Because this patch is foreshortened with respect to the
camera lens, the amount of light reaching the lens is
reduced by a factor cos α.
●The amount of light reaching the lens is also subject to the
usual 1/r² fall-off; in this case, the distance is ro = zo / cos α.
●The actual area of the aperture through which the light
passes is foreshortened by an additional factor cos α, i.e.,
the aperture as seen from point O is an ellipse of
dimensions d × d cos α.
●Putting all these factors together, we see that the amount of
light leaving O and passing through the aperture falls off
approximately as cos⁴α (the cosine-fourth law).
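Combining the three factors (a reconstruction of the standard cosine-fourth derivation; δo and δa denote the areas of the object patch and the aperture):

$$\delta o \,\cos\alpha \cdot \frac{1}{r_o^{2}} \cdot \delta a\,\cos\alpha \;=\; \delta o\,\delta a \,\frac{\cos^{4}\alpha}{z_o^{2}}$$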
Vignetting
Two kinds of Vignetting
●The other major kind of vignetting, called Mechanical
vignetting, is caused by the internal occlusion (blocking)
of rays near the periphery of lens elements in a compound
lens
●It cannot easily be described mathematically without
performing a full ray-tracing of the actual lens design.
●Mechanical vignetting can be decreased by reducing the
camera aperture (increasing the f-number).
Mechanical vignetting occurs in wide angle cameras
The digital camera (DSLR): Design principles
●During image formation, light rays start from one
or more sources, reflect off one or more surfaces in
the world, pass through the camera’s optics
(lenses), and finally reach the imaging sensor.
●In this section, we develop a simple image model that
accounts for the most important effects, such as
exposure (gain and shutter speed), non-linear
mappings, sampling and aliasing, and noise.
●Figure 2.23 shows a simple version of the processing
stages that occur in modern digital cameras.
The digital camera: Architecture
(Block diagram: camera assembly → hardware / raw image → software / processed image)
Exposure (light control)
●Light falling on an imaging sensor is usually picked up by
an active sensing area and integrated for the duration of the
exposure, usually expressed as the shutter speed in fractions
of a second (e.g., 1/125, 1/60, 1/30).
●The resulting signal is then passed to a set of sense amplifiers.
●The two main kinds of sensor used in digital image and
video cameras today are Charge-coupled Device (CCD)
and Complementary Metal Oxide on Silicon (CMOS).
Charge-coupled Device (CCD)
●In a CCD, photons are accumulated in each active “well
(pixel)” during the exposure time.
●Then, in a transfer phase, the charges are transferred
from cell to cell in a kind of “bucket brigade” until
they are deposited at the sense amplifiers, which
amplify the signal and pass it to an analog-to-digital
converter (ADC).
●Some CCD sensors exhibit a blooming effect, i.e., charge
from one over-exposed pixel spills into adjacent ones, but
most modern CCDs have anti-blooming technology.
CCD Image Sensor
Exposure
Complementary Metal Oxide on Silicon (CMOS)
●In CMOS, the photons hitting the sensor directly affect
the conductivity (or gain) of a photodetector.
●This can be selectively gated to control exposure
duration, and locally amplified before being read out
using a multiplexing scheme.
●Traditionally, CCD sensors performed better than CMOS
in quality-sensitive applications, such as digital SLRs,
while CMOS was better for low-power applications.
●Today, however, CMOS sensors are used in most digital cameras.
The main factors affecting the quality of a digital
image sensor are:
●Shutter speed
●Sampling pitch
●Fill factor
●Chip size
●Analog gain
●Sensor noise
●ADC resolution
●Digital post-processing
Shutter speed:
●The shutter speed (exposure time) directly controls
the amount of light reaching the sensor and hence
determines if images are under- or over-exposed.
●For bright scenes, where a large aperture or slow shutter
speed is desired to achieve a shallow depth of field or
motion blur, neutral density filters are sometimes used
by photographers.
●For dynamic scenes, the shutter speed also determines
the amount of motion blur in the resulting picture.
Sampling pitch:
●The sampling pitch is the physical spacing between
adjacent sensor cells on the imaging chip.
●A sensor with a smaller sampling pitch has a higher
sampling density and hence provides a higher
resolution (in terms of pixels) for a given active chip
area.
●However, a smaller pitch also means that each sensor
has a smaller area and cannot accumulate as many
photons.
●This makes it less light sensitive and more prone to
noise.
Fill factor:
●The fill factor is the active sensing area size as a
fraction of the theoretically available sensing area (the
product of the horizontal and vertical sampling
pitches).
●Higher fill factors are usually preferable, as they result
in more light capture and less aliasing.
●However, the fill factor was originally limited by the need to
place additional electronics between the active sensing
areas.
●Modern backside illumination (or back-illuminated)
sensors, coupled with efficient microlens designs, have
largely removed this limitation.
Chip dimension:
●Video and point-and-shoot cameras have traditionally used
small chip areas (1/4 -inch to 1/2 -inch sensors).
●Chip size is measured diagonally in inches and represents the
physical dimensions of the active area on the sensor chip.
●Digital SLR cameras try to come closer to the traditional
size of a 35mm film frame.
●When overall device size is not important, having a larger
chip size is preferable, since each sensor cell can be more
photo-sensitive.
●However, larger chips are more expensive to produce,
because the probability of a chip defect goes up
exponentially with the chip area.
Analog gain:
●Before analog-to-digital conversion, the sensed signal is
usually boosted by a sense amplifier.
●In video cameras, the gain on these amplifiers was
traditionally controlled by automatic gain control (AGC)
logic, which would adjust these values to obtain a good
overall exposure.
●In modern digital cameras, the user now has some
additional control over this gain through the ISO setting,
which is typically expressed in ISO standard units such as
100, 200, or 400.
●In theory, a higher gain allows the camera to perform better
under low light conditions.
Sensor noise (loss of information):
●Throughout the whole sensing process, noise is added from
various sources, which may include fixed pattern noise,
dark current noise, shot noise, amplifier noise, and
quantization noise.
●The final amount of noise present in a sampled image
depends on all of these quantities, as well as the incoming
light, the exposure time, and the sensor gain.
●Also, for low light conditions where the noise is due to low
photon counts, a Poisson model of noise may be more
appropriate than a Gaussian model.
ADC resolution:
●The final step in the analog processing chain occurring
within an imaging sensor is analog-to-digital
conversion (ADC).
●While a variety of techniques can be used to implement
this process, the two quantities of interest are the
resolution of this process and its noise level.
●For most cameras, the number of bits quoted exceeds the
actual number of usable bits.
●The best way to tell is to simply calibrate the noise of a
given sensor, e.g., by taking repeated shots of the same scene
and plotting the estimated noise as a function of brightness.
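A minimal sketch of such a calibration, assuming a stack of aligned repeated shots of a static scene is available as a NumPy array (the function and variable names are illustrative, not from any camera SDK):

```python
import numpy as np

def noise_vs_brightness(stack):
    """Per-pixel brightness and noise estimates from repeated shots of a static scene.

    stack: float array of shape (num_shots, height, width).
    Plotting std against mean (e.g., as a scatter plot) shows
    how sensor noise grows with brightness.
    """
    mean = stack.mean(axis=0)  # estimated true brightness at each pixel
    std = stack.std(axis=0)    # estimated noise at each pixel
    return mean.ravel(), std.ravel()
```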
Sampling and aliasing:
●Displays are discrete while the world is continuous.
●Sampling: converting from continuous to discrete.
●Reconstruction: Converting from discrete to
continuous.
●Aliasing: Artifacts arising from sampling and
consequent loss of information.
● Anti-aliasing: Attempts to overcome aliasing.
Sampling and aliasing:
●What happens when the field of light impinging on the
image sensor falls onto the active sense areas of the
imaging chip?
●The photons arriving at each active cell are
integrated and then digitized, as shown in Figure
2.24.
●However, if the fill factor on the chip is small and
the signal is not otherwise band-limited, visually
unpleasing aliasing can occur.
Sampling and aliasing:
Sampling and aliasing:
●To explore the phenomenon of aliasing, let us first look at a
one-dimensional signal (Figure 2.25).
●We have two sine waves, one at a frequency of f = 3/4 and
the other at f = 5/4.
●If we sample these two signals at a frequency of f = 2, we
see that they produce the same samples (shown in black),
and so we say that they are aliased.
●Why is this a bad effect? In essence, we can no longer
reconstruct the original signal, since we do not know
which of the two original frequencies was present.
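A minimal numeric check of this aliasing claim (cosines are used here so that the sampled values coincide exactly; the frequencies and sampling rate are those from the text):

```python
import numpy as np

fs = 2.0                # sampling frequency from the text
n = np.arange(8)        # sample indices
t = n / fs              # sample times

f1, f2 = 3 / 4, 5 / 4   # the two signal frequencies
s1 = np.cos(2 * np.pi * f1 * t)
s2 = np.cos(2 * np.pi * f2 * t)

# The two signals produce identical samples, so they are aliased.
print(np.allclose(s1, s2))  # True
```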
Aliasing:
●In fact, Shannon’s Sampling Theorem shows that the
minimum sampling rate required to reconstruct a signal
from its instantaneous samples must be at least twice the
highest frequency, fs ≥ 2fmax.
●The maximum frequency in a signal is known as the
Nyquist frequency and the inverse of the minimum
sampling frequency rs = 1/fs is known as the Nyquist rate.
●The best way to predict the amount of aliasing that an
imaging system (or even an image processing algorithm)
will produce is to estimate the point spread function
(PSF), which represents the response of a particular pixel
sensor to an ideal point light source.
Aliasing:
Digital post-processing:
●Once the irradiance values arriving at the sensor have
been converted to digital bits, most cameras perform a
variety of digital signal processing (DSP) operations to
enhance the image before compressing and storing the
pixel values.
●These include color filter array (CFA) demosaicing,
white point setting, and mapping of the luminance values
through a gamma function to increase the perceived
dynamic range of the signal.
Color Fundamentals
Group objects based on colors
Select objects having blue shade
Color Fundamentals:
How do human eyes perceive different colors?
●The existence of three primary colors is a result of the
tristimulus (or trichromatic) nature of the human visual
system.
●We have three different kinds of cells called cones,
each of which responds selectively to a different portion of
the color spectrum.
Trichromatic theory
Primary and secondary colors:
Color properties:
●Three properties generally used to distinguish one color
from another are brightness, hue, and saturation.
●Brightness is the achromatic notion of intensity, and is one of
the key factors in describing color sensation.
●Hue is an attribute associated with the dominant
wavelength in a mixture of light waves.
●Hue represents dominant color as perceived by an observer.
●Thus, when we call an object red, orange, or yellow, we are
referring to its hue.
Color properties:
●Saturation refers to the relative purity or the amount of
white light mixed with a hue.
●The pure spectrum colors are highly saturated.
●Colors such as pink (red and white) and lavender (violet
and white) are less saturated, with the degree of saturation
being inversely proportional to the amount of white light
added.
●Hue and saturation taken together are called chromaticity
and, therefore, a color may be characterized by its
brightness and chromaticity.
Tristimulus values:
●The amounts of red, green, and blue required to form any
particular color are called the tristimulus values, and are
denoted X, Y, and Z, respectively.
●A color is then specified by its trichromatic coefficients,
defined as:
x = X / (X + Y + Z), y = Y / (X + Y + Z), z = Z / (X + Y + Z),
where x + y + z = 1.
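A small illustrative helper computing these coefficients (the function name is hypothetical; the tristimulus values must not all be zero):

```python
def trichromatic_coefficients(X, Y, Z):
    """Return (x, y, z) with x + y + z = 1 from tristimulus values X, Y, Z."""
    total = X + Y + Z
    return X / total, Y / total, Z / total

print(trichromatic_coefficients(0.3, 0.4, 0.3))  # approximately (0.3, 0.4, 0.3)
```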
Color Models:
●The purpose of a color model (also called a color space or
color system) is to facilitate the specification of colors in
some standard way.
● In essence, a color model is a specification of
●(1) A coordinate system, and
●(2) A subspace within that system,
●Hence, each color in the model is represented by a single
point contained in that subspace.
●Different color models exist: RGB, CMY, and HSI.
The RGB Color Model:
●In the RGB model, each color appears in its primary spectral
components of red, green, and blue.
●This model is based on a Cartesian coordinate system in 3D
space.
RGB color cube
The RGB Color Model:
●In this model RGB primary values are at three corners;
●The secondary colors cyan, magenta, and yellow are at three
other corners;
●Black is at the origin; and white is at the corner farthest
from the origin.
●The grayscale (points of equal RGB values) extends from
black to white along the line joining these two points.
●Images represented in the RGB color model consist of three
component images, one for each primary color.
●The number of bits used to represent each pixel in RGB
space is called the pixel depth.
The RGB Color Model:
●Consider an RGB image in which each of the red, green,
and blue images is an 8-bit image.
●Under these conditions, each RGB color pixel has a depth
of 24 bits (3 image planes times the number of bits
per plane).
●The term full-color image is used often to denote a 24-bit
RGB color image.
●The total number of possible colors in a 24-bit RGB image
is (2⁸)³ = 16,777,216.
The CMY and CMYK Color Models:
●Cyan, Magenta, and Yellow are the secondary colors of
light.
●RGB to CMY conversion can be done internally using the
simple operation C = 1 − R, M = 1 − G, Y = 1 − B (with
color values normalized to [0, 1]).
●In order to produce true black (which is the predominant
color in printing), a fourth color, black, denoted by K, is
added, giving rise to the CMYK color model.
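A sketch of this conversion for normalized values in [0, 1] (a direct coding of C = 1 − R, M = 1 − G, Y = 1 − B; the function name is illustrative):

```python
import numpy as np

def rgb_to_cmy(rgb):
    """Convert normalized RGB values in [0, 1] to CMY."""
    rgb = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0)
    return 1.0 - rgb

print(rgb_to_cmy([1.0, 0.0, 0.0]))  # pure red -> [0. 1. 1.] (no cyan, full magenta and yellow)
```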
Gamma Transformation (power law):
●Gamma correction is a nonlinear process that adjusts the
brightness of images to match how humans perceive light.
●Gamma correction applies a power function to each pixel
value in an image.
●The relationship between the input signal brightness Y and
the transmitted signal Y′ is given by Y′ = Y^(1/γ).
●Gamma values less than 1 make the image darker.
●Gamma values greater than 1 make the image lighter.
●A gamma value of 1 has no effect on the input image.
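A minimal sketch of this transformation for an image normalized to [0, 1] (a direct coding of Y′ = Y^(1/γ); names are illustrative):

```python
import numpy as np

def gamma_transform(image, gamma):
    """Apply Y' = Y ** (1 / gamma) to an image with values in [0, 1]."""
    image = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    return image ** (1.0 / gamma)

pixel = np.array([0.5])
print(gamma_transform(pixel, 0.5))  # gamma < 1 darkens: 0.5 ** 2 = 0.25
print(gamma_transform(pixel, 2.0))  # gamma > 1 lightens: 0.5 ** 0.5 ≈ 0.707
```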
Gamma Transformation: Example
Image Compression:
●The last stage in a camera’s processing pipeline is usually
some form of image compression.
●Digital images take huge amounts of data.
●Storage, processing, and communication requirements
might otherwise be impractical.
●A more efficient representation of digital images is therefore necessary.
●Image compression: reduces the amount of data required to
represent a digital image by removing redundant data.
●Image compression is an enabling technology for standards such as JPEG
and MPEG.