Introduction to Computer Vision - Image formation

KarpagaPriya10, May 05, 2024

About This Presentation

2D,3D Transformations - 3D to 2D Projection - Lighting, Reflectance and shading - Sampling and aliasing - Image processing Point operators


Slide Content

18CSE390T Computer Vision

Topics (9 hours): Introduction to Computer Vision - Image formation - Geometric primitives - 2D, 3D Transformations - 3D to 2D Projection - Lighting, Reflectance and Shading - Sampling and Aliasing - Image Processing: Point operators - Pixel transforms - Color transforms - Histogram equalization - Linear filtering - Non-linear filtering - Fourier transforms - Two-dimensional Fourier transforms, Wiener filtering

Image Processing. Image: An image is a two-dimensional function f(x,y), where x and y are the spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity of the image at that point. If x, y and the amplitude values of f are finite and discrete quantities, we call the image a digital image. A digital image is composed of a finite number of elements called pixels, each of which has a particular location and value. Digital image processing deals with developing a digital system that performs operations on a digital image; it is the field of enhancing images by fine-tuning many parameters and features of the image. Image processing basically includes the following three steps: importing the image via image acquisition tools; analyzing and manipulating the image; and producing the output, which can be an altered image or a report based on the image analysis.
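
As a rough illustration of these three steps (not part of the original slides), here is a minimal OpenCV sketch; the file names are placeholders:

```python
import cv2

# 1. Importing the image (image acquisition)
img = cv2.imread("input.jpg")                    # BGR uint8 array of shape (H, W, 3)

# 2. Analyzing and manipulating the image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # single-channel version
blur = cv2.GaussianBlur(gray, (5, 5), 0)         # simple enhancement / noise reduction

# 3. Output: an altered image, or a report based on the analysis
cv2.imwrite("output.png", blur)
print("mean intensity:", blur.mean())
```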

Introduction to Computer Vision. Computer vision aims to provide vision, or eyes, to a machine. It is a field of study that seeks to develop techniques to help computers "see" and understand the content of digital images such as photographs and videos. The purpose of computer vision is to program a computer to "understand" a scene or features in an image. The focus is on extracting information from input images or videos and building a proper understanding of them, much as the human brain does from visual input. At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world.

Goals of Computer Vision The detection, segmentation, localization, and recognition of certain objects in images (e.g., human faces) The evaluation of results (e.g., segmentation, registration) Registration of different views of the same scene or object Tracking an object through an image sequence Mapping a scene to a three-dimensional model of the scene ; such a model might be used by a robot to navigate the imaged scene Estimation of the three-dimensional pose of humans and their limbs Content-based image retrieval - Searching for digital images by their content

Applications of computer vision Facial recognition: Computer vision has enabled machines to detect face images of people to verify their identity. Initially, the machines are given input data images in which computer vision algorithms detect facial features and compare them with databases of fake profiles. Popular social media platforms like Facebook also use facial recognition to detect and tag users. Further, various government spy agencies are employing this feature to identify criminals in video feeds. Healthcare and Medicine: Computer vision has played an important role in the healthcare and medicine industry. Traditional approaches for evaluating cancerous tumors are time-consuming and have less accurate predictions, whereas computer vision technology provides faster and more accurate chemotherapy response assessments; doctors can identify cancer patients who need faster surgery with life-saving precision. Self-driving vehicles: Computer vision technology has also contributed to its role in self-driving vehicles to make sense of their surroundings by capturing video from different angles around the car and then introducing it into the software. This helps to detect other cars and objects, read traffic signals, pedestrian paths, etc., and safely drive its passengers to their destination. Optical character recognition (OCR) : Optical character recognition helps us extract printed or handwritten text from visual data such as images. Further, it also enables us to extract text from documents like invoices, bills, articles, etc. Machine inspection: Computer vision is vital in providing an image-based automatic inspection. It detects a machine's defects, features, and functional flaws, determines inspection goals, chooses lighting and material-handling techniques, and other irregularities in manufactured products. Retail (e.g., automated checkouts): Computer vision is also being implemented in the retail industries to track products, shelves, wages, record product movements into the store, etc. This AI-based computer vision technique automatically charges the customer for the marked products upon checkout from the retail stores.

3D model building: 3D model building or 3D modeling is a technique to generate a 3D digital representation of any object or surface using the software. In this field also, computer vision plays its role in constructing 3D computer models from existing objects. Furthermore, 3D modeling has a variety of applications in various places, such as Robotics, Autonomous driving, 3D tracking, 3D scene reconstruction, and AR/VR. Medical imaging: Computer vision helps medical professionals make better decisions regarding treating patients by developing visualization of specific body parts such as organs and tissues. It helps them get more accurate diagnoses and a better patient care system. E.g., Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scanner to diagnose pathologies or guide medical interventions such as surgical planning or for research purposes. Automotive safety: Computer vision has added an important safety feature in automotive industries. E.g., if a vehicle is taught to detect objects and dangers, it could prevent an accident and save thousands of lives and property. Surveillance: It is one of computer vision technology's most important and beneficial use cases. Nowadays, CCTV cameras are almost fitted in every place, such as streets, roads, highways, shops, stores, etc., to spot various doubtful or criminal activities. It helps provide live footage of public places to identify suspicious behavior, identify dangerous objects, and prevent crimes by maintaining law and order. Fingerprint recognition and biometrics: Computer vision technology detects fingerprints and biometrics to validate a user's identity. Biometrics deals with recognizing persons based on physiological characteristics, such as the face, fingerprint, vascular pattern, or iris, and behavioral traits, such as gait or speech. It combines Computer Vision with knowledge of human physiology and behavior.

Variety of real-world applications

Tasks for which computer vision can be used Image Classification: Image classification is a computer vision technique used to classify an image, such as whether an image contains a dog, a person's face, or a banana. It means that with image classification, we can accurately predict the class of an object present in an image. Object Detection: Object detection uses image classification to identify and locate the objects in an image or video. With such detection and identification technique, the system can count objects in a given image or scene and determine their accurate location, along with their labelling. For example, in a given image, there is one person and one cat, which can be easily detected and classified using the object detection technique. Object Tracking: Object tracking is a computer vision technique used to follow a particular object or multiple items. Generally, object tracking has applications in videos and real-world interactions, where objects are firstly detected and then tracked to get observation. Object tracking is used in applications such as Autonomous vehicles, where apart from object classification and detection such as pedestrians, other vehicles, etc., tracking of real-time motion is also required to avoid accidents and follow the traffic rules. Semantic Segmentation: Image segmentation is not only about detecting the classes in an image as image classification. Instead, it classifies each pixel of an image to specify what objects it has. It tries to determine the role of each pixel in the image. Instance Segmentation: Instance segmentation can classify the objects in an image at pixel level as similar to semantic segmentation but with a more advanced level. It means Instance Segmentation can classify similar types of objects into different categories. For example, if visual consists of various cars, then with semantic segmentation, we can tell that there are multiple cars, but with instance segmentation, we can label them according to their colour, shape, etc.

Computer Vision Tasks

Computer Vision Process. 1. Capturing an image: computer vision software or applications always include a digital camera or CCTV to capture the image. So, firstly the system captures the image and stores it as a digital file that consists of zeros and ones. 2. Processing the image: in the next step, different CV algorithms are used to process the digital data stored in the file. These algorithms determine the basic geometric elements and reconstruct the image from the stored digital data. 3. Analyzing and taking the required action: finally, the CV system analyses the data and, according to this analysis, takes the required action for which it is designed.

The continuum from image processing to computer vision can be broken up into low-, mid- and high-level processes Low Level Process Input: Image Output: Image Examples: Noise removal, Image sharpening Mid Level Process Input: Image Output: Attributes Examples: Object recognition, Segmentation High Level Process Input: Attributes Output: Understanding Examples: Scene understanding, Autonomous navigation

Image Formation Before analyzing and manipulating images, we need to establish a vocabulary for describing the geometry of a scene . Need to understand the image formation process that produced a particular image given a set of Lighting conditions, Scene geometry, Surface properties, and Camera optics.

Acquisition of Images The images are generated by the combination of an illumination source and the reflection or absorption of energy from that source by the elements of the scene being imaged. Imaging sensors are used to transform the illumination energy into digital images. © 2002 R. C. Gonzalez & R. E. Woods

Types of Image Sensors Single Sensor Line Sensor Array Sensor © 2002 R. C. Gonzalez & R. E. Woods

Image Formation Geometric primitives and transformations: Geometric primitives form the basic building blocks used to describe three-dimensional shapes. 2D Transformation 3D Transformation 3D to 2D Projection Photometric Image Formation Lighting, Reflectance and Shading

Geometric Primitives (1/4). 2D points (pixel coordinates in an image) can be denoted using a pair of values, x = (x, y). They can also be represented using homogeneous coordinates, x̃ = (x̃, ỹ, w̃), where vectors that differ only by scale are considered to be equivalent; the space of such vectors is the 2D projective space. A homogeneous vector can be converted back into an inhomogeneous vector by dividing through by the last element, x̃ = w̃(x, y, 1) = w̃ x̄, where x̄ = (x, y, 1) is the augmented vector. Homogeneous points whose last element is w̃ = 0 are called ideal points or points at infinity and do not have an equivalent inhomogeneous representation.

Geometric Primitives (2/4). 2D lines can also be represented using homogeneous coordinates, l̃ = (a, b, c), with the corresponding line equation x̄ · l̃ = ax + by + c = 0. The line equation can be normalized so that l = (n̂x, n̂y, d) with ‖n̂‖ = 1; the vector n̂ is the normal vector perpendicular to the line and d is its distance to the origin. Writing n̂ = (cos θ, sin θ), the combination (θ, d) is also known as polar coordinates.

Geometric Primitives (4/4)

Geometric Primitives (3/4). 2D conics: circle, ellipse, parabola, hyperbola. There are other algebraic curves that can be expressed with simple polynomial homogeneous equations. The conic sections (so called because they arise as the intersection of a plane and a 3D cone) can be written using a quadric equation x̃ᵀ Q x̃ = 0.

3D – 3D points

3D planes. 3D planes can also be represented as homogeneous coordinates m̃ = (a, b, c, d) with a corresponding plane equation x̄ · m̃ = ax + by + cz + d = 0. The plane equation can be normalized so that m = (n̂x, n̂y, n̂z, d) with ‖n̂‖ = 1; n̂ is the normal vector perpendicular to the plane and d is its distance to the origin.

3D (2/3). 3D lines: lines in 3D are less elegant than either lines in 2D or planes in 3D. One possible representation is to use two points on the line, (p, q). Any other point on the line can be expressed as a linear combination of these two points, r = (1 − λ)p + λq. If we use homogeneous coordinates, we can write the line as r̃ = μp̃ + λq̃. 3D quadrics: the 3D analog of a conic section is a quadric surface, x̄ᵀ Q x̄ = 0.

3D (3/3)

Why Homogeneous Coordinates? The matrix representations for translation, scaling and rotation are respectively P' = T + P, P' = S · P, and P' = R · P. Unfortunately, translation is treated differently (as an addition) from scaling and rotation (as multiplications). We would like to be able to treat all three transformations in a consistent way, so that they can be combined easily. If points are expressed in homogeneous coordinates, all three transformations can be treated as multiplications. In homogeneous coordinates, we add a third coordinate to a point: instead of being represented by a pair of numbers (x, y), each point is represented by a triple (x, y, W). Two sets of homogeneous coordinates (x, y, W) and (x', y', W') represent the same point if and only if one is a multiple of the other. Thus, (2, 3, 6) and (4, 6, 12) are the same point represented by different coordinate triples.
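
As a small illustration (not from the slides), the following NumPy sketch composes a scaling, a rotation and a translation as a single 3×3 matrix product on homogeneous 2D points; the specific numbers are arbitrary:

```python
import numpy as np

def to_inhomogeneous(p):
    """Convert a homogeneous triple (x, y, W) back to (x/W, y/W)."""
    return p[:2] / p[2]

tx, ty, s, theta = 2.0, 3.0, 2.0, np.pi / 2
T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)     # translation
S = np.array([[s, 0, 0], [0, s, 0], [0, 0, 1]], dtype=float)       # scaling
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])                                          # rotation

p = np.array([1.0, 0.0, 1.0])          # the point (1, 0) with W = 1
M = T @ R @ S                          # all three combined by multiplication
print(to_inhomogeneous(M @ p))         # scale, then rotate, then translate -> [2. 5.]
```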

2D Transformations The simplest transformations occur in the 2D plane and are illustrated in Figure 2.4

Translation. 2D translations can be written in any of three equivalent forms: x' = x + t; as a 2×3 matrix acting on the augmented vector, x' = [I | t] x̄; or using a full 3×3 matrix acting on homogeneous coordinates, x̃' = [[I, t], [0ᵀ, 1]] x̃, where I is the 2×2 identity matrix and t = (tx, ty). Note that in any equation where an augmented vector such as x̄ appears on both sides, it can always be replaced with a full homogeneous vector x̃.

Rotation + Translation. This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as x' = Rx + t, where R = [[cos θ, −sin θ], [sin θ, cos θ]] is an orthonormal rotation matrix with RRᵀ = I and |R| = 1.

Scaled Rotation. Also known as the similarity transform, this transformation can be expressed as x' = sRx + t, where s is an arbitrary scale factor. It can also be written as x' = [sR | t] x̄. The similarity transform preserves angles between lines.

Affine. The affine transformation is written as x' = A x̄, where A is an arbitrary 2×3 matrix. Parallel lines remain parallel under affine transformations.

Projective (perspective transform). This transformation, also known as a perspective transform or homography, operates on homogeneous coordinates, x̃' = H̃ x̃, where H̃ is an arbitrary 3×3 matrix; the result must be normalized to obtain an inhomogeneous point. Perspective transformations preserve straight lines.

Overview of 2D transformation

Stretch/squash This transformation changes the aspect ratio of an image , and is a restricted form of an affine transformation .

Planar surface flow. This eight-parameter transformation arises when a planar surface undergoes a small 3D motion. It can be thought of as a small-motion approximation to a full homography. Its main attraction is that it is linear in the motion parameters a_k, which are often the quantities being estimated.

Bilinear interpolant. This eight-parameter transform can be used to interpolate the deformation due to the motion of the four corner points of a square. While the deformation is linear in the motion parameters, it does not generally preserve straight lines. However, it is often quite useful, e.g., in the interpolation of sparse grids using splines.

3D transformation The set of three-dimensional coordinate transformations is very similar to that available for 2D transformations. As in 2D, these transformations form a nested set of groups.

Translation. 3D translations can be written as x' = x + t or x̃' = [[I, t], [0ᵀ, 1]] x̃, where I is the 3×3 identity matrix and 0 is the zero vector.

Rotation + translation. Also known as 3D rigid body motion or the 3D Euclidean transformation, it can be written as x' = Rx + t, where R is a 3×3 orthonormal rotation matrix with RRᵀ = I and |R| = 1. Sometimes it is more convenient to describe a rigid motion using x' = R(x − c), where c is the center of rotation (often the camera center).

Scaled rotation. The 3D similarity transform can be expressed as x' = sRx + t, where s is an arbitrary scale factor. It can also be written as x' = [sR | t] x̄. This transformation preserves angles between lines and planes.

Affine. The affine transform is written as x' = A x̄, where A is an arbitrary 3×4 matrix. Parallel lines and planes remain parallel under affine transformations.

Projective. This transformation, variously known as a 3D perspective transform, homography, or collineation, operates on homogeneous coordinates, x̃' = H̃ x̃, where H̃ is an arbitrary 4×4 homogeneous matrix. The resulting homogeneous coordinate x̃' must be normalized in order to obtain an inhomogeneous result x. Perspective transformations preserve straight lines (i.e., they remain straight after the transformation).

3D to 2D Projections

3D to 2D Projections. Having seen how to represent 2D and 3D geometric primitives and how to transform them spatially, we now need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since it more accurately models the behavior of real cameras.

Projection ➢ Projections transform points in n-space to m-space, where m < n. ➢ In 3D, we map points from 3-space to the projection plane (PP) along projectors emanating from the center of projection (COP). ➢ Types of projection: Perspective (center of projection) and Parallel (direction of projection).

Perspective vs Parallel Projection

Orthography and para-perspective

Orthography and para-perspective. An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. This can be written as x = [I₂ₓ₂ | 0] p. If we are using homogeneous (projective) coordinates, we can write x̃ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]] p̃, i.e., we drop the z component but keep the w component.

Perspective projection

Perspective. Points are projected onto the image plane by dividing them by their z component, x̄ = (x/z, y/z, 1); in homogeneous coordinates we then drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor.
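
A minimal NumPy sketch (not from the slides) contrasting the two projection models for a single camera-space point; the focal length is assumed to be 1:

```python
import numpy as np

p = np.array([0.5, -0.2, 2.0])    # a 3D point (x, y, z) in camera coordinates

# Orthographic projection: simply drop the z component.
x_ortho = p[:2]

# Perspective projection: divide by the z component (focal length f = 1 assumed).
f = 1.0
x_persp = f * p[:2] / p[2]

print(x_ortho)    # [ 0.5 -0.2]
print(x_persp)    # [ 0.25 -0.1 ]
```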

Perspective Projection Model

Camera Intrinsic (1/4) Once we have projected a 3D point through an ideal pinhole using a projection matrix, we must still transform the resulting coordinates according to the pixel sensor spacing and the relative position of the sensor plane to the origin

Camera Intrinsics (2/4). The combined 2D to 3D projection can then be written in terms of the following quantities: (xs, ys): pixel coordinates; (sx, sy): pixel spacings; cs: 3D origin coordinate; Rs: 3D rotation matrix; Ms: sensor homography matrix.

Camera Intrinsics (3/4). K: calibration matrix; pw: 3D world coordinates; P: camera matrix.

Camera Intrinsics (4/4)

A Note on Focal Lengths

Camera Matrix 3×4 camera matrix

IMAGE FORMATION

2.2 Photometric Image Formation 2.2.1 Lighting To produce an image, the scene must be illuminated with one or more light sources. A point light source originates at a single location in space (e.g., a small light bulb), potentially at infinity (e.g., the sun). A point light source has an intensity and a color spectrum, i.e., a distribution over wavelengths L (λ).

Rendering Equation

2.2.2 Reflectance and Shading. The Bidirectional Reflectance Distribution Function (BRDF) describes how much light is reflected as a function of the angles of the incident and reflected directions relative to the surface frame.

Diffuse and Specular Reflection

Diffuse Reflection vs. Specular Reflection (1/2)

Diffuse Reflection vs. Specular Reflection (2/2)

Phong Shading. Phong reflection is an empirical model of local illumination that combines the diffuse and specular components of reflection. Objects are generally illuminated not only by point light sources but also by a diffuse illumination corresponding to inter-reflection (e.g., the walls in a room) or distant sources, such as the blue sky.

GLOBAL ILLUMINATION

S4 - Sampling and Aliasing

Sampling & Aliasing. The real world is continuous, while the computer world is discrete. Mapping a continuous function to a discrete one is called sampling; mapping a continuous variable to a discrete one is called quantization. To represent or render an image using a computer, we must both sample and quantize.

Aliasing. Aliasing is an effect that causes different signals to become indistinguishable (or aliases of one another) when sampled. It also often refers to the distortion or artifact that results when a signal reconstructed from samples is different from the original continuous signal.

Aliasing of a one-dimensional signal. Consider a one-dimensional signal containing two sine waves, one at a frequency of f = 3/4 and the other at f = 5/4. If these two signals are sampled at a frequency of f = 2, they produce the same samples (shown in black), and so they are aliased. Figure 2.24 Aliasing of a one-dimensional signal: the blue sine wave at f = 3/4 and the red sine wave at f = 5/4 have the same digital samples when sampled at f = 2. Even after convolution with a 100% fill factor box filter, the two signals, while no longer of the same magnitude, are still aliased in the sense that the sampled red signal looks like an inverted lower-magnitude version of the blue signal. (The image on the right is scaled up for better visibility. The actual sine magnitudes are 30% and -18% of their original values.)

Shannon's Sampling Theorem. How frequently do we need to sample? Shannon's Sampling Theorem shows that the minimum sampling rate required to reconstruct a signal from its instantaneous samples must be at least twice the highest frequency. The solution: a continuous-time signal x(t) with frequencies no higher than fmax can be reconstructed exactly from its samples x[n] = x(nTs), if the samples are taken at a rate fs = 1/Ts that is greater than 2 fmax. Note that the minimum sampling rate, 2 fmax, is called the Nyquist rate.
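
A tiny NumPy check of the aliasing example above (not from the slides): two sinusoids at f = 3/4 and f = 5/4 sampled at fs = 2 (below the Nyquist rate 2·fmax = 2.5 for the second one) yield samples that coincide up to a sign flip, so the two frequencies cannot be told apart:

```python
import numpy as np

fs = 2.0
t = np.arange(8) / fs                      # sample instants
s1 = np.sin(2 * np.pi * 0.75 * t)          # f = 3/4
s2 = np.sin(2 * np.pi * 1.25 * t)          # f = 5/4, above fs/2

# The undersampled 5/4 wave aliases onto the 3/4 wave (sign-flipped for sines).
print(np.allclose(s1, -s2))                # True
```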

Aliasing

SAMPLING THEORY

Nyquist rate The maximum frequency in a signal is known as the Nyquist frequency. The inverse of the minimum sampling frequency r s = 1 / f s is known as the Nyquist rate.

Nyquist rate(contd..) An imaging chip actually averages the light field over a finite area, “Are the results on point sampling still applicable?” Averaging over the sensor area does tend to attenuate some of the higher frequencies. However, even if the fill factor is 100%, as in the right image, frequencies above the Nyquist limit (half the sampling frequency) still produce an aliased signal, although with a smaller magnitude than the corresponding band-limited signals.

Nyquist rate(contd..) A more convincing argument as to why aliasing is bad can be seen by downsampling a signal using a poor quality filter such as a box (square) filter . The best way to predict the amount of aliasing that an imaging system or an image processing algorithm will produce is to estimate the Point Spread Function (PSF) , which represents the response of a particular pixel sensor to an ideal point light source .

Aliasing of a two-dimensional signal: (a) original full-resolution image; (b) downsampled 4× with a 25% fill factor box filter; (c) downsampled 4× with a 100% fill factor box filter; (d) downsampled 4× with a high-quality 9-tap filter. Notice how the higher frequencies are aliased into visible frequencies with the lower quality filters, while the 9-tap filter completely removes these higher frequencies.

Image Processing - Point Operators

Point Operators The simplest kinds of image processing transforms are point operators, where each output pixel’s value depends on only the corresponding input pixel value and potentially, some globally collected information or parameters . Examples of such operators include brightness and contrast adjustments (Figure 3.2) as well as color correction and transformations . A general image processing operator is a function that takes one or more input images and produces an output image. Image transforms can be seen as: Point operators (pixel transforms) Neighborhood (area-based) operators

Figure 3.2 Some local image processing operations: (a) original image along with its three color (per-channel) histograms; (b) brightness increased (additive offset, b = 16); (c) contrast increased (multiplicative gain, a = 1.1); (d) gamma (partially) linearized (γ = 1.2); (e) full histogram equalization; (f) partial histogram equalization.

Point operations. Point operations will not change the size of the image, the geometry of the image, or the local structure of the image, and they do not affect neighboring pixels. https://colab.research.google.com/drive/14wnzsXqepuonuxg1J6r_OdK1XmGD-yis#scrollTo=UYnztJ1deLZF

Pixel Transforms. An image processing operator is a function that takes one or more input images (signals) and produces an output image. In the continuous domain, this can be denoted as g(x) = h(f(x)), where x is the D-dimensional domain of the functions (usually D = 2 for images), and the functions f and g operate over some range, which can either be scalar or vector-valued (e.g., for color images or 2D motion). For discrete (sampled) images, the domain consists of a finite number of pixel locations, x = (i, j), and we can write g(i, j) = h(f(i, j)). Two commonly used point processes are multiplication and addition with a constant, g(x) = af(x) + b. The parameters a > 0 and b are often called the gain and bias parameters; sometimes these parameters are said to control contrast and brightness. In this kind of image processing transform, each output pixel's value depends on only the corresponding input pixel value (plus, potentially, some globally collected information or parameters). Examples of such operators include brightness and contrast adjustments as well as color correction and transformations.

Pixel Transforms Brightness and contrast adjustments Two commonly used point processes are multiplication and addition with a constant: g(x)=αf(x)+β The parameters α>0 and β are often called the  gain and bias parameters ; sometimes these parameters are said to control contrast and brightness respectively. You can think of f(x) as the source image pixels and g(x) as the output image pixels. Then, more conveniently we can write the expression as: g(i,j)=α⋅f(i,j)+β where i and j indicates that the pixel is located in the i-th row and j-th column.
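
A short illustration of the gain/bias point process g(i,j) = α·f(i,j) + β (not from the slides); the input file name and the values of α and β are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                   # placeholder file name
alpha, beta = 1.3, 20                           # gain (contrast) and bias (brightness)

# Apply g = alpha * f + beta and clip to the valid 8-bit range.
adjusted = np.clip(alpha * img.astype(np.float32) + beta, 0, 255).astype(np.uint8)

# OpenCV offers the same operation as a convenience function.
adjusted2 = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)
```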

Color transforms While color images can be treated as arbitrary vector-valued functions or collections of independent bands, It usually makes sense to think about them as highly correlated signals with strong connections to the image formation process, sensor design, and human perception. Consider, for example, brightening a picture by adding a constant value to all three channels Can you tell if this achieves the desired effect of making the image look brighter? Can you see any undesirable side-effects or artifacts? In fact, adding the same value to each color channel not only increases the apparent intensity of each pixel, it can also affect the pixel’s hue and saturation. Chromaticity coordinates or even simpler color ratios can first be computed and then used after manipulating (e.g., brightening) the luminance Y to re-compute a valid RGB image with the same hue and saturation.

Color transform While color images can be treated as arbitrary vector-valued functions or collections of independent bands, it usually makes sense to think about them as highly correlated signals with strong connections to the image formation process, sensor design, and human perception. Consider, for example, brightening a picture by adding a constant value to all three channels (a) original image along with its three color (per-channel) histograms; (b) brightness increased (additive offset, b = 16 );

Color transform. In fact, adding the same value to each color channel not only increases the apparent intensity of each pixel, it can also affect the pixel's hue and saturation. If we divide the XYZ values by the sum X + Y + Z, we obtain the chromaticity coordinates, which sum to 1; color ratios for R, G and B can be computed in the same way (each channel divided by R + G + B). Chromaticity coordinates or even simpler color ratios can first be computed and then used, after manipulating (e.g., brightening) the luminance Y, to re-compute a valid RGB image with the same hue and saturation.

Color transform. Similarly, color balancing (e.g., to compensate for incandescent lighting) can be performed either by multiplying each channel with a different scale factor or by the more complex process of mapping to XYZ color space, changing the nominal white point, and mapping back to RGB, which can be written using a linear 3×3 color twist transform matrix.

Compositing and matting In many photo editing and visual effects applications, it is often desirable to cut a foreground object out of one scene and put it on top of a different background. The process of extracting the object from the original image is often called matting (Smith and Blinn 1996), while the process of inserting it into another image (without visible artifacts) is called compositing (Porter and Duff 1984; Blinn 1994a). Compositing equation C = (1 − α)B + αF.
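
A minimal sketch of the compositing equation C = (1 − α)B + αF (not from the slides), assuming float images and a per-pixel matte in [0, 1]:

```python
import numpy as np

def composite(foreground, background, alpha):
    # Over-composite a float foreground onto a background using the matte alpha.
    if alpha.ndim == 2:                      # broadcast a single-channel matte over RGB
        alpha = alpha[..., None]
    return (1.0 - alpha) * background + alpha * foreground
```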

HISTOGRAM EQUALISATION

How can we automatically determine the best brightness and contrast values for an image? Approaches: map the darkest and brightest pixel values in the image to pure black and pure white; or find the average value in the image, push it towards middle grey, and expand the range.

How to visualize the set of lightness values in an image to test such heuristics? Histogram: plot the histogram of the individual color channels and luminance values (original image alongside its color-channel and intensity histograms). From this distribution, we can compute relevant statistics such as the minimum, maximum, and average intensity values.

Histogram – plots the number of pixels at each intensity value

Histogram - Exercise

HISTOGRAM EQUALISATION. Enhances the contrast of the image. Find an intensity mapping function f(I) such that the resulting histogram is flat. This is analogous to generating random samples from a probability density function, which is done by first computing the cumulative distribution function.

HISTOGRAM EQUALISATION. Integrate the distribution h(I) to obtain the cumulative distribution c(I) = (1/N) Σ_{i≤I} h(i), where N is the number of pixels in the image. The figure shows the result of applying f(I) = c(I) to the original image: the resulting histogram is flat, but so is the resulting image (it is "flat" in the sense of a lack of contrast and being muddy looking). With a partial (blended) equalization, the resulting image maintains more of its original grayscale distribution while having a more appealing balance.
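
A minimal NumPy sketch of equalization via the cumulative distribution (not from the slides), assuming an 8-bit grayscale input:

```python
import numpy as np

def equalize(gray):
    """Histogram-equalize an 8-bit grayscale image using f(I) = 255 * c(I)."""
    hist = np.bincount(gray.ravel(), minlength=256)     # h(I)
    cdf = np.cumsum(hist) / gray.size                   # c(I) in [0, 1]
    mapping = np.round(255 * cdf).astype(np.uint8)      # intensity mapping f(I)
    return mapping[gray]

# OpenCV equivalent for a single-channel uint8 image: cv2.equalizeHist(gray)
```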

LOCALLY ADAPTIVE HISTOGRAM EQUALISATION. For some images it is preferable to apply different kinds of equalization in different regions. Consider an image that has a wide range of luminance values: what if we were to subdivide the image into M×M pixel blocks and perform separate histogram equalization in each sub-block? The resulting image exhibits a lot of blocking artifacts, i.e., intensity discontinuities at block boundaries.

HOW TO ELIMINATE BLOCKING ARTIFACTS? One way to eliminate blocking artifacts is to use a moving window, i.e., to recompute the histogram for every M × M block centered at each pixel.

ADAPTIVE HISTOGRAM EQUALIZATION. A more efficient approach is to compute non-overlapping block-based equalization functions as before, but to then smoothly interpolate the transfer functions as we move between blocks. The weighting function for a given pixel (i, j) can be computed as a function of its horizontal and vertical position (s, t) within a block.
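
In OpenCV, a block-based, interpolated scheme of this kind is available as CLAHE (Contrast Limited Adaptive Histogram Equalization); a small sketch, with a placeholder file name and typical parameter values:

```python
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Per-tile equalization with clipping and interpolation between tile transfer
# functions, which avoids the blocking artifacts described above.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
result = clahe.apply(gray)
```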

Image Processing Libraries in Python. OpenCV − image processing library mainly focused on real-time computer vision, with applications in a wide range of areas such as 2D and 3D feature toolkits, facial and gesture recognition, human-computer interaction, mobile robotics, and object identification. NumPy and SciPy − for image manipulation and processing. Scikit-image − provides many algorithms for image processing. Python Imaging Library (PIL/Pillow) − to perform basic operations on images, such as creating thumbnails, resizing, rotating, converting between different file formats, etc. $ pip install pillow

Basic Operations in an image https://colab.research.google.com/drive/14wnzsXqepuonuxg1J6r_OdK1XmGD-yis#scrollTo=-UfY6f8cCV5T https://drive.google.com/file/d/1nn8jvrT4vtWwrmO0PvMDpyxQJ5_q_6TC/view?usp=sharing

Histogram

Linear Filtering

What are Filters? Applying filters to an image is another way to modify it. The difference compared to point operations is that a filter uses more than one pixel to generate a new pixel value. For example, a smoothing filter replaces a pixel value with the average of its neighboring pixel values. Filters can be divided into two types: linear filters and non-linear filters.

Linear filtering. The most commonly used type of neighborhood operator is a linear filter, in which an output pixel's value is determined as a weighted sum of input pixel values within a small neighborhood N: g(i, j) = Σ_{k,l} f(i + k, j + l) h(k, l). The entries in the weight kernel or mask h(k, l) are often called the filter coefficients. The above correlation operator can be more compactly notated as g = f ⊗ h. A common variant on this formula, in which the sign of the offsets in f is reversed, is called the convolution operator, g = f ∗ h, and h is then called the impulse response function.

Neighborhood filtering (convolution): the image on the left is convolved with the filter in the middle to yield the image on the right. The light blue pixels indicate the source neighborhood for the light green destination pixel. A linear filter operates on the pixel values in the support region in a linear manner (i.e., as a weighted summation). The support region is specified by the 'filter matrix', represented as H(i,j). The size of H is called the 'filter region', and the filter matrix has its own coordinate system: i is the column index and j is the row index. Its center is the origin and is called the 'hot spot'.

Applying the filter. To apply the filter to the image, follow these steps: move the filter matrix over the image I so that H(0,0) coincides with the current image position (u, v); multiply each filter coefficient H(i, j) with the corresponding image element I(u+i, v+j); then sum all these products to obtain the result for the current location I(u, v). All steps can be described by the equation g(u, v) = Σ_{i,j} I(u + i, v + j) · H(i, j).
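
A short OpenCV sketch of this procedure (not from the slides), applying a 3×3 averaging kernel with cv2.filter2D; the file name is a placeholder:

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# 3x3 smoothing (box) kernel: each output pixel is the average of its neighborhood.
H = np.ones((3, 3), np.float32) / 9.0

# filter2D slides the kernel over the image and computes the weighted sum at
# every position (correlation; identical to convolution for symmetric kernels).
smoothed = cv2.filter2D(img, -1, H)
```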

Types of linear filter. 1. Smoothing filters / separable filters (these filters have only positive coefficients). The process of performing a convolution requires K² (multiply-add) operations per pixel, where K is the size (width or height) of the convolution kernel. In many cases, this operation can be significantly sped up by first performing a one-dimensional horizontal convolution followed by a one-dimensional vertical convolution (which requires a total of 2K operations per pixel). A convolution kernel for which this is possible is said to be separable, as illustrated in the sketch below. Box filter: all members of this filter are the same. Gaussian filter: the weight of each filter member depends on its location; the center of the filter receives the maximum weight, and the weights decrease with distance from the center.
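
A small check of separability (not from the slides): a 2D Gaussian kernel equals the outer product of two 1D Gaussian kernels, so two 1D passes give the same result as the full 2D convolution at lower cost. The file name and kernel size are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

k = cv2.getGaussianKernel(7, 1.5)            # 7x1 1D Gaussian kernel
separable = cv2.sepFilter2D(img, -1, k, k)   # one horizontal + one vertical pass (~2K ops)
full = cv2.filter2D(img, -1, k @ k.T)        # equivalent full 7x7 kernel (~K^2 ops)

print(np.allclose(separable, full, atol=1e-3))   # True (up to rounding)
```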

2. Difference filters. Laplace or Mexican-hat filter: some members of this filter are negative, and the result can be computed as the sum of a positive part and a negative part. (Figure: 3D structure, 2D structure and examples of (a) box filter, (b) Gaussian filter and (c) Laplace filter.)

Properties of Linear Filters. "Linear convolution": for two-dimensional functions I and H, the convolution operation is defined as (I ∗ H)(u, v) = Σ_{i,j} I(u − i, v − j) · H(i, j), where ∗ is the convolution operator. Looking at the equation, you can see that this operation provides the same result as the linear filter with a filter function reflected in both the horizontal and vertical axes. The convolution matrix H can be called the kernel.

Properties of Linear Convolution: Commutativity, Linearity, Associativity, Separability. Separability means the kernel H can be represented as the convolution of multiple kernels and, in particular, can be separated into a pair of one-dimensional kernels in x and y.

Figure 3.14 Separable linear filters: For each image (a)–(e), we show the 2D filter kernel (top), the corresponding horizontal 1D kernel (middle), and the filtered image (bottom). The filtered Sobel and corner images are signed, scaled up by 2x and 4x, respectively, and added to a gray offset before display.

Padding (border effects). Filtering near the image boundary suffers from border effects: the results of filtering the image in this form lead to a darkening of the corner pixels. This is because the original image is effectively being padded with 0 values wherever the convolution kernel extends beyond the original image boundaries, and so the filtered images suffer from boundary effects. To deal with this, a number of different padding or extension modes have been developed for neighborhood operations: zero: set all pixels outside the source image to 0 (a good choice for alpha-matted cutout images); constant (border color): set all pixels outside the source image to a specified border value; clamp (replicate or clamp to edge): repeat edge pixels indefinitely; (cyclic) wrap (repeat or tile): loop "around" the image in a "toroidal" configuration; mirror: reflect pixels across the image edge; extend: extend the signal by subtracting the mirrored version of the signal from the edge pixel value.
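
Several of these padding modes are available directly in OpenCV; a brief sketch (not from the slides), with a placeholder file name and border width:

```python
import cv2

img = cv2.imread("input.jpg")    # placeholder file name
pad = 16                         # border width in pixels

zero   = cv2.copyMakeBorder(img, pad, pad, pad, pad, cv2.BORDER_CONSTANT, value=0)  # zero
clamp  = cv2.copyMakeBorder(img, pad, pad, pad, pad, cv2.BORDER_REPLICATE)          # clamp
wrap   = cv2.copyMakeBorder(img, pad, pad, pad, pad, cv2.BORDER_WRAP)               # wrap
mirror = cv2.copyMakeBorder(img, pad, pad, pad, pad, cv2.BORDER_REFLECT_101)        # mirror
```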

Figure 3.13 Border padding (top row) and the results of blurring the padded image (bottom row). The normalized zero image is the result of dividing (normalizing) the blurred zero padded RGBA image by its corresponding soft alpha value.

Band-pass and steerable filters. The Sobel and corner operators are simple examples of band-pass and oriented filters. More sophisticated kernels can be created by first smoothing the image with a (unit area) Gaussian filter and then taking the first or second derivatives. Such filters are known collectively as band-pass filters, since they filter out both low and high frequencies. The second derivative of a two-dimensional image is known as the Laplacian operator.

Band-pass and steerable filters. Blurring an image with a Gaussian and then taking its Laplacian is equivalent to convolving directly with the Laplacian of Gaussian (LoG) filter, which has certain nice scale-space properties. The Sobel operator is a simple approximation to a directional or oriented filter, which can be obtained by smoothing with a Gaussian (or some other filter) and then taking a directional derivative, i.e., the dot product between the gradient field ∇ and a unit direction û = (cos θ, sin θ).

The smoothed directional derivative filter, where û = (u, v), is an example of a steerable filter, since the value of an image convolved with G_û can be computed by first convolving with the pair of filters (G_x, G_y) and then steering the filter (potentially locally) by multiplying this gradient field with the unit vector û. The advantage of this approach is that a whole family of filters can be evaluated with very little cost.

Recursive filtering. The incremental formula for the summed area table is an example of a recursive filter, i.e., one whose values depend on previous filter outputs. In the signal processing literature, such filters are known as infinite impulse response (IIR) filters, since the output of the filter to an impulse (a single non-zero value) goes on forever. For example, for a summed area table, an impulse generates an infinite rectangle of 1s below and to the right of the impulse.

Recursive Filtering. The filters we have previously studied, which convolve the image with a finite-extent kernel, are known as finite impulse response (FIR) filters. Two-dimensional IIR filters and recursive formulas are sometimes used to compute quantities that involve large-area interactions, such as two-dimensional distance functions and connected components. More commonly, IIR filters are used inside one-dimensional separable filtering stages to compute large-extent smoothing kernels, such as efficient approximations to Gaussians and edge filters.

Non-linear filtering. Removing noise with a smoothing (linear) filter blurs image structure such as lines and edges. Non-linear filters, which operate in a non-linear manner, were introduced to address this problem. Consider, for example, the image in Figure 3.18e, where the noise, rather than being Gaussian, is shot noise, i.e., it occasionally has very large values. In this case, regular blurring with a Gaussian filter fails to remove the noisy pixels and instead turns them into softer (but still visible) spots (Figure 3.18f).

Types of non-linear filters. Minimum and maximum filters: the minimum and maximum values in the moving region R of the original image are the results of the minimum and maximum filters, respectively, i.e., I'(u, v) = min_{(i,j)∈R} I(u+i, v+j) and I'(u, v) = max_{(i,j)∈R} I(u+i, v+j).

Median filtering. The result is computed in the same way as for the minimum and maximum filters: the median of all values in the moving region R is the result of the median filter, I'(u, v) = median_{(i,j)∈R} I(u+i, v+j). This filter is typically used to remove salt-and-pepper noise from an image.

Median filtering. Non-linear smoothing has another, perhaps even more important property, especially since shot noise is rare in today's cameras: such filtering is more edge preserving, i.e., it has less tendency to soften edges while filtering away high-frequency noise. Consider the noisy image in Figure 3.18a. To remove most of the noise, the Gaussian filter is forced to smooth away high-frequency detail, which is most noticeable near strong edges.

Median filtering. Median filtering does better but, as mentioned before, does not do as good a job at smoothing away from discontinuities. We could try to use the α-trimmed mean or weighted median, but these techniques still have a tendency to round sharp corners, since the majority of pixels in the smoothing area come from the background distribution.

Bilateral filtering. What if we were to combine the idea of a weighted filter kernel with a better version of outlier rejection? What if, instead of rejecting a fixed percentage of pixels, we simply reject (in a soft way) pixels whose values differ too much from the central pixel value? This is the essential idea of bilateral filtering, which was first popularized in the computer vision community in 1998. In the bilateral filter, the output pixel value depends on a weighted combination of neighboring pixel values.

Figure 3.19 Median and bilateral filtering: (a) median pixel (green); (b) selected α-trimmed mean pixels; (c) domain filter (numbers along edge are pixel distances); (d) range filter. Implementing these filters in Python 3 is straightforward: for the box, Gaussian and median filters you can use cv2.boxFilter(), cv2.GaussianBlur() and cv2.medianBlur().
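
A short usage sketch of these OpenCV calls, plus the bilateral filter discussed above (not from the slides); the file name and parameter values are placeholders:

```python
import cv2

img = cv2.imread("noisy.jpg")                            # placeholder file name

box    = cv2.boxFilter(img, -1, (5, 5))                  # linear smoothing
gauss  = cv2.GaussianBlur(img, (5, 5), 1.5)              # linear smoothing
median = cv2.medianBlur(img, 5)                          # non-linear, good for salt-and-pepper noise

# Bilateral filter: weights combine spatial distance (domain) and intensity
# difference (range), so edges are preserved while noise is smoothed away.
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
```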

Fourier Transforms

Frequency Domain Analysis of mathematical functions or signals with respect to frequency, rather than time A time-domain graph shows how a signal changes over time, whereas a frequency-domain graph shows how much of the signal lies within each given frequency band over a range of frequencies. A frequency-domain representation can also include information on the phase shift that must be applied to each sinusoid in order to be able to recombine the frequency components to recover the original time signal. In DIP: Analysis of the image in another domain rather than in spatial domain.

Frequency-domain operations

Easier

Background. Jean Baptiste Joseph Fourier (21 March 1768 – 16 May 1830): French mathematician and physicist. 1822: Théorie analytique de la chaleur (The Analytic Theory of Heat). Main idea: every periodic function can be expressed as a sum of sines/cosines (Fourier series), the starting point of harmonic analysis.

The main idea

Mathematical Background: Complex Numbers. A complex number z is of the form z = a + jb, where a is the real part and b is the imaginary part. Addition and multiplication follow the usual algebraic rules, with j² = −1.

Mathematical Background: Complex Numbers (cont.) Magnitude-phase (i.e., vector) representation. Magnitude: |z| = √(a² + b²). Phase: φ = arctan(b/a). Magnitude-phase notation: z = |z| e^(jφ).

Mathematical Background: Complex Numbers (cont.) Multiplication using magnitude-phase representation Complex conjugate Properties

Mathematical Background: Complex Numbers (cont.) Euler's formula: e^(jθ) = cos θ + j sin θ, and its properties.

Mathematical Background: Sine and Cosine Functions. Periodic functions. General form of sine and cosine functions: y(t) = A sin(αt + b) and y(t) = A cos(αt + b).

Mathematical Background: Sine and Cosine Functions (cont.) Special case: A = 1, b = 0, α = 1 (unit-amplitude sine and cosine plotted over one period, 0 to 2π).

Mathematical Background: Sine and Cosine Functions (cont.) Shifting or translating the sine function by a constant b. Remember: cosine is a shifted sine function, cos(t) = sin(t + π/2).

Mathematical Background: Sine and Cosine Functions (cont.) Changing the amplitude A

Mathematical Background: Sine and Cosine Functions (cont.) Changing the period T = 2π/|α|; consider A = 1, b = 0: y = cos(αt). For α = 4 the period is 2π/4 = π/2: a shorter period means a higher frequency (i.e., the signal oscillates faster). Frequency is defined as f = 1/T. Alternative notation: sin(αt) = sin(2πt/T) = sin(2πft).

Basis Functions. Given a vector space of functions S, if any f(t) ∈ S can be expressed as f(t) = Σ_k a_k φ_k(t), then the set of functions φ_k(t) is called the expansion set of S. If the expansion is unique, the set φ_k(t) is a basis.

Image Transforms. Many times, image processing tasks are best performed in a domain other than the spatial domain. Key steps: (1) transform the image; (2) carry out the task(s) in the transformed domain; (3) apply the inverse transform to return to the spatial domain.

Transformation Kernels. The forward transformation is defined in terms of a forward transformation kernel, and the inverse transformation in terms of an inverse transformation kernel.

Kernel Properties. A kernel is said to be separable if it can be written as the product of one-dimensional kernels, one acting on the rows and one on the columns. A kernel is said to be symmetric if the two one-dimensional factors are functionally equal.

Fourier Series Theorem. Any periodic function f(t) can be expressed as a weighted sum (infinite) of sine and cosine functions of varying frequency; ω₀ = 2π/T is called the "fundamental frequency".

Fourier transform. Consider a sinusoid s(x) = sin(2πfx + φ) = sin(ωx + φ), where f is the frequency, ω = 2πf is the angular frequency, and φ is the phase. The variables x and y denote the spatial coordinates of an image. If we convolve the sinusoidal signal s(x) with a filter whose impulse response is h(x), we get another sinusoid of the same frequency but different magnitude A and phase.

Fourier transform. The new magnitude A is called the gain or magnitude of the filter, while the phase difference is called the shift or phase. A more compact notation is to use the complex-valued sinusoid s(x) = e^(jωx), in which case we can simply write o(x) = h(x) ∗ s(x) = A e^(j(ωx + φ)). The Fourier transform is simply a tabulation of the magnitude and phase response at each frequency.

Fourier transform. The Fourier transform exists both in the continuous domain, H(ω) = ∫ h(x) e^(−jωx) dx, and in the discrete domain, H(k) = (1/N) Σ_{x=0}^{N−1} h(x) e^(−j2πkx/N), where N is the length of the signal or region of analysis (with k in the range [−N/2, N/2]). These formulas apply both to filters, such as h(x), and to signals or images, such as s(x) or g(x). The discrete form of the Fourier transform is known as the Discrete Fourier Transform (DFT).
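
A tiny NumPy illustration of the DFT as a tabulation of magnitude and phase at each frequency (not from the slides):

```python
import numpy as np

N = 64
x = np.sin(2 * np.pi * 5 * np.arange(N) / N)   # a sinusoid with 5 cycles over N samples

X = np.fft.fft(x)                               # discrete Fourier transform
magnitude = np.abs(X)
phase = np.angle(X)

print(np.argmax(magnitude[:N // 2]))            # 5: the energy sits at frequency bin k = 5
```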

Fourier transform Properties of Fourier transform: Superposition: The Fourier transform of a sum of signals is the sum of their Fourier transforms. Thus, the Fourier transform is a linear operator. Shift: The Fourier transform of a shifted signal is the transform of the original signal multiplied by a linear phase shift (complex sinusoid). Reversal: The Fourier transform of a reversed signal is the complex conjugate of the signal’s transform.

Fourier transform Properties of Fourier transform: Convolution: The Fourier transform of a pair of convolved signals is the product of their transforms. Correlation: The Fourier transform of a correlation is the product of the first transform times the complex conjugate of the second one. Multiplication: The Fourier transform of the product of two signals is the convolution of their transforms. Differentiation: The Fourier transform of the derivative of a signal is that signal’s transform multiplied by the frequency. In other words, differentiation linearly emphasizes (magnifies) higher frequencies.

Fourier transform Properties of Fourier transform: Domain scaling: The Fourier transform of a stretched signal is the equivalently compressed (and scaled) version of the original transform and vice versa. Real images: The Fourier transform of a real-valued signal is symmetric around the origin. This fact can be used to save space and to double the speed of image FFTs by packing alternating scanlines into the real and imaginary parts of the signal being transformed. Parseval’s Theorem: The energy (sum of squared values) of a signal is the same as the energy of its Fourier transform.

Fourier transform Fourier transform pairs Fourier transform pairs are some commonly occurring filters and signals

Fourier transform Fourier transform pairs Impulse: The impulse response has a constant (all frequency) transform. Shifted impulse: The shifted impulse has unit magnitude and linear phase. Box filter: The box (moving average) filter has a sinc Fourier transform, which has an infinite number of side lobes. Conversely, the sinc filter is an ideal lowpass filter. For a non-unit box, the width of the box a and the spacing of the zero crossings in the sinc 1/a are inversely proportional.

Fourier transform Fourier transform pairs 4. Tent: The piecewise linear tent function has a sinc² Fourier transform. 5. Gaussian: The (unit area) Gaussian of width σ has a (unit height) Gaussian of width σ⁻¹ as its Fourier transform.

Fourier transform Fourier transform pairs 6. Laplacian of Gaussian: The second derivative of a Gaussian of width σ has a band-pass response as its Fourier transform. 7. Gabor: The even Gabor function, which is the product of a cosine of frequency ω₀ and a Gaussian of width σ, has as its transform the sum of two Gaussians of width σ⁻¹ centered at ω = ±ω₀. The odd Gabor function, which uses a sine, is the difference of two such Gaussians. Gabor functions are often used for oriented and band-pass filtering, since they can be more frequency selective than Gaussian derivatives.

Fourier transform Fourier transform pairs 8. Unsharp mask: The unsharp mask has as its transform a unit response with a slight boost at higher frequencies. 9. Windowed sinc: The windowed (masked) sinc function has a response function that approximates an ideal low-pass filter better and better as additional side lobes are added ( W is increased).

Fourier transform Discrete Kernels

Two-dimensional Fourier transforms. The formulas and insights we have developed for one-dimensional signals and their transforms translate directly to two-dimensional images. Here, instead of just specifying a horizontal or vertical frequency ωx or ωy, we can create an oriented sinusoid of frequency (ωx, ωy). The corresponding two-dimensional Fourier transforms are then defined both in the continuous domain and in the discrete domain, where M and N are the width and height of the image.

Two-dimensional Fourier transforms / Wiener filtering. A simple model for images is to assume that they are random noise fields whose expected magnitude at each frequency is given by a power spectrum P_s(ωx, ωy), where the angle brackets ⟨·⟩ denote the expected (mean) value of a random variable. To generate such an image, we simply create a random Gaussian noise image S(ωx, ωy) where each "pixel" is a zero-mean Gaussian of variance P_s(ωx, ωy) and then take its inverse FFT.

Discrete cosine transform. The discrete cosine transform (DCT) is a variant of the Fourier transform particularly well suited to compressing images in a block-wise fashion. The one-dimensional DCT is computed by taking the dot product of each N-wide block of pixels with a set of cosines of different frequencies, where k is the coefficient (frequency) index and the 1/2-pixel offset is used to make the basis coefficients symmetric. The two-dimensional version of the DCT is defined similarly.

Applications of the Fourier transform: sharpening, blur, and noise removal.

S9 – 2D FOURIER TRANSFORM AND WIENER FILTER

Two-dimensional Fourier Transforms The formulas and insights we have developed for one-dimensional signals and their transforms translate directly to two-dimensional images. Here, instead of just specifying a horizontal or vertical frequency ω x or ω y , we can create an oriented sinusoid of frequency (ω x , ω y ) The corresponding two-dimensional Fourier transforms are then

Two-dimensional Fourier Transforms. And in the discrete domain, where M and N are the width and height of the image. All of the Fourier transform properties from Table 3.1 carry over to two dimensions if we replace the scalar variables x, ω, x₀ and a with their 2D vector counterparts x = (x, y), ω = (ωx, ωy), x₀ = (x₀, y₀), and a = (ax, ay), and use vector inner products instead of multiplications.

Wiener filtering

How does the Wiener filter work? The Wiener filter performs two main functions - it inverts the blur of the image and removes extra noise. It is particularly helpful when processing images that have been through a degradation filter or when the image has been blurred by a known lowpass filter. It is often used in deconvolution, which is an algorithm-based process to enhance signals from data.

Assuming that an image is a sample from a correlated Gaussian random noise field, combining this prior with a statistical model of the measurement process yields an optimum restoration filter known as the Wiener filter. To derive the Wiener filter, we analyze each frequency component of a signal's Fourier transform independently. The noisy image formation process can be written as o(x, y) = s(x, y) + n(x, y),

where s(x, y) is the (unknown) image we are trying to recover, n(x, y) is the additive noise signal, and o(x, y) is the observed noisy image. Because of the linearity of the Fourier transform, we can write O(ωx, ωy) = S(ωx, ωy) + N(ωx, ωy), where each quantity is the Fourier transform of the corresponding image.

At each frequency (ω x , ω y ), we know from our image spectrum that the unknown transform component S(ω x , ω y ) has a prior distribution which is a zero-mean Gaussian with variance P s (ω x , ω y ). We also have noisy measurement O(ω x , ω y ) whose variance is P n (ω x , ω y ), i.e., the power spectrum of the noise, which is usually assumed to be constant (white), P n (ω x , ω y ) = σ 2 n . According to Bayes’ Rule, the posterior estimate of S can be written as

where the normalizing constant is chosen to make the distribution proper (integrate to 1). The prior distribution p(S) is a Gaussian centered at μ, the expected mean at that frequency (0 everywhere except at the origin), with variance P_s(ωx, ωy); the measurement distribution p(O|S) is a Gaussian centered at S with variance P_n(ωx, ωy).

Taking the negative logarithm of both sides and setting μ = 0 for simplicity, we get the negative posterior log likelihood. The minimum of this quantity is easy to compute. The quantity

W(ωx, ωy) = 1 / (1 + σ²ₙ / P_s(ωx, ωy)) is the Fourier transform of the optimum Wiener filter needed to remove the noise from an image whose power spectrum is P_s(ωx, ωy). Notice that this filter has the right qualitative properties: for low frequencies, where P_s ≫ σ²ₙ, it has unit gain, whereas for high frequencies it attenuates the noise by a factor P_s/σ²ₙ. The methodology given above for deriving the Wiener filter can easily be extended to the case where the observed image is a noisy, blurred version of the original image, o(x, y) = b(x, y) ∗ s(x, y) + n(x, y),

where b(x, y) is the known blur kernel.
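
A minimal frequency-domain sketch of Wiener denoising (the no-blur case above); it is not from the slides, and the power spectrum P_s and noise level are assumed to be given:

```python
import numpy as np

def wiener_denoise(observed, power_spectrum, sigma_n):
    """Apply W = Ps / (Ps + sigma_n^2) in the frequency domain.

    observed: noisy grayscale image as a float array.
    power_spectrum: assumed prior power spectrum Ps(wx, wy), same shape as the image.
    sigma_n: standard deviation of the (white) noise.
    """
    O = np.fft.fft2(observed)
    W = power_spectrum / (power_spectrum + sigma_n ** 2)   # ~1 where Ps >> sigma_n^2
    return np.real(np.fft.ifft2(W * O))
```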

Thank You