More neighborhood operators:
Non-linear filtering:
Non-linear filters do not just compute weighted sums; they can ignore or down-weight outliers.
Median filtering:
The median filter replaces each pixel with the median value of its neighborhood (e.g., 3×3 window).
Why it works for shot noise:
Shot noise = occasional extreme values (very high or low). Since the median depends only on the middle-ranked value of the sorted neighborhood, extreme outliers are ignored.
Effect: noisy pixels are removed without affecting normal pixel values much.
α-Trimmed Mean
This is a compromise between median and mean.
Method:
Sort pixel values in the neighborhood. Remove the α fraction of the smallest and largest pixels (to ignore outliers).
Take the average of the remaining pixels.
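As a concrete sketch (Python/NumPy; the 3×3 window and α = 0.2 are illustrative assumptions, not values from the notes), the α-trimmed mean can be computed with a direct loop over windows; α = 0 gives the plain mean and α near 0.5 approaches the median:

import numpy as np

def alpha_trimmed_mean(image, size=3, alpha=0.2):
    # Sort each window, drop the alpha fraction of smallest and largest
    # values, and average what remains.
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode='reflect')
    out = np.empty(image.shape, dtype=float)
    n = size * size
    trim = int(alpha * n)                     # samples dropped from each end
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = np.sort(padded[i:i + size, j:j + size], axis=None)
            kept = window[trim:n - trim] if trim > 0 else window
            out[i, j] = kept.mean()
    return out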
Weighted Median / Weighted Mean
Pixels are “counted” multiple times depending on their weight (distance or importance).
Equivalent to minimizing: sum over (k,l) of w(k,l) · |f(k,l) − g(i,j)|^p, with p = 1 giving the weighted median and p = 2 the weighted mean.
Bilateral filtering:
For each pixel (i,j), the output g(i,j) is a weighted average of its neighbors:
g(i,j) = Σ_{k,l} f(k,l) w(i,j,k,l) / Σ_{k,l} w(i,j,k,l)
Weight w(i,j,k,l) = domain weight × range weight:
Domain weight = depends on distance from the center pixel (closer pixels get higher weight), e.g. exp(−((i−k)² + (j−l)²) / (2σ_d²)).
Range weight = depends on intensity difference from the center pixel (similar colors get higher weight, very different colors get lower weight), e.g. exp(−(f(i,j) − f(k,l))² / (2σ_r²)).
At an edge, pixels across the edge have very different intensity. The range weight down-weights these pixels →
they barely affect the center pixel. So the edge remains sharp, unlike a normal Gaussian blur. For color images,
the range weight considers all color channels, so edges in any channel are preserved.
Computing this directly is slow, because every pixel looks at a neighborhood and calculates weights based on
both distance and intensity.
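A brute-force sketch of the bilateral filter (Python/NumPy; the Gaussian weights are the standard choice, but the radius and sigma values here are illustrative assumptions):

import numpy as np

def bilateral_filter(img, radius=3, sigma_d=2.0, sigma_r=0.1):
    # Grayscale float image in [0, 1]; brute force, so slow but clear.
    img = img.astype(float)
    out = np.zeros_like(img)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    domain = np.exp(-(x**2 + y**2) / (2 * sigma_d**2))   # spatial weights
    padded = np.pad(img, radius, mode='reflect')
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + 2*radius + 1, j:j + 2*radius + 1]
            # Range weight: penalize intensity differences from the center.
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            w = domain * rng
            out[i, j] = (w * patch).sum() / w.sum()
    return out

The double loop over pixels and neighborhoods is exactly why the direct computation is slow; fast approximations exist but are beyond these notes.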
Iterated adaptive smoothing and anisotropic diffusion:
Iterative Filtering:
Bilateral filters can be applied multiple times to achieve stronger smoothing or a “cartoon-like” effect. Instead of
using a large neighborhood, a small neighborhood (like the 4 nearest neighbors, N4) is often enough in iterative
filtering. Each iteration gradually smooths the image while preserving edges, because the range kernel reduces the
influence of pixels with very different intensity.
This iterative process is mathematically equivalent to anisotropic diffusion: smooth the image inside regions (where intensity is similar) and stop diffusion across edges (where the intensity difference is high). The range kernel r is called the edge-stopping function or diffusion coefficient.
Small intensity difference → high weight → diffusion occurs
Large intensity difference → low weight → diffusion stops → edges preserved
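A minimal sketch of iterated N4 smoothing in the anisotropic-diffusion style (Python/NumPy; the exponential edge-stopping function and the values of kappa, lam and n_iter are illustrative assumptions):

import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, lam=0.2):
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa)**2)    # edge-stopping function
    for _ in range(n_iter):
        # Differences to the 4 nearest neighbors (np.roll wraps at the image
        # border, which is acceptable for a sketch).
        north = np.roll(u, 1, axis=0) - u
        south = np.roll(u, -1, axis=0) - u
        west  = np.roll(u, 1, axis=1) - u
        east  = np.roll(u, -1, axis=1) - u
        # Small difference -> weight near 1 (diffuse); large -> near 0 (stop).
        u = u + lam * (g(north)*north + g(south)*south + g(west)*west + g(east)*east)
    return u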
Guided image filtering:
Instead of computing pixel weights based on the input image itself, use a guide image h(i,j). The output pixel g(i,j) is computed as a weighted average of the input pixels, but the weights come from the guide image:
f(k,l) = input pixel values (noisy image)
h(i,j) = guide image (clean edges)
The weights w depend on the similarity in the guide image, so the filtered output follows the edges in the guide.
In some approaches, instead of just averaging, we model the relationship between the guide and input locally as
g(i,j) = A_{k,l} h(i,j) + b_{k,l}
This is a local linear (affine) transformation: scale by A and shift by b. The parameters A_{k,l} and b_{k,l} are computed over a small neighborhood around each pixel.
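A sketch of the usual closed-form guided filter (the He et al. style formulation; Python/SciPy, with radius and eps as illustrative parameters). Each window fits src ≈ A·guide + b, and the per-window coefficients are then averaged:

import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    mean = lambda x: uniform_filter(x.astype(float), size=2*radius + 1)
    mean_I, mean_p = mean(guide), mean(src)
    var_I  = mean(guide*guide) - mean_I*mean_I
    cov_Ip = mean(guide*src)  - mean_I*mean_p
    A = cov_Ip / (var_I + eps)      # local scale
    b = mean_p - A * mean_I         # local shift
    # Average the coefficients of all windows covering each pixel.
    return mean(A) * guide + mean(b)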
Binary image processing:
Binary Images: A binary image has only two values per pixel: 0 or 1 (black or white). Example: a scanned document where the text pixels are labeled 1 (foreground) and the background pixels 0.
Start with a grayscale image (pixels 0–255). Apply thresholding: pick a threshold value t.
Rule: b(i,j) = 1 if f(i,j) ≥ t, otherwise 0.
Meaning:
Pixels brighter than t → white (1)
Pixels darker than t → black (0)
This converts a grayscale image into a binary image.
Morphology:
What are Morphological Operations?
Binary images = only black (0) and white (1).
Morphology = changing the shape of objects in a binary image (making them thicker, thinner, removing noise, filling
gaps, etc.).
To do this, we use a small pattern called a structuring element (like a 3×3 square, cross, or circle). Think of it as a little "stamp" we slide over the image.
How it Works
Place the structuring element over each pixel. Count how many 1s (white pixels) are inside it. Call this count c.
Compare c to some rule (threshold) to decide the output pixel
Main Operations
Dilation (grow white areas): if c ≥ 1 → output = 1.
Rule: if at least 1 white pixel is in the window → make output = 1.
Effect: shapes become thicker, gaps get filled.
Erosion (shrink white areas): if c = 9 (all pixels of a 3×3 window) → output = 1.
Rule: if all pixels under the window are white → make output = 1.
Effect: shapes become thinner, small white spots disappear.
Majority: if c > 4 → output = 1.
Rule: if more than half of the window is white → output = 1.
Effect: balances between dilation and erosion.
Opening (erosion → then dilation)
Effect: removes small noise but keeps big shapes.
Example: small dots around letters disappear.
Closing (dilation → then erosion)
Effect: fills small holes and connects nearby shapes.
Example: broken letter "O" gets closed
If the neighborhood is
0 1 0
1 1 0
0 0 0
then c = 3, so dilation outputs 1 (c ≥ 1), erosion outputs 0 (c < 9), and majority outputs 0 (c ≤ 4).
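These counting rules are easy to write down directly (Python/SciPy sketch; the 3×3 window matches the example above, and zero padding at the border is an assumption):

import numpy as np
from scipy.ndimage import convolve

binary = np.array([[0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 0]])

# c = number of 1s inside the 3x3 window centered at each pixel.
c = convolve(binary, np.ones((3, 3), dtype=int), mode='constant', cval=0)

dilated  = (c >= 1).astype(int)   # at least one white pixel in the window
eroded   = (c == 9).astype(int)   # all nine pixels white
majority = (c >  4).astype(int)   # more than half of the window white

The center pixel of this small example gets c = 3, matching the walk-through above.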
Distance transforms:
You start with a binary image b(i,j):
Foreground pixels (object) = 1 (or non-zero)
Background pixels = 0
The distance transform D(i,j) tells you: For every pixel (i,j), how far it is from the nearest background pixel (value =
0)
City Block Distance (D1)
Easy and efficient → just do two raster scans:
Forward pass: Left-to-right, top-to-bottom. Update each pixel from north and west neighbors: D(i,j) = min(D(i,j), D(i−1,j)+1, D(i,j−1)+1).
Backward pass: Right-to-left, bottom-to-top. Update each pixel from south and east neighbors.
This way, each pixel gets the correct minimum distance without brute force.
Backward pass (bottom-right → top-left):
Now check bottom (south) and right (east) neighbors → min(neighbor) + 1.
Update only if it gives a smaller value.
D(i,j) = min(D(i,j), D(i+1,j)+1, D(i,j+1)+1)
South neighbor = (i+1,j)
East neighbor = (i,j+1)
Example input image (1 = object):
0 0 0 0 0
0 1 1 0 0
0 1 1 0 0
0 0 0 0 0
0 0 0 0 0
Pixel (3,3): already 2 after the forward pass; its south and east neighbors are 0 → min(0+1 = 1, 2) = 1 → update to 1.
Pixel (2,3): stays 1 (min check doesn’t improve).
Pixel (3,2): stays 1.
Pixel (2,2): stays 1.
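A direct two-pass implementation of the city-block distance transform (Python/NumPy sketch; 1 = object, and the distance is measured to the nearest 0 pixel):

import numpy as np

def city_block_distance(binary):
    big = binary.size                      # acts as "infinity"
    D = np.where(binary > 0, big, 0).astype(int)
    rows, cols = D.shape
    # Forward pass: north and west neighbors.
    for i in range(rows):
        for j in range(cols):
            if D[i, j] > 0:
                if i > 0: D[i, j] = min(D[i, j], D[i-1, j] + 1)
                if j > 0: D[i, j] = min(D[i, j], D[i, j-1] + 1)
    # Backward pass: south and east neighbors.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if i < rows - 1: D[i, j] = min(D[i, j], D[i+1, j] + 1)
            if j < cols - 1: D[i, j] = min(D[i, j], D[i, j+1] + 1)
    return D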
Euclidean Distance (D2)
More accurate, but trickier.
You can’t just store a single number, because the direction matters.
Instead, each pixel keeps a vector pointing to its nearest background pixel (dx, dy).
Skeleton / Medial Axis
The ridges in the distance map = points equidistant to two or more boundaries.
These form the skeleton (thin central representation of shape).
Very useful in shape analysis and pattern recognition.
0 1 1 1 0
1 2 2 2 1
0 1 1 1 0
(the ridge of 2s in the middle row is the skeleton)
Signed Distance Transform
So far → distances are outside-only (or inside-only). But often we want distances everywhere, with a sign:
Inside object: negative distances
Outside object: positive distances
How to compute:
Compute distance transform of original image.
Compute distance transform of complement (flip 0 ↔ 1).
Negate one of them and combine.
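Using the city_block_distance sketch above, a signed version is just the two transforms combined (the sign convention, negative inside, follows the notes):

def signed_distance(binary):
    # Distance to the object for background pixels, minus distance to the
    # background for object pixels -> positive outside, negative inside.
    return city_block_distance(1 - binary) - city_block_distance(binary)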
Connected components:
Imagine a binary image (only black and white pixels).
A connected component is a "blob" of connected pixels that share the same value (usually 1 = object).
For example, in a scanned document → each letter is a connected component.
In microscopy → each cell can be a connected component
Two pixels are "connected" if they touch:
N4 adjacency: only up, down, left, right.
N8 adjacency: also diagonals.
So:
N4 → stricter, fewer connections.
N8 → more connections, blobs merge more easily.
Finding connected components lets us split the image into separate regions/objects.
1 1 1 0 0
1 0 1 0 0
0 0 1 1 1
Once you find a blob, you can measure its properties:
Area = number of pixels in the blob.
(Like counting how many "1"s belong to that object.)
Perimeter = number of boundary pixels (those touching background).
(Like tracing its outline.)
Centroid = the average (x, y) position of its pixels.
(Like the "center of mass" of the blob.)
Second moments (M) = these capture the shape of the blob.
From them, you can compute:
The major axis (longest direction of the shape, like length).
The minor axis (shortest direction, like width).
The orientation (angle of the blob)
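A sketch of labeling blobs and measuring them (Python/SciPy; scipy.ndimage.label does the connected-component search, and the moment formulas are the standard ones):

import numpy as np
from scipy.ndimage import label

def blob_properties(binary):
    structure = np.ones((3, 3))            # N8 adjacency; the default is N4
    labels, n = label(binary, structure=structure)
    props = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        cx, cy = xs.mean(), ys.mean()                   # centroid
        # Second (central) moments describe the blob's shape.
        mxx = ((xs - cx) ** 2).mean()
        myy = ((ys - cy) ** 2).mean()
        mxy = ((xs - cx) * (ys - cy)).mean()
        orientation = 0.5 * np.arctan2(2 * mxy, mxx - myy)
        props.append({'area': xs.size, 'centroid': (cx, cy),
                      'orientation': orientation})
    return props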
Fourier transforms:
Every image can be thought of as a combination of many sinusoidal waves (patterns of different frequencies).
So, if you want to test a filter: Input a sinusoidal pattern (like black/white stripes of a certain width). See how the filter
changes it. The output will always still be a sinusoid (same frequency), but: Its amplitude may shrink (weakened). Its
phase may shift (moved sideways). That’s all a filter does in frequency terms → scale + shift
Instead of using sine waves alone, we often use complex exponentials: s(x) = e^{jωx} = cos(ωx) + j sin(ωx).
This makes the math cleaner. The filter’s effect can then be summarized as: o(x) = A e^{j(ωx + ϕ)}
where: A = gain (how much it scales the wave)
ϕ = phase shift
The pair (A, ϕ) as a function of ω, i.e. H(ω) = A e^{jϕ}, is the Fourier transform of the filter.
Two key formulas:
Continuous case: H(ω) = ∫ h(x) e^{−jωx} dx
Discrete case (for digital signals/images): H(k) = Σ_{x=0}^{N−1} h(x) e^{−j 2π k x / N}
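The "sinusoid in, scaled and shifted sinusoid out" behavior is easy to check numerically (Python/NumPy sketch; the [1, 2, 1]/4 filter and the 4-cycle test frequency are arbitrary choices):

import numpy as np

h = np.array([1, 2, 1]) / 4.0              # a small smoothing filter
N = 64
x = np.arange(N)
s = np.sin(2 * np.pi * 4 * x / N)          # 4 cycles across the signal

out = np.convolve(s, h, mode='same')       # still a 4-cycle sinusoid,
                                           # just attenuated and shifted
H = np.fft.fft(h, N)                       # the filter's frequency response
A, phi = np.abs(H[4]), np.angle(H[4])      # gain and phase at that frequency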
For images, the signal is 2D: h(x,y).
So instead of one variable x, we now have two: horizontal (x) and vertical (y). And instead of one frequency ω, we now need two frequency components:
ω_x → horizontal frequency
ω_y → vertical frequency
A sinusoid in 2D looks like: s(x,y) = sin(ω_x x + ω_y y)
Continuous 2D Fourier Transform: H(ω_x, ω_y) = ∫∫ h(x, y) e^{−j(ω_x x + ω_y y)} dx dy
Discrete cosine transform:
The Discrete Cosine Transform (DCT) is like the Fourier Transform, but it uses only cosine waves (no sine waves).
Cosines are symmetric and "fit nicely" into image blocks. That makes the DCT better for image compression (like
JPEG)
2D DCT (for images): the transform is separable, i.e. a 1D DCT applied along the rows and then along the columns (this is how JPEG processes its 8×8 blocks).
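A sketch of the separable 2D DCT using SciPy (the 8×8 block size mirrors JPEG; norm='ortho' is one common normalization choice):

import numpy as np
from scipy.fft import dct

def dct2(block):
    # 1D DCT along the rows, then along the columns.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.rand(8, 8)
coeffs = dct2(block)          # coeffs[0, 0] is the DC (average) term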
Application: Sharpening, blur, and noise removal:
Images are often:
Blurry → need sharpening.
Noisy (grainy, random pixels) → need denoising.
To fix these, we process each pixel using its neighbors (nearby pixels).
Instead of just averaging, they adapt depending on the image content:
Weighted median filter → replaces each pixel with a median of neighbors, weighted by closeness.
Bilateral filter → smooths flat areas but keeps edges sharp.
Anisotropic diffusion → a kind of "smart blur" that spreads only along flat regions, not across edges.
Non-local means → looks for similar patches all over the image, not just nearby, and averages them
Optimization-based methods: treat denoising as an optimization problem:
"Find the clean image that is close to the noisy one, but also smooth in some sense."
Total variation (TV) → penalizes the L1 norm of the image gradient, which favors piecewise-smooth results with sharp edges, instead of the L2 norm, which over-smooths them.
Deep learning (most recent): Neural networks trained on huge datasets can learn how to remove noise. They usually outperform classical filters today.
When we know the clean image (because we added fake noise ourselves), we can compare:
PSNR (Peak Signal-to-Noise Ratio):
Measures pixel-by-pixel difference.
Higher = better (closer to original).
But doesn’t always match human perception.
SSIM (Structural Similarity Index):
Compares patterns of structure, brightness, and contrast.
Closer to how humans see similarity.
FLIP (newer metric):
A perceptual difference measure (models human vision better).
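PSNR is simple enough to compute by hand; SSIM is more involved, so a library is usually used (Python sketch; the peak value of 1.0 assumes float images in [0, 1], and the scikit-image import is optional):

import numpy as np

def psnr(clean, noisy, peak=1.0):
    # Peak signal-to-noise ratio in dB.
    mse = np.mean((clean.astype(float) - noisy.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# SSIM, if scikit-image is available:
# from skimage.metrics import structural_similarity as ssim
# score = ssim(clean, noisy, data_range=1.0)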
Pyramids and wavelets:
Imagine searching for a face in a picture: A face could appear big (close-up) or small (far away). If you only look at
one resolution, you might miss it.
Solution → image pyramid: Build a stack of the same image at different sizes (like a pyramid).
Top: very small version.
Bottom: full resolution.
Then scan each level for the object. This is how many vision systems (including human vision) handle scale.
Why pyramids are useful
Object detection: Search quickly at coarse scales, then refine at full resolution.
Multi-scale editing: Combine/blend images smoothly (no visible seams).
Efficient processing: Work on small versions first to save time.
Interpolation:
When you upsample an image (make it larger), you need to create new pixels that weren’t in the original. To decide
their values, you use an interpolation kernel → basically a little "smoothing function" that spreads each original
pixel’s value into the higher-resolution grid.
Two equivalent ways:
Imagine putting a copy of the interpolation kernel at each original pixel and adding them up Or imagine centering the
kernel at each new output pixel and sampling from the old pixels.
Different kernels → different quality and speed trade-offs:
Nearest neighbor: Fast, blocky.
Linear (tent / bilinear): Connects pixels with straight lines → can look jagged, creates creases.
Cubic B-spline: Smooth, soft-looking images, but loses fine details (not exact interpolation).
Bicubic (common in Photoshop etc.): Produces sharper results, widely used.
Has a parameter a that controls sharpness vs smoothness.
a=−0.5 → smoother, reproduces ramps well.
a=−1 → sharper, but may cause ringing (little ripples near edges).
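The bicubic kernel with its parameter a can be written out directly (Python sketch of the standard Keys-style cubic; a = −0.5 and a = −1 correspond to the two settings mentioned above):

def cubic_kernel(x, a=-0.5):
    # Interpolation weight as a function of the distance x to a sample.
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    elif x < 2:
        return a * x**3 - 5*a * x**2 + 8*a * x - 4*a
    return 0.0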
Decimation:
Before throwing away pixels, you first need to smooth (blur) the image with a low-pass filter.
This filter removes the fine details that can cause aliasing.
Filters mentioned:
Linear filter [1, 2, 1]
Very simple (like a short blur).
Doesn’t block high frequencies well → so aliasing leaks in.
Cheap but poor quality
Binomial filter [1, 4, 6, 4, 1]
A stronger blur (longer kernel).
Removes more high-frequency detail.
Good for analysis pyramids (used in computer vision, where a bit of aliasing is okay).
But may blur too much if the image is for direct display.
Cubic filters (parameter a)
Smooth filters based on cubic polynomials.
a=−0.5: smoother, gentler cutoff.
a=−1: sharper cutoff, keeps more detail but risks aliasing.
So you can tune how sharp vs smooth you want.
Cosine-windowed sinc
A sinc function is the ideal low-pass filter (perfect frequency cutoff). But sinc is infinite length → can’t compute.
Solution: cut it off with a cosine window (fade it smoothly). Gives a good practical approximation of ideal filtering.
Higher quality, but more expensive to compute.
QMF-9 (Quadrature Mirror Filter, 9 taps)
Designed for wavelet transforms (like in denoising).
Special property: “self-inverting” → when used in wavelets, downsampling + upsampling can perfectly reconstruct
the signal. But on its own, it aliases more than others. So it’s good for wavelets, not so much for pretty downsampling.
9/7 filter (used in JPEG 2000): A carefully designed wavelet analysis filter. 9 taps and 7 taps (two filters in the pair).
Balances good frequency separation with smoothness. Chosen for JPEG 2000 compression because it keeps details
while limiting aliasing.
Multi-resolution representations:
Gaussian/octave pyramid
To go from one level to the next:
Blur the image a little (so high-frequency details don’t cause aliasing). Shrink the image by 2 in width and height (keep every other pixel). This shrinking by 2 is why it’s called an octave pyramid (like in music, where an octave corresponds to a doubling or halving of frequency). Before shrinking, you need a smoothing filter.
Burt & Adelson proposed a five-tap filter of the form [c, b, a, b, c], where a, b, c are constants chosen so the filter is smooth and sums to 1. A common choice is
1/16 [1, 4, 6, 4, 1]
This is a binomial filter (a row of Pascal’s triangle, scaled). If you apply this blur many times, the result
looks more and more like a Gaussian blur (bell curve shape). So each level is essentially a Gaussian-blurred version of
the previous one.
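A sketch of an octave (Gaussian) pyramid using the 1/16 [1, 4, 6, 4, 1] filter applied separably, then keeping every other pixel (Python/SciPy; the number of levels is an arbitrary choice):

import numpy as np
from scipy.ndimage import convolve1d

BINOMIAL = np.array([1, 4, 6, 4, 1]) / 16.0

def gaussian_pyramid(img, levels=4):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        blurred = convolve1d(pyr[-1], BINOMIAL, axis=0, mode='reflect')
        blurred = convolve1d(blurred, BINOMIAL, axis=1, mode='reflect')
        pyr.append(blurred[::2, ::2])     # drop every other row and column
    return pyr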
Laplacian pyramid:
Take the original image I.
Blur + shrink → get a lower-resolution (Gaussian) image G1.
Upsample this lower-resolution image back to the original size (interpolation). This gives you a smoothed version of the original (no fine details), call it G1_up.
Subtract this smoothed image from the original → what’s left is only the details (edges, textures, small patterns).
That “details-only” image is called the Laplacian image: L = I − G1_up.
At each level: store the Laplacian image (band-pass details).
At the very top: store the final blurred small Gaussian image (the coarse base).
Each Laplacian level can equivalently be computed as the difference of two blurred images (two Gaussians with different blur levels). This is called a Difference of Gaussians (DoG).
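Building on the gaussian_pyramid sketch above, each Laplacian level is the difference between a Gaussian level and the upsampled next-coarser level (Python/SciPy sketch; bilinear upsampling via scipy.ndimage.zoom is an implementation choice):

import numpy as np
from scipy.ndimage import zoom

def laplacian_pyramid(img, levels=4):
    gauss = gaussian_pyramid(img, levels)
    lap = []
    for k in range(levels - 1):
        up = zoom(gauss[k + 1], 2, order=1)[:gauss[k].shape[0], :gauss[k].shape[1]]
        lap.append(gauss[k] - up)          # band-pass detail image
    lap.append(gauss[-1])                  # coarse Gaussian base
    return lap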
Laplacian of Gaussian:
First blur the image with a Gaussian (G_σ ∗ I).
Then apply the Laplacian operator (∇²) → a second derivative that highlights regions of rapid intensity change (edges, blobs).
Half-octave pyramid
Instead of shrinking the image by half at every step, you shrink by a smaller amount (like √2 ≈ 1.41 times smaller).
So you get more levels in between. This means the scale change between levels is smaller → the images are closer in
size.
Quincunx sampling
In image processing, sometimes instead of taking every pixel, they take pixels in a checkerboard pattern (like black squares on a chessboard). This is another way to downsample while keeping useful information. Half-octave pyramids often use this checkerboard pattern.
Wavelets:
Wavelets are filters that localize a signal in both space and frequency (like the Gabor filter) and are defined over a
hierarchy of scales. Wavelets provide a smooth way to decompose a signal into frequency components without
blocking and are closely related to pyramids.
Wavelet transforms are designed so that the total number of coefficients = the number of pixels in the image.
This is called a tight frame → no redundancy, more compact.
Some wavelet families are also overcomplete (they intentionally keep extra information) because that helps with:
Shiftability (so the representation doesn’t change too much if you slightly move the image),
Orientation steering (being able to capture features at more precise directions).
2D wavelets:
First, you pass the image through filters:
Low-pass filter → keeps the smooth part (low frequencies).
High-pass filter → keeps the details (edges/textures).
Then you decimate (downsample) → reduce the number of pixels
(throw away half in each direction).
So at each stage:
1/4 of the pixels (the smooth, low-frequency part) continue to the
next stage (coarse level).
3/4 of the pixels capture the details at that stage.
After filtering in both horizontal and vertical directions, you get 3 “detail images”:
HL (High–Low) High-pass in horizontal direction, low-pass in vertical. Highlights vertical edges (like walls or tree
trunks).
LH (Low–High) Low-pass in horizontal, high-pass in vertical. Highlights horizontal edges (like horizon lines or
rooftops).
HH (High–High) High-pass in both directions.Captures diagonal/texture details (corners, fine patterns, less common).
Meanwhile, the LL (Low–Low) part (smooth version) gets passed down to the next stage for further decomposition.
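One level of a separable 2D decomposition can be sketched with the simplest (Haar-like) low/high pair, average and difference (Python/NumPy; assumes even image dimensions, and Haar is used here only because it keeps the sketch short):

import numpy as np

def haar_decompose(img):
    a = img.astype(float)
    # Horizontal pass: low = average of pixel pairs, high = difference.
    L = (a[:, 0::2] + a[:, 1::2]) / 2
    H = (a[:, 0::2] - a[:, 1::2]) / 2
    # Vertical pass on both halves -> four half-resolution sub-bands.
    LL = (L[0::2, :] + L[1::2, :]) / 2     # smooth part, passed to next stage
    LH = (L[0::2, :] - L[1::2, :]) / 2     # horizontal edges
    HL = (H[0::2, :] + H[1::2, :]) / 2     # vertical edges
    HH = (H[0::2, :] - H[1::2, :]) / 2     # diagonal detail
    return LL, HL, LH, HH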
Lifting scheme:
Split into evens and odds:
Even samples → base signal (smooth part candidate).
Odd samples → will become detail (edges, differences).
Predict step (make high-pass / detail):
Take the average of the neighboring evens (that’s like a tiny low-pass filter).
Subtract this average from the odd value.
What’s left = high-pass coefficient (detail).
Update step (fix the evens):
Add a small fraction of the detail signal back to the evens. This correction makes the even sequence truly smooth → a proper low-pass. With the usual weights: update = even + ¼ · (sum of the two neighboring details).
Why this works:
The subtraction makes odds = “detail only” (high-pass).
The correction makes evens = “smooth only” (low-pass).
Together, you’ve perfectly separated the signal into smooth + detail.
To rebuild the signal:
Reverse the steps. Instead of subtracting average, you add it back. Instead of adding detail to evens, you subtract it
back out. You recover the original exactly. You don’t have to stick with a simple “average of neighbors” filter.
First-generation vs. second-generation wavelets
First-generation wavelets: built using traditional filters on a regular grid (like pixels in an image lined up neatly).
Second-generation wavelets (lifted wavelets): Built using the lifting scheme (split, predict, update). Can work on
irregular sampling (e.g., 3D meshes, surfaces in graphics). Can adapt their weights to the problem (e.g., emphasize
edges, smooth textures). Used for tasks like image editing and preconditioning (making optimization algorithms
converge faster).
Worked example. Signal: [10, 15, 14, 20, 18, 22]
Evens: positions 0, 2, 4 → [10, 14, 18]
Odds: positions 1, 3, 5 → [15, 20, 22]
Predict step (create details)
Odd at position 1
Neighbors: Even 0 = 10, Even 2 = 14
Predicted = (10 + 14)/2 = 12
Detail = Actual − Predicted = 15 − 12 = +3
Odd at position 3
Neighbors: Even 2 = 14, Even 4 = 18
Predicted = (14 + 18)/2 = 16
Detail = 20 − 16 = +4
Odd at position 5
Neighbors: Even 4 = 18, next neighbor doesn’t exist → assume boundary = 18 (repeat last)
Predicted = (18 + 18)/2 = 18
Detail = 22 − 18 = +4
Details (high-pass): [3, 4, 4]
Update step (fix the smooth part)
Even 0 → only right neighbor detail = 3
Updated = 10 + 3/2 = 10 + 1.5 = 11.5
Even 2 → neighbors = left detail 3, right detail 4
Updated = 14 + (3 + 4)/4 = 14 + 7/4 = 14 + 1.75 = 15.75
Even 4 → neighbors = left detail 4, right detail 4 (assume repeated boundary)
Updated = 18 + (4 + 4)/4 = 18 + 2 = 20
Updated evens: [11.5, 15.75, 20]
Low-pass (smooth / approximation): [11.5, 15.75, 20]
High-pass (detail / edges): [3, 4, 4]
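The worked example above can be reproduced in a few lines (Python/NumPy sketch; boundary samples are replicated, which is the convention the example uses, and an even-length signal is assumed):

import numpy as np

def lift_forward(signal):
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    # Predict: each odd sample is predicted by the average of its even neighbors.
    right = np.append(even[1:], even[-1])            # replicate last even
    detail = odd - (even + right) / 2
    # Update: add a quarter of the two neighboring details back to the evens.
    left_d = np.insert(detail[:-1], 0, detail[0])    # replicate first detail
    smooth = even + (left_d + detail) / 4
    return smooth, detail

smooth, detail = lift_forward([10, 15, 14, 20, 18, 22])
# smooth -> [11.5, 15.75, 20.0], detail -> [3.0, 4.0, 4.0]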
Separable wavelets (the usual way)
In standard image wavelets: First filter horizontally (low vs. high). Then vertically.
Result = four sub-bands (low-low, low-high, high-low, high-high).
Steerable pyramids
Alternative representation: Instead of splitting into square chunks of frequency, split into radial wedges (arcs).
Each wedge = one orientation and scale (like edge detectors pointing in different directions).
Properties:
Rotationally symmetric → treats all directions fairly.
Orientation selective → can capture diagonal, curved, or angled textures.
Overcomplete → more coefficients than pixels (so a bit redundant), but avoids aliasing.
Self-inverting → you can reconstruct perfectly.
Application: Image blending
Decompose each image into a Laplacian pyramid:
Low frequencies = smooth, broad color changes.
High frequencies = fine details, edges, textures.
Create a smooth mask (blend region):
Start with a binary mask (left side = apple, right side = orange).
Build a Gaussian pyramid of this mask (so at coarser scales the boundary is very soft, at finer scales it’s sharper).
Blend at each level:
At coarse levels (low frequency): blend smoothly (so colors transition gradually).
At fine levels (high frequency): blend sharply (so textures stay crisp, without ghosting).
Reconstruct the image:
Add the blended Laplacian pyramid levels back up.
The result = a seamless hybrid image (like an “orapple”).
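A sketch of the blending loop, reusing the gaussian_pyramid and laplacian_pyramid sketches above (Python/SciPy; the mask is a float image with 1 where image A should dominate):

import numpy as np
from scipy.ndimage import zoom

def pyramid_blend(img_a, img_b, mask, levels=5):
    la, lb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    gm = gaussian_pyramid(mask.astype(float), levels)
    blended = [gm[k] * la[k] + (1 - gm[k]) * lb[k] for k in range(levels)]
    # Reconstruct: start from the coarse base and add back each detail level.
    out = blended[-1]
    for k in range(levels - 2, -1, -1):
        up = zoom(out, 2, order=1)[:blended[k].shape[0], :blended[k].shape[1]]
        out = up + blended[k]
    return out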
Geometric transformations:You change the positions of pixels (where things appear in the image).
Formula: g(x)=f(h(x))
Example: rotate the whole image by 30°, or warp it so it looks curved.
Parametric transformations: Parametric transformations apply a global deformation to an image, where the behavior of the transformation is controlled by a small number of parameters.
Two ways to warp an image
Forward warping (bad idea):
Take each source pixel, push it to its new location.
Problems:
New pixel position may not be an integer → where do you put it? Rounding causes jagged, aliased results. Some
destination pixels never get filled → holes and cracks. Fixes like “splatting” (spreading pixel to neighbors) cause blur.
Inverse warping (better idea): For each pixel in the output image → ask: where did this pixel come from in the input?
Look up that location in the source, interpolate if needed.
Advantages:
No holes (every output pixel is filled).
Resampling at non-integer coordinates is a well-studied problem → just use interpolation filters.
Interpolation filters you can use
Nearest neighbor → fastest, but blocky.
Bilinear → smooth, commonly used in real-time graphics.
Bicubic → smoother, better quality.
Windowed sinc → highest quality, but expensive.
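A sketch of inverse warping with bilinear resampling (Python/SciPy; map_coordinates does the interpolation, and the 30° rotation used as the transform is only an example):

import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(src, transform, out_shape):
    # For every output pixel, ask where it came from in the source and sample
    # there with bilinear interpolation (order=1).
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    src_x, src_y = transform(xs, ys)
    return map_coordinates(src, [src_y, src_x], order=1, mode='constant')

theta = np.deg2rad(30)          # inverse mapping of a 30-degree rotation
rotate_back = lambda x, y: (np.cos(theta)*x + np.sin(theta)*y,
                            -np.sin(theta)*x + np.cos(theta)*y)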
Scaling and distortion
If the transformation is just zoom (uniform scale), you can say:
If zooming in → interpolate.
If zooming out → smooth first (anti-aliasing).
But for general transformations (like affine, shear, perspective):
One direction may be stretched (needs interpolation). Another direction may be squashed (needs smoothing to avoid aliasing). So the filter should adapt to direction → this is anisotropic filtering.
MIP-mapping:
making a pyramid of images, each one smaller (half size in width & height).
Level 0 = original texture (full resolution).
Level 1 = half-size (prefiltered, not just pixel-dropped).
Level 2 = quarter-size.
… until it becomes 1×1. This way, you always have a texture that’s close to the right size for your object.
We compute a resampling rate r (how much we’re scaling):
r > 1 → zooming in (need more detail).
r < 1 → zooming out (need smaller texture).
Then find the “pyramid level” that matches: l = log2(r). This tells you which MIP level is best.
Instead of picking just one level, we do:
Take the two nearest levels (say level 2 and level 3).
Resample both with bilinear filtering (smooth inside the level).
Blend them together smoothly based on the fractional part of l, e.g. 0.7 × (bilinear sample from level 2) + 0.3 × (bilinear sample from level 3).
This is called trilinear interpolation.
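A sketch of a trilinear lookup (Python/SciPy; pyramid is a list of progressively half-sized textures with level 0 at full resolution, and the coordinate scaling by 0.5**level is an assumption about how the levels are addressed):

import numpy as np
from scipy.ndimage import map_coordinates

def mip_sample(pyramid, x, y, level):
    l0 = int(np.floor(level))
    l1 = min(l0 + 1, len(pyramid) - 1)
    frac = level - l0
    def bilinear(tex, lv):
        s = 0.5 ** lv                       # texture coordinates shrink per level
        return map_coordinates(tex, [[y * s], [x * s]], order=1)[0]
    # Bilinear inside each of the two nearest levels, linear between them.
    return (1 - frac) * bilinear(pyramid[l0], l0) + frac * bilinear(pyramid[l1], l1)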
Elliptical Weighted Average: For each pixel on screen → look at where it comes from in the texture. That “footprint”
is not a square, but an ellipse. Then, instead of just taking one sample, take a weighted average of all texture pixels
inside that ellipse. And to do the averaging, EWA uses a Gaussian filter (smooth bell-shaped weights).
So pixels near the center of the ellipse count more, far ones count less.
Anisotropic filtering: Instead of sampling with a round (isotropic) filter, it samples along the long axis of the stretched ellipse. Example: A road in perspective → footprints look like long ovals → filter along the road’s direction.
To do this, GPU:
Looks up textures at different resolutions (fractional MIP-map levels).
Takes multiple samples along that long axis.
Combines (averages) them.
Multi-pass transforms:
For each pixel in the new image:
Pretend the original image is a smooth continuous signal (not just pixels).
Apply the perfect low-pass filter (a sinc function) shaped to match how the pixel footprint is stretched or squashed
(can be skewed/oriented).
Resample at the new grid locations.
This guarantees no aliasing and minimal blur.
We use approximations (like cubic filters) instead of perfect sinc. Instead of doing separate interpolation → filtering
→ resampling steps, they are combined into one efficient digital filter (called a polyphase filter).
Filtering in full 2D is expensive (big kernel).
So:
Break the 2D warp into a sequence of 1D shears and resamplings.
Each shear is like slanting the image slightly, then resampling rows or columns.
This is much cheaper than full 2D convolution.
Mesh-based warping: When we want to warp (deform) an image, global transformations (like rotation, scaling, affine, perspective) are sometimes not enough. For example, if we want to turn a frown into a smile, we don’t want to move the whole face; only the corners of the mouth should move upwards, while the rest stays the same. That requires local deformations → different parts of the image move differently.
There are a few main approaches:
Point-based warping (sparse points → dense field): Pick a few important points (like corners of the mouth, eyes,
nose).
Specify how they move. Fill in (interpolate) how the rest of the pixels should move smoothly between these points.
Triangulation method: connect points with triangles → each triangle moves using an affine transform → gives a
piecewise smooth warp. Alternatively, use quadrilateral meshes, energy-based smoothing (regularization), or radial
basis functions for smoother interpolation.
Line-based warping (morphing with line segments): Instead of points, use pairs of corresponding lines (like mouth
curves, eyebrows). Each line defines how nearby pixels move (translation, rotation, scaling). Influence decreases with
distance: pixels close to the line move strongly, far pixels move weakly.
Application: Feature-based morphing:
It’s a smooth transformation from one image to another (like turning one face into another face in movies). If you just
fade one image into another → you get ghosting (both sets of eyes, mouths, etc. visible at the same time).
Morphing solution
Warp both images so that their key features line up (e.g., move eyes-to-eyes, mouth-to-mouth). Blend them together
once aligned. This removes ghosting because now features overlap before fading.
Making animation (intermediate frames): To go smoothly from image A to image B, you create intermediate images at
times t = 0 → 1.
At t = 0 → 100% image A.
At t = 1 → 100% image B.
At values in between (say t = 0.3) → warp both images partway and blend them with weights (1−t) and t.
So:
At start: no warp, all image A.
At middle: both warped halfway, blended 50-50.
At end: fully warped to image B, all image B.
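The per-frame blend is just the weighted average described above (Python sketch; warped_a and warped_b are assumed to be image A and image B already warped partway toward each other for this value of t):

import numpy as np

def morph_frame(warped_a, warped_b, t):
    # t = 0 -> all image A, t = 1 -> all image B.
    return (1 - t) * warped_a.astype(float) + t * warped_b.astype(float)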