
CS4243 Final notes

Basic operations

Pixel‐wise Operations

Transform a into b: b(x, y) = F\left(a(x, y)\right)

E.g: Histogram manipulation, Grayscale modification, Image normalization

Grayscale modification

Find the max and min gray levels: \max(a), \min(a)

Dynamic range of a: \left[ \min(a), \max(a) \right]

Change to \left[ newMin, newMax \right] by:

b(x, y) = \left[ \frac{a(x, y) - \min(a)}{\max(a) - \min(a)}\times (newMax-newMin) \right] + newMin
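The range-change formula above can be sketched in plain Python (no NumPy; an image is assumed to be a 2-D list of gray levels — a real pipeline would use NumPy or OpenCV):

```python
def normalize_range(img, new_min, new_max):
    """Map pixel values from [min(img), max(img)] to [new_min, new_max]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    scale = (new_max - new_min) / (hi - lo)
    return [[(p - lo) * scale + new_min for p in row] for row in img]

a = [[10, 20], [30, 40]]
b = normalize_range(a, 0, 255)   # [[0.0, 85.0], [170.0, 255.0]]
```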

Brightness: The measured intensity of all the pixels comprising an ensemble that constitutes the digital image after it has been captured, digitized, and displayed.

Contrast: The amount of color or grayscale differentiation that exists between various image features in both analog and digital images.

Histogram

Applications: Histogram equalization, Histogram‐based color reduction, Used as a feature vector, Template matching

Histogram equalization

  1. The original image has N pixels and G gray levels; convert it to an image with K gray levels and a uniform distribution.
  2. Start from the lowest gray level: g=0, k=0.
  3. Accumulate the pixel count over gray levels, incrementing g (g++).
  4. When the accumulated pixel count reaches N/K, assign a new gray level, k, to that group.
  5. g++, k++, reset the pixel counter to 0.
  6. Continue from step 3.
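A minimal sketch of histogram equalization in plain Python, using the cumulative-distribution formulation (equivalent in spirit to the bin-counting steps above; images are 2-D lists of integer gray levels):

```python
def equalize(img, G=256, K=256):
    """Equalize an image with G input gray levels onto K output levels by
    mapping each level through the cumulative distribution function."""
    flat = [p for row in img for p in row]
    n = len(flat)
    hist = [0] * G
    for p in flat:
        hist[p] += 1
    cdf, acc = [], 0
    for h in hist:
        acc += h
        cdf.append(acc / n)
    lut = [round(c * (K - 1)) for c in cdf]   # lookup table old level -> new level
    return [[lut[p] for p in row] for row in img]
```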

Template matching algorithm

  1. Key image k with histogram h_k; input image a with histogram h_a.
  2. The numbers of gray levels in k and a are equal.
  3. Apply histogram equalization on a; the results are a1 and h_{a1}.
  4. Using the cumulative distribution of gray levels in k, put the same numbers of pixels into the bins of a2 (a → a1 → a2). The goal is to maximize the similarity between the cumulative distributions of h_k and h_{a2}.

Convolution

Properties: Commutative → f*g = g*f. Associative → f*(g*h) = (f*g)*h. Distributive → f*(g+h) = f*g + f*h

Lowpass filter

Makes the image less noisy, but blurs it.

h_{l1} = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}

Gaussian filter: G(x,y)=\frac{1}{2\pi \sigma^2} \exp{\left(-\frac{x^2+y^2}{2\sigma^2}\right)}

h_{l2} = \frac{1}{273} \begin{bmatrix} 1 & 4 & 7 & 4 & 1 \\ 4 & 16 & 26 & 16 & 4 \\ 7 & 26 & 41 & 26 & 7\\ 4 & 16 & 26 & 16 & 4 \\ 1 & 4 & 7 & 4 & 1 \end{bmatrix}

Padding: Zero padding vs. Average padding vs. Random padding

Stride: the number of pixels the kernel moves at each step of the convolution (1 by default)

An m×m image convolved with an n×n kernel gives an output of size (m−n+1) × (m−n+1)

  • With padding p: (m + 2p − n + 1) × (m + 2p − n + 1)
  • With stride s as well: (⌊(m + 2p − n)/s⌋ + 1) × (⌊(m + 2p − n)/s⌋ + 1)
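The output-size formula is easy to check numerically; a one-liner (floor division, the usual convention):

```python
def conv_output_size(m, n, p=0, s=1):
    """Output side length of an m×m image convolved with an n×n kernel,
    with padding p and stride s."""
    return (m + 2 * p - n) // s + 1

assert conv_output_size(32, 5) == 28          # no padding, stride 1
assert conv_output_size(32, 5, p=2) == 32     # "same" padding
assert conv_output_size(32, 5, p=2, s=2) == 16
```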

Median Filter

Can use the mode instead. Good at removing dot (salt & pepper) noise. No unsharpening/blurring effect. Behaviour varies with patch size.
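A minimal median-filter sketch in plain Python (2-D list image; borders are left unchanged for brevity — real implementations pad instead):

```python
def median_filter(img, size=3):
    """Replace each interior pixel by the median of its size×size patch."""
    h, w, r = len(img), len(img[0]), size // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = [img[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            patch.sort()
            out[y][x] = patch[len(patch) // 2]
    return out

# a single salt pixel (255) in a flat region is removed:
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
clean = median_filter(noisy)   # clean[1][1] == 10
```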

Noise

White noise: a random signal having equal intensity at different frequencies, giving it a constant power spectral density.

Noise in an image may be additive or multiplicative.

Dot noise: Mostly the result of multiplying your image by a [0,1] or [1,255] matrix. Could be due to channel disconnection or saturation.

Signal to Noise Ratio:

SNR(a) = 10\log_{10}{\left(\frac{Power(a)}{Power(n)}\right)}

Calculated in dB. Noise power is often estimated as Power(n) = \sigma^2(n)
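A quick numeric sketch of the SNR formula, with Power(n) taken as the noise variance as noted above (plain Python, 1-D signals for brevity):

```python
import math

def snr_db(signal, noise):
    """SNR in dB: 10·log10(Power(signal) / Power(noise)),
    with noise power estimated by its variance."""
    p_signal = sum(v * v for v in signal) / len(signal)
    mu = sum(noise) / len(noise)
    p_noise = sum((v - mu) ** 2 for v in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# constant signal of power 100 vs unit-variance noise → 20 dB
snr_db([10, 10, 10, 10], [1, -1, 1, -1])   # 20.0
```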

Highpass Filter

Typically a gradient or differentiation operator (1st- or 2nd-order derivative).

h_{h1} = \begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix},\quad h_{h2} = \begin{bmatrix} -1 & -1 & -1\\ -1 & 8 & -1\\ -1 & -1 & -1 \end{bmatrix},\quad h_{h3} = \begin{bmatrix} -1 & -1 & -1\\ -1 & 9 & -1\\ -1 & -1 & -1 \end{bmatrix},\quad h_{h4} = \begin{bmatrix} 0 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 0 \end{bmatrix}

Discrete 1st derivative: \frac{\partial f}{\partial x} = f(x+1) - f(x)

Discrete 2nd derivative: \frac{\partial^2 f}{\partial x^2} = f(x+1) + f(x-1) - 2f(x)

Laplacian:

\nabla^2f = \frac{\partial^2f}{\partial x^2} + \frac{\partial^2f}{\partial y^2} = \left[ f(x+1,y) + f(x-1,y) + f(x, y+1) + f(x,y-1) \right] - 4f(x,y)

Laplacian Filter: h = \begin{bmatrix} 0& 1& 0\\ 1& -4& 1\\ 0& 1& 0 \end{bmatrix}


Sobel Operator

h_{sy} = \begin{bmatrix} -1& -2& -1\\ 0& 0& 0\\ 1& 2& 1 \end{bmatrix},\quad h_{sx} = \begin{bmatrix} -1& 0& 1\\ -2& 0& 2\\ -1& 0& 1 \end{bmatrix},\quad G_x = a*h_{sx},\quad G_y = a*h_{sy}

Edge strength map: |G| = \sqrt{G_x^2 + G_y^2}

Edge direction map: \angle G = \arctan{\left(\frac{G_y}{G_x}\right)}

Apply thresholding (T is the threshold):


Many things can be done with |G| and \angle G: finding the strongest edges and their direction, border tracking and vectorization
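A sketch of the Sobel operator in plain Python (2-D list image; the kernels are applied as cross-correlation, as they are conventionally quoted, and border pixels are skipped):

```python
import math

def sobel(img):
    """Return (|G|, ∠G) maps for the interior pixels of img."""
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)       # edge strength
            ang[y][x] = math.atan2(gy, gx)       # edge direction
    return mag, ang

# a vertical step edge gives a strong horizontal gradient (direction 0):
step = [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 10, 10]]
mag, ang = sobel(step)   # mag[1][1] == 40.0, ang[1][1] == 0.0
```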

Diagonal edge detection:

h_{s135} = \begin{bmatrix} -2 &-1 &0\\ -1 &0 &1\\ 0 &1 &2 \end{bmatrix},\quad h_{s45} = \begin{bmatrix} 0 &1 &2\\ -1 &0 &1\\ -2 &-1 &0 \end{bmatrix}

→ Mathematically it’s a bit problematic due to the lack of orthogonality, but practically it’s not bad.

Unsharp masking

Edge enhancement algorithm: sharpen by adding a scaled detail (highpass) component back to the image, b = a + k\left(a - h_l * a\right)


Image Energy, Power, and Entropy

Energy(a) = \sum_i\sum_j{a^2(i, j)}\\ Power(a) = \frac{1}{MN}\sum_i\sum_j{a^2(i, j)}\\ Entropy(a) = -\sum_{k}P_k\log_2{P_k}

P_k is the probability associated with gray level k. The entropy of an image measures its degree of randomness → a noisy image has higher entropy than the original image

Image transform

Fourier transform

Continuous:

F(f(x,y)) = F(u, v) = \int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}{f(x,y)\, e^{-j2\pi (ux+vy)}\,dx\,dy}\\ F^{-1}(F(u,v)) = f(x, y) = \int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}{F(u,v)\, e^{j2\pi (ux+vy)}\,du\,dv}

Discrete:

F(u,v) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} {f(x,y)\exp{\left[ -j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right) \right]}}\\ f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} {F(u,v)\exp{\left[j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right) \right]}}

where:

u = 0, 1, 2, \dots, M-1,\quad v = 0, 1, 2, \dots, N-1,\quad x = 0, 1, 2, \dots, M-1,\quad y = 0, 1, 2, \dots, N-1

FFT is a one-to-one and invertible transform
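The discrete formula above can be implemented directly (O((MN)²), illustration only — use an FFT in practice; the 1/MN factor is on the forward transform, as in the notes):

```python
import cmath

def dft2(f):
    """Direct 2-D DFT of an M×N 2-D list, indexed f[x][y]."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N)) / (M * N)
             for v in range(N)] for u in range(M)]

# a constant image has all its energy in the DC term F(0,0):
F = dft2([[1, 1], [1, 1]])   # F[0][0] ≈ 1, all other entries ≈ 0
```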


Power spectral density: PSD = squared magnitude of the Fourier transform

Phase → which components are present (more important). Magnitude → the contribution of each component.


Frequency domain filters

Ideal:

H(u,v) = \begin{cases} 1, &D(u, v) \leq D_0\\ 0, &\text{otherwise} \end{cases}


Gaussian:

H(u, v) = \exp{\left(-\frac{D^2(u,v)}{2D_0^2} \right)}


Butterworth:

H(u,v) = \frac{1}{1 + \left(\frac{D(u,v)}{D_0}\right)^{2n}}


n = order of the Butterworth filter

D(u,v) = distance from (u, v) to the origin of the frequency plane

D_0 = filter bandwidth, or the coordinate of the cut‐off point

Filter matrix size = image size = Fourier Transform matrix size

Cut-off point: where the magnitude of the filter reaches 0.5

The corresponding highpass filter is obtained as 1 − LPF

Ideal filter → causes sidelobes and distortion (ringing)
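A sketch of building a Butterworth lowpass H(u,v), assuming the frequency origin is shifted to the centre of the M×N plane (the usual visualisation; filter matrix size = image size, as noted above):

```python
import math

def butterworth_lowpass(M, N, d0, n):
    """Butterworth LPF H(u,v) of order n, cut-off d0, centred origin."""
    cu, cv = M / 2, N / 2
    return [[1.0 / (1.0 + (math.hypot(u - cu, v - cv) / d0) ** (2 * n))
             for v in range(N)] for u in range(M)]

H = butterworth_lowpass(8, 8, 4, 2)
# at the origin H = 1; at D(u,v) = D0 the magnitude is exactly 0.5 (cut-off)
```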

Lowpass Filter


Walsh/Hadamard Transform


Uses real square-wave kernels → faster and computationally lighter

Hadamard transform: matrix-based Walsh transform, applicable when the input image is 2^n\times 2^n

Uses the term ‘sequency’ instead of frequency, where S = 2F

Hadamard Transform

Natural Hadamard matrix: H_2=\begin{bmatrix} 1 &1\\ 1 &-1 \end{bmatrix},\quad H_4=\begin{bmatrix} 1 &1 &1 &1\\ 1 &-1 &1 &-1\\ 1 &1 &-1 &-1\\ 1 &-1 &-1 &1 \end{bmatrix}

H_N=\begin{bmatrix} H_{N/2} &H_{N/2}\\ H_{N/2} &-H_{N/2}\end{bmatrix}

Natural Hadamard transform: W = HfH^{T}

Since H = H^T = H^{-1}, W = HfH and f = HWH.

A \frac{1}{N^2} coefficient is necessary for the transform
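The recursion and the transform can be sketched in plain Python (2-D lists; the 1/N² coefficient is left out of matmul, so HWH recovers N²·f, illustrating why it is needed):

```python
def hadamard(n):
    """Natural Hadamard matrix H_N for N = 2^n via the block recursion."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

H2 = hadamard(1)                       # [[1, 1], [1, -1]]
f = [[1, 2], [3, 4]]
W = matmul(matmul(H2, f), H2)          # W = HfH; W[0][0] == 10, the pixel sum
f_back = matmul(matmul(H2, W), H2)     # HWH == N^2 · f == [[4, 8], [12, 16]]
```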


Sequence-ordered Hadamard matrix (reorder rows in increasing number of sign changes):

H_4=\begin{bmatrix} 1 &1 &1 &1\\ 1 &-1 &1 &-1\\ 1 &1 &-1 &-1\\ 1 &-1 &-1 &1 \end{bmatrix} \rightarrow W_4=\begin{bmatrix} 1 &1 &1 &1\\ 1 &1 &-1 &-1\\ 1 &-1 &-1 &1\\ 1 &-1 &1 &-1\\ \end{bmatrix}

W[0, 0] is the sum of all pixels in the original image

Discrete Cosine Transform


Cosine transform kernel functions are not complex.

Hough Transform

Line in img space: y=mx+n → Point in parameter space: (n, m)

Point in img space (x_1, y_1) → Line in parameter space: n = -x_1m + y_1

Collinear points → Intersecting lines


At each point of the (discrete) parameter space, count how many lines pass through it using an array of counters. The higher the count, the more edges are collinear in the image space. Find a peak in the counter array.

Practical difficulties:

  • The slope of the line is -\infty < m < \infty → the parameter space is infinite
  • y=mx+n does not express vertical lines of the form x=k


Solution: Use the normal equation of a line: \rho = x\cos{\theta} + y\sin{\theta}

The new parameter space is finite: 0 \leq \rho \leq D where D is the image diagonal, and 0 \leq \theta < \pi

It can represent all lines: y=k with \rho=k, \theta=90\degree and x=k with \rho=k, \theta=0\degree


A point in image space is now represented as a sinusoid in the (\theta, \rho) parameter space.

Hough Transform Algorithm

Input is an edge img: E[i, j] = 1 for edges

Discretize \theta and \rho in increments of d\theta and d\rho. Let A[R, T] be an array of integer accumulators initialized to 0

For each pixel with E[i, j] = 1 and for h = 1, 2, 3, ..., T do

  • \rho = i\cos{(h\,d\theta)} + j\sin{(h\,d\theta)}
  • Find the closest integer k corresponding to \rho
  • Increment counter A[h, k] by 1

Find local maxima in A[R, T]

→ Can speed up if we know the orientation \theta of the edge; orientation uncertainty can be allowed by incrementing a few counters around the nominal counter.

→ Can be generalized for curves y = f(x, a_1, a_2, \dots, a_p) where a_1, a_2, \dots, a_p are the parameters.
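The accumulator algorithm above can be sketched in plain Python (edge image as a 2-D 0/1 list; ρ may be negative in this ρ = i·cosθ + j·sinθ form, so it is mapped into the bin range; peak finding is left to a simple max):

```python
import math

def hough_lines(edges, n_theta=180, n_rho=100):
    """Fill an accumulator array A[theta_bin][rho_bin] for the
    normal-form line parameterisation rho = i*cos(theta) + j*sin(theta)."""
    h, w = len(edges), len(edges[0])
    diag = math.hypot(h, w)                      # |rho| never exceeds this
    A = [[0] * n_rho for _ in range(n_theta)]
    for i in range(h):
        for j in range(w):
            if edges[i][j]:
                for t in range(n_theta):
                    theta = t * math.pi / n_theta
                    rho = i * math.cos(theta) + j * math.sin(theta)
                    k = int((rho + diag) / (2 * diag) * (n_rho - 1))
                    A[t][k] += 1
    return A

# five collinear points on row i=2 all vote for the same (theta=0, rho=2) cell:
edges = [[0] * 5 for _ in range(5)]
edges[2] = [1] * 5
A = hough_lines(edges)   # max(A[0]) == 5
```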

Color spaces

RGB color space


RGB values normalized to (0, 1)

Humans perceive gray for triples on the diagonal

“Pure” colors at the corners

HSI/HSV color space


H and S encode chromaticity. Hue is defined by the angle. S models the purity of the color (S=1 for a completely pure or saturated color, S=0 for a shade of gray)

YIQ and YUV for TV signals

Have better compression properties. Luminance Y is encoded with more bits than the chrominance values I and Q; humans are more sensitive to Y than to I and Q.

Luminance Y = 0.3R + 0.59G + 0.11B → often used for color-to-grayscale conversion
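The luminance formula as a one-line grayscale conversion (plain Python; an RGB image is assumed to be a 2-D list of (R, G, B) tuples):

```python
def to_grayscale(rgb_img):
    """Colour → grayscale via Y = 0.3R + 0.59G + 0.11B."""
    return [[0.3 * r + 0.59 * g + 0.11 * b for (r, g, b) in row]
            for row in rgb_img]

g = to_grayscale([[(255, 255, 255), (255, 0, 0), (0, 0, 0)]])
# white → 255, pure red → 76.5, black → 0
```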

CIELAB color space


L*: Lightness

a*: Red/Green value

b*: Blue/Yellow value

+a direction → shift towards red

+b direction → shift towards yellow

L=0 → black or total absorption

Center of the plane is gray or neutral

Hue: H = \arctan{\frac{b}{a}}, Chroma: C=\sqrt{a^2 + b^2}

Texture Analysis

Categories


Statistical methods

Local binary pattern (LBP)


Gray level co-occurrence matrix (GLCM)


Usually \Theta = \{0\degree, 90\degree\} \text{ or } \Theta = \{0\degree, 45\degree, 90\degree, 135\degree\} (clockwise) and d = \{1, 2, 3\} \text{ or } d = \{1, 4, 26\}

Functions to be applied on GLCMs:

f_1 = Maximum = \max_{i,j}{\Phi(i,j)}\\ f_2 = Energy = \sum_{i,j}{\Phi(i,j)^2}\\ f_3 = Entropy = -\sum_{i,j}{\Phi(i,j)\log{\left(\Phi(i,j)\right)}}\\ f_4 = Correlation = \sum_{i,j}{\frac{(i-\mu_i)(j-\mu_j)\Phi(i,j)}{\sigma_i\sigma_j}}\\ f_5 = Inverse\ Difference\ Moment = \sum_{i,j}{\frac{\Phi(i,j)}{1 + (i-j)^2}}\\ f_6 = Inertia = \sum_{i,j}{(i-j)^2\Phi(i,j)}

f_1, f_2 are basic statistics of \Phi. Entropy f_3 measures the texture homogeneity. Correlation f_4 is an image linearity metric; a large correlation in direction \theta → linear directional structures in direction \theta. IDM f_5 measures the extent to which the same tone tends to be neighbors. Inertia (or Contrast) f_6 is a texture dissimilarity measure.
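A minimal GLCM sketch in plain Python (gray levels as small integers in a 2-D list; the matrix is normalised to probabilities, and only the Energy feature f_2 is shown — the other f_i follow the same pattern):

```python
def glcm(img, dx, dy, levels):
    """Normalised gray-level co-occurrence matrix for displacement (dx, dy)."""
    h, w = len(img), len(img[0])
    P = [[0.0] * levels for _ in range(levels)]
    count = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                P[img[y][x]][img[y2][x2]] += 1
                count += 1
    return [[v / count for v in row] for row in P]

def glcm_energy(P):
    """f_2 = sum of squared co-occurrence probabilities."""
    return sum(v * v for row in P for v in row)

P = glcm([[0, 0], [1, 1]], dx=1, dy=0, levels=2)
# P[0][0] == P[1][1] == 0.5, so Energy == 0.5
```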

Signal processing methods

Karhunen‐Loève Transform

Using k sliding windows, k local n×n neighbourhoods of the image are extracted and rearranged, as different observations of the data, into a k×n² matrix (k is the number of sliding windows). The neighborhood size is typically n=3,5,7

The covariance matrix is then computed, and the eigenvalues and eigenvectors of that population matrix are obtained:

C(x) = E\left[(x-\mu_x)(x-\mu_x)^T\right],\quad (C(x) - \lambda_xI)e = 0

An n×n rearrangement of the eigenvectors can be interpreted as a bank of adapted filters of the same size, which optimally cover all n² relations of the test image pixels.

Detail images can be obtained by 2-D spatial-domain convolution of the test image with the members of the eigenfilter bank:

D^A_i = A\otimes F^A_i, \quad 1 \leq i \leq n^2

Then, to extract the feature vector, a few statistics of each detail image would be computed and used.


Ring/Wedge Filters


Wedge filter can extract info about edges in the perpendicular direction in the spatial domain (e.g. vertical line for (c-1))

MSMD - Wavelet

L1 + H1 = Image


Gaussian (a) and Laplacian(b) detail of an img


Filters employed for wavelet feature extraction


MSMD - Gabor filtering

12 filters



Intersection of a ring filter and a wedge filter in frequency domain

ANN and Deep Learning


Overfitting

Degrees of freedom F_0, number of constraints C. To avoid loss of generality in the system, C must be k times more than F_0, with k = 4, 5, \dots, 10 → C \geq kF_0

In a NN, C is the number of training examples, and F_0 is the number of adjustable parameters (mostly weights and biases)

Generative adversarial network

The generator’s objective is to model or generate data that is very similar to the training data.

The adversarial (discriminator) stage tries to classify input data: given the features of an instance of data, it predicts the label or category to which that data belongs.

Recurrent Neural Network

A class of artificial neural networks where connections between nodes form a directed or undirected graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior.

Can use their internal state (memory) to process variable‐length sequences of inputs.

LSTM and GRU


LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) have gates as an internal mechanism, which control what info to keep and what to throw away. By doing this, LSTM and GRU networks mitigate the exploding/vanishing gradient problem. These gates can learn which data in a sequence is important to keep or throw away.