DSIP — Visual Interactive Study Guide

Digital Signal & Image Processing · drag, click, and watch every concept happen.

Created by Jakariya Abbas

A beginner-friendly companion to a Digital Signal & Image Processing midterm. It turns the dense topics of a typical DSIP course — sampling, convolution, image kernels, gamma — into pictures you can play with, so the ideas stick before the exam.

This is the exam revision guide turned interactive. Move every slider and press every button — the pictures do the explaining. Works fully offline.

Part A

Digital Foundations

How continuous reality becomes numbers a computer can store: analog vs digital, sampling, quantization, frequency, bit depth and pixels.

1. Digital vs Analog

Analog = continuous: every in-between value exists (a ramp, a slope — like a real sound wave). Digital = discrete: only fixed steps exist (a staircase — you stand on step 1 or 2, never 1.5).

A computer can only store digital values, so every real-world signal (sound, light) must be converted to digital first. Drag the slider to crush the smooth ramp into steps.

Analog → (sample + quantize) → Digital

Memory hook: Analog = a ramp (every height exists). Digital = a staircase (only fixed heights).

2. Sampling & Quantization — the conversion pair

These are the two steps that turn analog into digital. They are different and act on different axes:

Sampling = how often you measure → acts on the time (x) axis. "10 samples per second" = read the value 10 times each second.
Quantization = how precisely you record each value → acts on the amplitude (y) axis. Snap each reading to the nearest allowed level.

Drag Sample rate to add/remove dots along time. Drag Levels to make the staircase coarse or fine. Watch what happens when the rate gets too low.

Levels = 2^bits
Nyquist: sample rate ≥ 2 × (highest frequency)

Why exactly 2×? A sine wave has two things to pin down each cycle — its peak and its trough — so you need a little more than two samples per cycle to lock onto its true frequency. Sample any slower and several different waves fit the same dots, so the math can no longer tell them apart. This is the Nyquist–Shannon sampling theorem: capture everything below half the sample rate (the Nyquist frequency), lose everything above it.

What aliasing really is. A frequency that is too high doesn't just disappear: it folds back and reappears as a lower frequency, impersonating a wave that was never there. For a tone f between f_s/2 and f_s, the impostor sits at f_s − f. You see this everywhere: car wheels that seem to spin backwards on video (the wagon-wheel effect) and shimmering moiré patterns on striped shirts in photos. It is also why CD audio samples at 44.1 kHz, just over twice the ~20 kHz limit of human hearing. To prevent aliasing, real systems run an anti-aliasing (low-pass) filter that removes the too-high frequencies before sampling.

Exam trap — aliasing: If you sample slower than 2× the fastest wiggle, the dots trace a fake slower wave that was never there. That false wave is aliasing. Below Nyquist you don't just lose detail — you invent wrong data.
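The folding can be checked numerically. The sketch below is a minimal illustration, not part of the original notes: the helper name `alias_frequency` and the 9 Hz tone sampled at 10 Hz are assumptions chosen so the numbers are easy to follow.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a tone f after sampling at rate fs."""
    # Fold f into the range 0 .. fs/2 by removing whole multiples of fs.
    return abs(f - fs * round(f / fs))

fs = 10.0        # sample rate in Hz (assumed for illustration)
f_true = 9.0     # tone above the Nyquist frequency fs/2 = 5 Hz

# Sample the 9 Hz tone and a 1 Hz tone at the same instants n / fs.
samples_9hz = [math.sin(2 * math.pi * f_true * n / fs) for n in range(10)]
samples_1hz = [math.sin(2 * math.pi * 1.0 * n / fs) for n in range(10)]

print(alias_frequency(f_true, fs))   # 1.0
```

The two sample lists match dot for dot apart from a sign flip, so once sampled the 9 Hz tone cannot be told apart from an inverted 1 Hz tone.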

Memory hook: Sampling = how often (time). Quantization = how precise (amplitude).

Go deeper: Nyquist–Shannon theorem (Wikipedia) · The Scientist & Engineer's Guide to DSP, ch.3

3. Frequency & Period

Frequency = how many full cycles per second, in Hertz (Hz).
Period = time for one cycle = 1 / frequency.

Why this matters in DSIP: frequency is what Nyquist (Section 2) is measured against — you must sample at least twice the highest frequency present. A faster wave is harder to capture, which is exactly why high-pitched sound and fine image detail need higher sample rates.

Drag the frequency. Notice period is just its reciprocal — they move opposite ways.

period = 1 / frequency
e.g. 10 Hz → period = 1/10 = 0.1 s

Memory hook: Faster wave → higher frequency → shorter period. They are reciprocals.

4. Bit Depth → Levels (Accuracy vs Memory)

This is the second half of quantization (Section 2): the number of levels you snap each sample to is set by the bit depth. Each extra bit doubles the number of shades you can store. More levels = smoother, more accurate — but more storage. This is the core trade-off: Accuracy ↔ Memory. Too few bits and smooth gradients break into visible stripes (banding) — drag the slider down to 1–2 bits to see it.

Bits	Levels (2^bits)	Look
1	2	black / white only
3	8	visible banding
8	256	smooth to the eye

Memory hook: Levels = 2^bits. 8 bits = 256 shades — the standard for a smooth image.
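As a rough sketch of the trade-off, the snippet below snaps a smooth ramp to 2^bits levels. The function name `quantize` and the 0.0 to 1.0 value range are assumptions made for this example.

```python
def quantize(value, bits):
    """Snap a value in 0.0..1.0 to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 1.0 / (levels - 1)          # spacing between neighbouring levels
    return round(value / step) * step

gradient = [i / 10 for i in range(11)]  # a smooth ramp 0.0, 0.1, ..., 1.0

for bits in (1, 3, 8):
    distinct = sorted({round(quantize(v, bits), 6) for v in gradient})
    print(bits, "bits ->", 2 ** bits, "levels;", len(distinct), "distinct outputs")
```

At 1 bit the eleven ramp values collapse onto just two outputs, which is the banding the slider shows.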

5. Images: Pixels, Bit Depth & RGB

An image = a grid of pixels. A 4×4 grid = 16 pixels, stored as a 2D matrix.

Grayscale: each pixel = one value. 3-bit → 2³ = 8 shades, values 0 (black) … 7 (white).
Color (RGB): each pixel = three values (Red, Green, Blue), each 0–255. e.g. rgb(128,240,192) = medium red, high green, fairly high blue.

Storage cost follows directly: at the same bit depth per value, colour needs 3× the memory of grayscale because every pixel stores three numbers instead of one. That single fact drives the size calculations in Section 14.

Drag the three sliders to mix a color. Click any cell in the grid to paint it.

Memory hook: Grayscale = 1 number per pixel. RGB = 3 numbers per pixel (0–255 each).

Part B

Signals & Systems

The language of discrete signals and the two building blocks — the impulse δ[n] and the step U[n] — plus feedback, the idea behind every echo and filter.

6. Discrete Signals & Operations

Continuous uses round brackets x → f(x) = y. Discrete uses square brackets n → f[n], where n = the sample index (often time) and f[n] = the amplitude at that sample. No values exist between whole n.

Golden rule: Inside the bracket changes the X-axis (time/position). Outside the bracket changes the Y-axis (amplitude).

Pick an operation and compare the original (grey) to the result (blue).

Operation	Effect
`f[n−2]`	shift right (minus = right)
`f[n+2]`	shift left (plus = left)
`f[2n]`	compress / squeeze
`f[n/2]`	stretch / spread out
`f[−n]`	flip (mirror on Y-axis)
`2·f[n]`	amplify (taller — outside bracket)

Exam trap: f[n−2] shifts right, not left. Minus inside = delay = move toward larger n.
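The table can be tried out with a short sketch. The representation (a dict wrapped in a function) and the example signal are assumptions for illustration. For `f[n/2]` the sketch puts 0 wherever n/2 is not a whole number, which is one common convention; other treatments interpolate instead.

```python
def make_signal(samples):
    """Wrap a dict {n: amplitude} as a function f(n); missing n gives 0."""
    return lambda n: samples.get(n, 0)

f = make_signal({0: 1, 1: 2, 2: 3})      # hypothetical example signal

delayed   = lambda n: f(n - 2)           # f[n-2]: shift right by 2
advanced  = lambda n: f(n + 2)           # f[n+2]: shift left by 2
squeezed  = lambda n: f(2 * n)           # f[2n]: keep every second sample
flipped   = lambda n: f(-n)              # f[-n]: mirror about n = 0
amplified = lambda n: 2 * f(n)           # 2*f[n]: double the amplitude

def stretched(n):
    # f[n/2] is only defined where n/2 is a whole number; use 0 elsewhere.
    return f(n // 2) if n % 2 == 0 else 0

print([delayed(n) for n in range(0, 5)])   # [0, 0, 1, 2, 3]
```

The printed list shows the original values 1, 2, 3 starting two positions later, confirming that minus inside the bracket moves the signal right.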

7. Unit Sample / Delta Signal δ[n]

A single spike at 0 — the simplest possible signal, and the building block of every signal.

δ[n] = 1 when n = 0
δ[n] = 0 otherwise

Any signal can be written as a sum of shifted, scaled deltas. Add some below and watch the composite signal appear.

Example from the notes: δ[n+1] + 3δ[n] + δ[n−1] + (−2)δ[n−2] — each term is one spike at a position, scaled by a number. Press the preset to load it.

This "every signal is a stack of shifted spikes" idea is the building-block (sifting) property, and it is what makes the delta so important. Picture a signal as a row of values; each value is simply one impulse sitting at that index, scaled by that value. So any input is just a weighted sum of deltas.

Now the payoff: if a system is linear and time-invariant, feeding it one impulse gives a fixed output called the impulse response h[n]. Because the input is a sum of shifted impulses, the output is the same sum of shifted impulse responses. Measure how the system answers a single spike and you can predict its answer to everything — that summation is convolution (Section 10).

Why it matters: If you know how a system reacts to one spike, you can predict its output for any input. That reaction is the impulse response — the heart of convolution (Section 10).
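The example from the notes can be written out directly. This is a minimal sketch; the helper names `delta`, `example` and `from_deltas` are chosen here for illustration.

```python
def delta(n):
    """Unit sample: 1 at n = 0, otherwise 0."""
    return 1 if n == 0 else 0

def example(n):
    # delta[n+1] + 3*delta[n] + delta[n-1] + (-2)*delta[n-2]
    return delta(n + 1) + 3 * delta(n) + delta(n - 1) - 2 * delta(n - 2)

def from_deltas(samples):
    """Rebuild a signal from {position: weight} as a weighted sum of shifted deltas."""
    return lambda n: sum(w * delta(n - k) for k, w in samples.items())

print([example(n) for n in range(-2, 4)])   # [0, 1, 3, 1, -2, 0]
```

`from_deltas({-1: 1, 0: 3, 1: 1, 2: -2})` produces the same values, which is the building-block idea in code: each entry is one spike at a position, scaled by a number.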

Go deeper: MIT OCW 6.003 — Signals & Systems · Impulse / Dirac delta (Wikipedia)

8. Step Signal U[n]

A signal that is "off," then turns on and stays on.

U[n] = 0 when n < 0
U[n] = 1 when n ≥ 0

Two key relationships connect the step to the delta:

U[n] = δ[n] + δ[n−1] + δ[n−2] + … (a step = sum of all deltas from 0 onward)
U[n] − U[n−1] = δ[n] (the difference of a step = one spike)

These two lines are the discrete version of integration and differentiation. A running sum of impulses keeps adding 1 from n=0 onward, building the flat plateau of the step (like integrating). The first difference (this sample minus the previous one) is zero everywhere the step is flat and jumps to 1 only at the single instant it switches on, handing the impulse back (like differentiating). Summing and differencing are inverse operations, which is exactly why δ and U convert into each other: sum δ to get U, difference U to get δ.

Memory hook: Add up deltas → you get a step. Take the difference of a step → you get back one delta.
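Both relationships can be checked over a small window of indices. This is a sketch under the assumption that looking at n = −3 … 4 is enough to see the pattern.

```python
def delta(n):
    return 1 if n == 0 else 0

def step(n):
    return 1 if n >= 0 else 0

indices = range(-3, 5)

# Running sum of delta over all k <= n (delta is 0 below the start of the window).
running_sum = [sum(delta(k) for k in range(-3, n + 1)) for n in indices]

# First difference of the step: U[n] - U[n-1].
first_difference = [step(n) - step(n - 1) for n in indices]

print(running_sum)        # [0, 0, 0, 1, 1, 1, 1, 1]
print(first_difference)   # [0, 0, 0, 1, 0, 0, 0, 0]
```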

Go deeper: Unit step function (Wikipedia)

9. Feedback Loop & Geometric Series

A feedback system feeds its output back into its input, multiplied by a relay factor R. The input keeps echoing back, each time smaller — like sound echoing in a room.

x + Y·R = Y → Y(1 − R) = x → Y = x / (1 − R) = x·(1 + R + R² + R³ + …)

Drag R. The bars are successive echoes; the running total converges to x/(1−R).

Where the formula comes from. The output is the input plus a delayed copy of itself: Y = x + R·Y. Collect the Y terms (Y − R·Y = x, so Y(1 − R) = x) and divide to get Y = x/(1 − R). Expanding 1/(1−R) as the geometric series shows the same answer as a stack of fading echoes: the original, then R times as loud, then R² times, and so on forever.

This is a real IIR (infinite impulse response) filter — the maths behind digital echo and reverb. A single input click comes back an infinite number of times, each quieter than the last, because the output is fed back in. It only stays stable while each echo is smaller than the one before.

Exam trap: The series only settles (converges) when |R| < 1. If |R| ≥ 1 the echoes never die away (and for |R| > 1 they grow forever), so the total blows up and the system is unstable.

Memory hook: 1/(1−R) = 1 + R + R² + … — the geometric series. Closer R is to 1, the louder the total echo.
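The convergence is easy to watch by adding the echoes one at a time. A minimal sketch, with the function name `echo_total` and the values x = 1, R = 0.5 assumed for illustration:

```python
def echo_total(x, R, echoes):
    """Sum the first `echoes` terms x, x*R, x*R**2, ... of the feedback loop."""
    total = 0.0
    term = x
    for _ in range(echoes):
        total += term
        term *= R          # each echo is R times the previous one
    return total

x, R = 1.0, 0.5
print(echo_total(x, R, 4))    # 1.875
print(x / (1 - R))            # 2.0
```

With more echoes the running total closes in on x/(1−R) = 2. Setting R = 1 instead makes the total grow by x on every pass, so it never settles.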

Go deeper: Geometric series (Wikipedia) · IIR filters (Wikipedia)

Part C

Convolution & Filtering

The single operation behind blur, sharpen, edge-detection and neural networks: slide a small filter over a signal or image and combine.

10. Convolution (the core concept)

Convolution = slide one signal over another and measure how much they overlap at each position. It's how every filter — blur, sharpen, edge-detect, and every conv-net layer — actually works.

Continuous: (f * g)(t) = ∫ f(τ)·g(t − τ) dτ
Discrete: y[n] = Σ x[k]·h[n − k] (sum over k)
x = input, h = kernel/filter, y = output

The intuition: at each position, the kernel asks “how much of my shape is present in the signal right here?” Line the two up, multiply point-by-point, add it all into one number, then shift by one and repeat. Where the signal locally matches the kernel the sum is large; where it doesn't, the products cancel toward zero. That single multiply-and-add, swept across every position, is the whole operation.

Press Step to slide the flipped kernel one position at a time. Each output value is the sum of the aligned products.

Why the kernel "flips": the h[n−k] term reverses the kernel by 180° before sliding (note the minus). The flip is what makes convolution commutative (f∗g = g∗f) and associative — the properties that let filters be reordered and combined. Slide without flipping and you get cross-correlation instead; that's actually what deep-learning "conv" layers compute, and it doesn't matter there because the kernel weights are learned (they'd just learn the mirror). For the symmetric kernels of Section 11 the flip changes nothing — but the exam loves to ask why it's there.

Edges: near the start and end the kernel hangs off the signal, so you must decide what's "outside" — pad with zeros, repeat the edge value, or wrap around. Different choices give slightly different border pixels.

Fast trick — the diagonal table

Convolving two short lists = multiplying polynomials. Build a multiply table, then sum the diagonals.

Convolve [1,2,3] * [4,5,6] → diagonals 4 · (5+8) · (6+10+12) · (15+12) · 18 → [4, 13, 28, 27, 18].
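The same result falls out of a direct implementation of y[n] = Σ x[k]·h[n − k]. This is a plain-Python sketch for short lists; the function name `convolve` is chosen here and it returns the full-length output with no padding choices to make.

```python
def convolve(x, h):
    """Full discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            y[k + j] += xk * hj      # the product x[k]*h[j] lands at n = k + j
    return y

print(convolve([1, 2, 3], [4, 5, 6]))   # [4, 13, 28, 27, 18]
```

Each product x[k]·h[j] is one cell of the multiply table, and `k + j` picks out its diagonal. Swapping the two arguments gives the same list, which is commutativity in action.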

Go deeper: 3Blue1Brown — “But what is a convolution?” · Convolution (Wikipedia)

11. Image Convolution Kernels (blur / sharpen / edge)

A 2-D kernel is a tiny weight grid swept over every pixel. At each spot, multiply the 3×3 neighborhood by the weights and add them up — that sum is the new center pixel. Pick a kernel and hover any output cell to see its math.

Blur (Gaussian, ×1/16):
1 2 1
2 4 2
1 2 1

Sharpen:
0 −1 0
−1 5 −1
0 −1 0

Edge detect (Laplacian):
0 −1 0
−1 4 −1
0 −1 0

From the notes — a flat patch (all 100) through the edge kernel: (−1·100)·4 + (4·100) = −400 + 400 = 0 → flat areas vanish. The sharpen kernel on the same patch: −400 + 500 = 100 → the pixel survives (5 in the center instead of 4).
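The flat-patch arithmetic can be reproduced with a small pure-Python sketch. The function name `apply_kernel`, the `scale` parameter and the 4×4 test images are assumptions for illustration. Border pixels are handled by repeating the edge, and the kernel is not flipped because all three kernels are symmetric.

```python
BLUR    = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]      # divide the sum by 16
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
EDGE    = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]

def apply_kernel(image, kernel, scale=1):
    """Weighted 3x3 neighbourhood sum at every pixel, repeating edge pixels."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)   # clamp = repeat the edge
                    cc = min(max(c + dc, 0), w - 1)
                    total += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = total / scale
    return out

flat = [[100] * 4 for _ in range(4)]
print(apply_kernel(flat, EDGE)[1][1])       # 0.0
print(apply_kernel(flat, SHARPEN)[1][1])    # 100.0
print(apply_kernel(flat, BLUR, 16)[1][1])   # 100.0
```

Running the edge kernel on an image with a vertical step, such as rows of `[0, 0, 100, 100]`, gives non-zero values only in the two columns next to the jump.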

Read the weights to predict the effect. When the weights sum to 1, a flat region keeps its brightness; if they are also all non-negative, the kernel is just a weighted average (blur). When they sum to 0, flat regions cancel to black and only places where neighbours differ (edges) survive. Sharpening is "original + a dose of the edges," which is why its centre weight is bumped to 5 instead of 4 and its weights still sum to 1.

Two practical notes: at the image border the 3×3 window hangs off the edge, so you pad (repeat or zero the missing pixels) — here we repeat the edge. And the Gaussian blur is separable: a 2-D blur equals a 1-D blur across rows followed by a 1-D blur down columns, which is far cheaper to compute than the full grid.
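Separability can be seen from the weights themselves: the 3×3 blur grid is the outer product of the 1-D weights 1 2 1 with themselves. A minimal check, with variable names chosen here:

```python
row = [1, 2, 1]   # 1-D blur weights (divide by 4)

# Outer product: every column weight times every row weight.
outer = [[a * b for b in row] for a in row]

print(outer)   # [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

Two passes of three weights cost 6 multiplies per pixel instead of 9 for the full grid, and the saving grows quickly for larger kernels.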

Memory hook: Kernel weights sum to 1 → brightness preserved (blur, sharpen). Sum to 0 → flat areas go black, only edges remain (edge-detect).

Go deeper: Explained Visually — Image Kernels · Stanford CS231n — Convolutional Networks

12. Neuron = Weighted Sum

A single neuron computes the same multiply-and-add as one output value of a convolution (Section 10): multiply each input by a learned weight, add them up, then add a bias. Stack these and you get a neural network; slide a tiny one over an image and you get a convolutional network. This is the link between classic DSIP and modern AI image processing: the hand-designed kernels of Section 11 become weights the network learns for itself.

output = Σ (Wᵢ · Xᵢ) + b

Memory hook: Convolution kernel ≈ a neuron's weights slid across the image. Same multiply-add, repeated everywhere.
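The formula is a one-liner in code. A sketch with made-up inputs and weights; the function name `neuron` is an assumption, and no activation function is applied because the formula above does not include one.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias (no activation function applied)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.25))   # 4.75
```

Feeding it a flattened 3×3 patch of 100s with the sharpen weights 0 −1 0 −1 5 −1 0 −1 0 and zero bias returns 100, the same number the sharpen kernel gives on a flat patch.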

Part D

Storage & Data Rates

Putting sampling and bit depth together to answer the practical exam question: how many bytes does this audio or image actually cost?

13. Audio Data-Rate Calculator

Storage per second = samples/sec × bytes/sample. Slide the controls to see the cost.

bytes/sec = sample_rate × (bits ÷ 8)
8000 samples/s × 16 bit = 8000 × 2 bytes = 16 KB/s
8000 samples/s × 8 bit = 8000 × 1 byte = 8 KB/s

Memory hook: Halve the bit depth → halve the file. Same for the sample rate.
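The calculator boils down to one multiplication. In the sketch below the function name and the optional `channels` parameter (for stereo) are assumptions added for illustration; the formula in the notes is the one-channel case.

```python
def audio_bytes_per_second(sample_rate, bits_per_sample, channels=1):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate * bits_per_sample * channels / 8

print(audio_bytes_per_second(8000, 16))   # 16000.0
print(audio_bytes_per_second(8000, 8))    # 8000.0
```

Here 16,000 bytes/s is written as 16 KB/s using 1 KB = 1000 bytes.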

14. Image Size / Bit-Budget Calculator

An RGB image needs width × height × 3 values. Given a fixed byte budget, how many bits per value can you afford?

values = W × H × channels
640 × 480 × 3 = 921,600 values
budget: 256,000 bytes × 8 = 2,048,000 bits
2,048,000 ÷ 921,600 ≈ 2.22 → 2 bits per value (round down)

Memory hook: Always convert bytes → bits (×8) before dividing by the number of values.
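The same steps as a small function, with the name `bits_per_value` chosen here. Integer division does the rounding down, since a fraction of a bit cannot be stored.

```python
def bits_per_value(width, height, channels, budget_bytes):
    """Largest whole number of bits per value that fits in the byte budget."""
    values = width * height * channels
    budget_bits = budget_bytes * 8        # convert bytes to bits first
    return budget_bits // values          # round down: a partial bit cannot be stored

print(bits_per_value(640, 480, 3, 256_000))   # 2
```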

Part E

Image Intensity & Display

Remapping pixel brightness — point operations, normalization and gamma — and where every topic sits on the camera-to-eye pipeline.

15. Intensity Transform S = T(r)

These are point operations: each pixel's new value depends only on its own old intensity r, run through a transfer function T — neighbours are ignored (unlike the kernels of Section 11). The shape of the curve is the operation. Pick a preset and read the shape.

Beyond the linear presets below, two non-linear curves are exam favourites. A log transform S = c·log(1 + r) stretches the dark end and compresses the bright end, pulling faint detail out of shadows. A power-law (gamma) transform S = c·r^γ does the same job with a tunable strength when γ < 1, and the opposite (darkening) when γ > 1; it is the same family of curve as Section 17, which writes the exponent as 1/γ. Stretching a narrow band of inputs across the full output range is called contrast stretching.

Preset	T(r)	Effect
Brighten	S = r + k	shifts curve up
Contrast	S = (r − mid)·g + mid	steeper slope
Negative	S = max − r	flips the curve

Memory hook: The slope of T is contrast; the vertical shift is brightness; a downhill line is a negative.
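The presets and the log curve can be sketched as plain functions of one intensity r. Several details here are assumptions rather than part of the notes: 8-bit values (max = 255), a mid-point of 128, clamping results into range, and choosing c so that the log curve maps 255 to 255.

```python
import math

MAX = 255   # 8-bit intensities assumed

def clamp(value):
    """Keep a result inside the valid range 0..MAX."""
    return max(0, min(MAX, value))

def brighten(r, k):
    return clamp(r + k)

def contrast(r, g, mid=128):
    return clamp((r - mid) * g + mid)

def negative(r):
    return MAX - r

def log_transform(r):
    c = MAX / math.log(1 + MAX)     # choose c so that MAX maps to MAX
    return c * math.log(1 + r)

print(brighten(200, 100))   # 255 (clamped)
print(contrast(100, 2))     # 72
print(negative(50))         # 205
```

Because each function takes a single pixel value and nothing else, applying one to every pixel of an image is a point operation in the sense described above.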

Go deeper: Histogram equalization (Wikipedia) · Gonzalez & Woods — Digital Image Processing (ch.3)

16. Normalization / Requantization

Stretch any range of raw values into a clean 0 … (2^bit − 1) range — used to fit data into a fixed bit depth.

S = ((f − f_min) / (f_max − f_min)) × (2^bit − 1)

This is min–max contrast stretching: find the darkest and brightest values actually present, map the darkest to 0 and the brightest to the maximum, and spread everything in between. A washed-out image whose values only span 90–160 gets re-stretched across the full 0–255, restoring punch. It's also how you re-fit data into a chosen bit depth (Section 4) — the 2^bit − 1 term sets the new ceiling. The "min" and "max" usually come from reading the image's histogram.

Drag a raw value and the output range; see where it lands.

Memory hook: Subtract the min, divide by the span (giving a value from 0 to 1), then scale up to the new max. "Shift, squeeze, stretch."
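The formula applied to the washed-out 90–160 example, as a minimal sketch. The function name `normalize` is chosen here; the result is left unrounded, and a real pipeline would round it to a whole level.

```python
def normalize(f, f_min, f_max, bits):
    """Min-max stretch of f from [f_min, f_max] onto 0 .. 2**bits - 1."""
    return (f - f_min) / (f_max - f_min) * (2 ** bits - 1)

print(normalize(90, 90, 160, 8))    # 0.0
print(normalize(160, 90, 160, 8))   # 255.0
print(normalize(125, 90, 160, 8))   # 127.5
```

The sketch assumes f_max is larger than f_min; a perfectly flat image would make the span zero and needs separate handling.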

Go deeper: Image normalization (Wikipedia)

17. Gamma Correction

Displays and eyes are non-linear, so intensities are reshaped by a power curve before display. Drag γ and watch both the curve and the gradient strip change.

S = a · r^(1/γ) (r normalized to 0…1)

The real reason gamma exists. Human vision is far more sensitive to changes in dark tones than bright ones: adding a little light to a dim scene is very noticeable, while adding the same amount to an already-bright one barely registers. If you stored brightness with evenly-spaced steps (linear), you'd waste most of your codes on highlights the eye can't tell apart, while the shadows, where the eye is fussy, would show ugly banding.

So images are saved with an encoding gamma of about 1/2.2 (≈ 0.45), which packs more code values into the dark region where they're needed; the display then applies the inverse decoding gamma of about 2.2 to recover true light output. The two cancel, and the limited 8 bits are spent where your eye actually cares. The common sRGB standard follows roughly a 2.2 curve (with a small linear segment near black). That's why the exponent is 1/γ, not γ — you're pre-distorting to undo the display.

Memory hook: γ > 1 brightens mid-tones (curve bulges up); γ < 1 darkens them. γ = 1 is a straight line (no change). Encode at ≈1/2.2, the screen decodes at ≈2.2 — they cancel.
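The encode/decode round trip can be sketched with a pure power law. This assumes a = 1 and γ = 2.2 and ignores the small linear segment of real sRGB, so it is an approximation rather than the exact standard curve.

```python
GAMMA = 2.2   # approximate display gamma, as in the text

def encode(r, gamma=GAMMA):
    """Encoding step S = r**(1/gamma), with r normalized to 0..1 and a = 1."""
    return r ** (1 / gamma)

def decode(s, gamma=GAMMA):
    """Display step: raise the stored value back to the power gamma."""
    return s ** gamma

mid_grey = 0.5
stored = encode(mid_grey)
print(round(stored, 3))            # 0.73
print(round(decode(stored), 3))    # 0.5
```

Mid-grey is stored as roughly 0.73, so well over half of the code values sit below it and serve the darker tones; decoding brings it back to 0.5.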

Go deeper: Cambridge in Colour — Understanding Gamma · Gamma correction (Wikipedia)

18. The Imaging Pipeline

Every concept above lives somewhere on this path from scene to perception. Use it as a one-glance revision map: if you can place each topic on the pipeline and say why it happens there, you understand the course.

Big picture: Sampling & quantization happen at the camera (Section 2); bit-depth/compression at storage (Section 4, Section 14); gamma at display (Section 17); and the non-linear eye is why gamma exists at all.

Resources

Go Deeper — Curated Study Links

Reputable, mostly-free sources to take each topic further.

Sampling, Nyquist & Aliasing

Signals & Systems

Convolution & Kernels

Image Intensity & Gamma