DSIP — Visual Interactive Study Guide
Digital Signal & Image Processing · drag, click, and watch every concept happen.
A beginner-friendly companion to a Digital Signal & Image Processing midterm. It turns the dense topics of a typical DSIP course — sampling, convolution, image kernels, gamma — into pictures you can play with, so the ideas stick before the exam.
Digital Foundations
How continuous reality becomes numbers a computer can store: analog vs digital, sampling, quantization, frequency, bit depth and pixels.
1. Digital vs Analog
Analog = continuous: every in-between value exists (a ramp, a slope — like a real sound wave). Digital = discrete: only fixed steps exist (a staircase — you stand on step 1 or 2, never 1.5).
A computer can only store digital values, so every real-world signal (sound, light) must be converted to digital first. Drag the slider to crush the smooth ramp into steps.
2. Sampling & Quantization — the conversion pair
These are the two steps that turn analog into digital. They are different and act on different axes:
- Sampling = how often you measure → acts on the time (x) axis. "10 samples per second" = read the value 10 times each second.
- Quantization = how precisely you record each value → acts on the amplitude (y) axis. Snap each reading to the nearest allowed level.
Drag Sample rate to add/remove dots along time. Drag Levels to make the staircase coarse or fine. Watch what happens when the rate gets too low.
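The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the guide's own implementation: the function names, the unit sine test signal, and the evenly spaced levels between -1 and 1 are all assumptions chosen for the example.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s=1.0):
    """Sampling: read a unit sine wave at evenly spaced instants (time axis)."""
    count = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

def quantize(value, levels, lo=-1.0, hi=1.0):
    """Quantization: snap one reading to the nearest of `levels` evenly
    spaced values between lo and hi (amplitude axis). Needs levels >= 2."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)
    index = max(0, min(levels - 1, index))  # keep the index inside the allowed range
    return lo + index * step

samples = sample_sine(freq_hz=1, sample_rate_hz=10)   # 10 dots along time
digital = [quantize(s, levels=5) for s in samples]    # each dot snapped to 5 levels
print(digital)
```

Raising `sample_rate_hz` adds dots along time; raising `levels` makes the staircase finer. The two parameters never interact, which is the point of the list above.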
Why exactly 2×? A sine wave has two things to pin down each cycle — its peak and its trough — so you need a little more than two samples per cycle to lock onto its true frequency. Sample any slower and several different waves fit the same dots, so the math can no longer tell them apart. This is the Nyquist–Shannon sampling theorem: capture everything below half the sample rate (the Nyquist frequency), lose everything above it.
What aliasing really is. A frequency that is too high doesn't just disappear: it folds back and reappears as a lower frequency, impersonating a wave that was never there. For a tone f between f_s/2 and f_s, the impostor sits at f_s − f. You see this everywhere: car wheels that spin backwards on video (the wagon-wheel effect) and shimmering moiré patterns on striped shirts in photos. It is also why CD audio samples at 44.1 kHz, just over twice the ~20 kHz limit of human hearing. To prevent aliasing, real systems run an anti-aliasing (low-pass) filter that removes the too-high frequencies before sampling.
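The fold-back can be computed directly. The sketch below assumes a single pure tone and ideal sampling; `alias_frequency` is a hypothetical helper name, not something from the guide.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone f_hz after sampling at fs_hz."""
    folded = f_hz % fs_hz                # sampling cannot tell f from f + k*fs
    return min(folded, fs_hz - folded)   # fold the result into the 0 .. fs/2 band

print(alias_frequency(7, 10))    # 7 Hz sampled at 10 Hz shows up as 10 - 7 = 3 Hz
print(alias_frequency(3, 10))    # 3 Hz is below Nyquist (5 Hz), so it is unchanged
```

A tone below half the sample rate comes back unchanged; anything above it lands somewhere in the 0 to f_s/2 band, which is why the low-pass filter has to run before sampling rather than after.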
Go deeper: Nyquist–Shannon theorem (Wikipedia) · The Scientist & Engineer's Guide to DSP, ch.3
3. Frequency & Period
- Frequency = how many full cycles per second, in Hertz (Hz).
- Period = time for one cycle = 1 / frequency.
Why this matters in DSIP: frequency is what Nyquist (Section 2) is measured against. You must sample at more than twice the highest frequency present. A faster wave is harder to capture, which is exactly why high-pitched sound and fine image detail need higher sample rates.
Drag the frequency. Notice period is just its reciprocal — they move opposite ways.
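As a quick numeric check of the reciprocal relationship and the Nyquist bound, here is a small sketch. The function names are illustrative assumptions.

```python
def period_s(frequency_hz):
    """Time for one full cycle, in seconds."""
    return 1.0 / frequency_hz

def nyquist_bound_hz(highest_frequency_hz):
    """The sample rate must be strictly greater than this value."""
    return 2.0 * highest_frequency_hz

for f in (1, 4, 8):
    print(f, "Hz ->", period_s(f), "s per cycle")

print(nyquist_bound_hz(20_000))  # bound for the ~20 kHz hearing limit
```

Doubling the frequency halves the period, and it also doubles the sample rate you need to capture the wave.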
4. Bit Depth → Levels (Accuracy vs Memory)
This is the second half of quantization (Section 2): the number of levels you snap each sample to is set by the bit depth. Each extra bit doubles the number of shades you can store. More levels = smoother, more accurate — but more storage. This is the core trade-off: Accuracy ↔ Memory. Too few bits and smooth gradients break into visible stripes (banding) — drag the slider down to 1–2 bits to see it.
| Bits | Levels (2^bits) | Look |
|---|---|---|
| 1 | 2 | black / white only |
| 3 | 8 | visible banding |
| 8 | 256 | smooth to the eye |
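The table can be reproduced, and the banding made visible, with a short sketch. It assumes an 8-bit source value in 0..255; `requantize` is a hypothetical name and its binning scheme is one simple choice among several.

```python
def levels_for(bits):
    """Number of distinct values that `bits` bits can store."""
    return 2 ** bits

def requantize(value_8bit, bits):
    """Reduce an 8-bit value (0..255) to `bits` bits, then scale the
    result back to 0..255 so it can be displayed. Needs 1 <= bits <= 8."""
    levels = 2 ** bits
    index = value_8bit * levels // 256     # which of the `levels` bins the value falls in
    return index * 255 // (levels - 1)     # spread the bins back over 0..255

gradient = list(range(0, 256, 16))             # a smooth ramp
print([requantize(v, 1) for v in gradient])    # only black and white survive
print([requantize(v, 3) for v in gradient])    # 8 flat bands
```

With 8 bits the ramp passes through unchanged; with 3 bits neighbouring input values collapse onto the same output, which is the striping the slider shows.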
5. Images: Pixels, Bit Depth & RGB
An image = a grid of pixels. A 4×4 grid = 16 pixels, stored as a 2D matrix.
- Grayscale: each pixel = one value. 3-bit → 2³ = 8 shades, values 0 (black) … 7 (white).
- Color (RGB): each pixel = three values (Red, Green, Blue), each 0–255. e.g. rgb(128,240,192) = medium red, high green, light blue.
Storage cost follows directly: colour needs 3× the memory of grayscale because every pixel stores three numbers instead of one. That single fact drives the size calculations in Section 14.
Drag the three sliders to mix a color. Click any cell in the grid to paint it.
Signals & Systems
The language of discrete signals and the two building blocks — the impulse δ[n] and the step U[n] — plus feedback, the idea behind every echo and filter.
6. Discrete Signals & Operations
Continuous uses round brackets x → f(x) = y. Discrete uses square brackets n → f[n], where n = the sample index (often time) and f[n] = the amplitude at that sample. No values exist between whole n.
Pick an operation and compare the original (grey) to the result (blue).
| Operation | Effect |
|---|---|
| f[n−2] | shift right (minus = right) |
| f[n+2] | shift left (plus = left) |
| f[2n] | compress / squeeze |
| f[n/2] | stretch / spread out |
| f[−n] | flip (mirror on Y-axis) |
| 2·f[n] | amplify (taller, factor outside the bracket) |
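The operations in the table can be sketched on a signal stored as a dictionary from index n to amplitude. The representation and the function names are assumptions made for this example, and any index missing from the dictionary is treated as zero.

```python
def shift(signal, k):
    """g[n] = f[n - k]: positive k delays the signal (moves it right)."""
    return {n + k: v for n, v in signal.items()}

def flip(signal):
    """g[n] = f[-n]: mirror around n = 0."""
    return {-n: v for n, v in signal.items()}

def compress(signal, m):
    """g[n] = f[m*n]: only samples at multiples of m survive."""
    return {n // m: v for n, v in signal.items() if n % m == 0}

def stretch(signal, m):
    """g[n] = f[n/m]: samples spread out to multiples of m."""
    return {n * m: v for n, v in signal.items()}

def amplify(signal, gain):
    """g[n] = gain * f[n]: the index is untouched, only the height changes."""
    return {n: gain * v for n, v in signal.items()}

f = {0: 1, 1: 2, 2: 3}
print(shift(f, 2))     # f[n-2]: same values, two indices later
print(compress(f, 2))  # f[2n]: the sample at n = 1 is dropped
```

Everything inside the bracket changes where the samples sit; the factor outside the bracket changes how tall they are.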
f[n−2] shifts right, not left. Minus inside = delay = move toward larger n.
7. Unit Sample / Delta Signal δ[n]
A single spike at 0 — the simplest possible signal, and the building block of every signal.
Any signal can be written as a sum of shifted, scaled deltas. Add some below and watch the composite signal appear.
δ[n+1] + 3δ[n] + δ[n−1] + (−2)δ[n−2]: each term is one spike at a position, scaled by a number. Press the preset to load it.
This "every signal is a stack of shifted spikes" idea is the building-block (sifting) property, and it is what makes the delta so important. Picture a signal as a row of values; each value is simply one impulse sitting at that index, scaled by that value. So any input is just a weighted sum of deltas.
Now the payoff: if a system is linear and time-invariant, feeding it one impulse gives a fixed output called the impulse response h[n]. Because the input is a sum of shifted impulses, the output is the same sum of shifted impulse responses. Measure how the system answers a single spike and you can predict its answer to everything — that summation is convolution (Section 10).
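Both ideas, building a signal from deltas and predicting an LTI output from the impulse response, fit in a short sketch. The helper names and the impulse response `h` below are hypothetical, picked only to make the arithmetic easy to follow.

```python
def delta(n, position=0):
    """Unit sample delta[n - position]: 1 at n == position, 0 elsewhere."""
    return 1 if n == position else 0

def from_deltas(terms, n):
    """Evaluate the sum of scale * delta[n - position] over (position, scale) pairs."""
    return sum(scale * delta(n, position) for position, scale in terms)

def lti_output(terms, h, n):
    """Same weighted sum, with every delta replaced by the impulse
    response h shifted to that position (h is a dict, zero where missing)."""
    return sum(scale * h.get(n - position, 0) for position, scale in terms)

# The preset from the text: delta[n+1] + 3*delta[n] + delta[n-1] - 2*delta[n-2]
preset = [(-1, 1), (0, 3), (1, 1), (2, -2)]
signal = [from_deltas(preset, n) for n in range(-2, 4)]
print(signal)

h = {0: 1, 1: 0.5}  # hypothetical impulse response: the spike plus a half-size echo
print([lti_output(preset, h, n) for n in range(-2, 5)])
```

`lti_output` never looks at the system itself, only at `h`. That is the sense in which one measured impulse response predicts the answer to every input.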
Go deeper: MIT OCW 6.003 — Signals & Systems · Impulse / Dirac delta (Wikipedia)
8. Step Signal U[n]
A signal that is "off," then turns on and stays on.
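Two key relationships connect the step to the delta:
- U[n] = δ[n] + δ[n−1] + δ[n−2] + … (the step is a running sum of impulses)
- δ[n] = U[n] − U[n−1] (the impulse is the first difference of the step)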
These two lines are the discrete version of integration and differentiation. A running sum of impulses keeps adding 1 from n=0 onward, building the flat plateau of the step (like integrating). The first difference — this sample minus the previous one — is zero everywhere the step is flat and jumps to 1 only at the single instant it switches on, handing the impulse back (like differentiating). Summing and differencing are inverse operations, which is exactly why δ and U are inverses of each other.
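The running sum and the first difference can be checked numerically. This sketch works on a finite window of indices and assumes the signal is zero before the window starts.

```python
from itertools import accumulate

n_values = list(range(-3, 5))                        # indices -3 .. 4
delta = [1 if n == 0 else 0 for n in n_values]       # single spike at n = 0

# Running sum of the impulse builds the step (discrete "integration").
step = list(accumulate(delta))

# First difference of the step hands the impulse back (discrete "differentiation").
recovered = [step[0]] + [step[i] - step[i - 1] for i in range(1, len(step))]

print(step)
print(recovered == delta)
```

The step stays at 0 until n = 0 and at 1 afterwards, and differencing it returns exactly the original spike.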
Go deeper: Unit step function (Wikipedia)
9. Feedback Loop & Geometric Series
A feedback system feeds its output back into its input, multiplied by a relay factor R. The input keeps echoing back, each time smaller — like sound echoing in a room.
Drag R. The bars are successive echoes; as long as |R| < 1, the running total converges to x/(1−R).
Where the formula comes from. The output is the input plus a delayed copy of itself: Y = x + R·Y. Collect the Y terms — Y − R·Y = x, so Y(1 − R) = x — and divide to get Y = x/(1 − R). Expanding 1/(1−R) as the geometric series shows the same answer as a stack of fading echoes: the original, then R as loud, then R², and so on forever.
This is a real IIR (infinite impulse response) filter — the maths behind digital echo and reverb. A single input click comes back an infinite number of times, each quieter than the last, because the output is fed back in. It only stays stable while each echo is smaller than the one before.
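A minimal simulation of the loop, assuming the feedback path has a one-sample delay so that the recurrence is y[n] = x[n] + R·y[n−1]. The function name and the test values are illustrative.

```python
def feedback_response(x, R, steps):
    """Feed one input click of size x into y[n] = x[n] + R * y[n-1]
    and return the first `steps` echoes."""
    echoes = []
    y_prev = 0.0
    for n in range(steps):
        x_n = x if n == 0 else 0.0   # the input is a single click at n = 0
        y = x_n + R * y_prev         # current output = input + scaled previous output
        echoes.append(y)
        y_prev = y
    return echoes

echoes = feedback_response(x=1.0, R=0.5, steps=50)
total = sum(echoes)
print(echoes[:4])              # x, x*R, x*R^2, x*R^3
print(total, 1.0 / (1 - 0.5))  # the running total approaches x / (1 - R)
```

Try R = 0.9 to see the total grow toward 10, or R = 1.1 to watch the echoes get louder instead of quieter, which is the unstable case.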
1/(1−R) = 1 + R + R² + … is the geometric series, valid for |R| < 1. The closer R is to 1, the louder the total echo.
Go deeper: Geometric series (Wikipedia) · IIR filters (Wikipedia)
Convolution & Filtering
The single operation behind blur, sharpen, edge-detection and neural networks: slide a small filter over a signal or image and combine.
10. Convolution (the core concept)
Convolution = slide one signal over another and measure how much they overlap at each position. It's how every filter — blur, sharpen, edge-detect, and every conv-net layer — actually works.
The intuition: at each position, the kernel asks “how much of my shape is present in the signal right here?” Line the two up, multiply point-by-point, add it all into one number, then shift by one and repeat. Where the signal locally matches the kernel the sum is large; where it doesn't, the products cancel toward zero. That single multiply-and-add, swept across every position, is the whole operation.
Press Step to slide the flipped kernel one position at a time. Each output value is the sum of the aligned products.
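The multiply-and-add sweep can be written out directly. This sketch computes the full convolution of two short lists and treats everything outside the lists as zero; it is a plain-Python illustration, not an optimized routine.

```python
def convolve(f, h):
    """Full discrete convolution: y[n] = sum over k of f[k] * h[n - k]."""
    y = [0] * (len(f) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(f)):
            if 0 <= n - k < len(h):      # outside the kernel counts as zero
                y[n] += f[k] * h[n - k]  # h is indexed backwards: that is the flip
    return y

print(convolve([1, 2, 3], [4, 5, 6]))
print(convolve([4, 5, 6], [1, 2, 3]))  # same result: convolution is commutative
```

The index `n - k` runs backwards through `h` as `k` runs forwards through `f`, so the flip is built into the indexing rather than done as a separate step.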
The h[n−k] term in the convolution sum reverses the kernel by 180° before sliding (note the minus). The flip is what makes convolution commutative (f∗g = g∗f) and associative, the properties that let filters be reordered and combined. Slide without flipping and you get cross-correlation instead; that's actually what deep-learning "conv" layers compute, and it doesn't matter there because the kernel weights are learned (they'd just learn the mirror). For the symmetric kernels of Section 11 the flip changes nothing, but the exam loves to ask why it's there.
Edges: near the start and end the kernel hangs off the signal, so you must decide what's "outside": pad with zeros, repeat the edge value, or wrap around. Different choices give slightly different border pixels.
Fast trick — the diagonal table
Convolving two short lists = multiplying polynomials. Build a multiply table, then sum the diagonals.
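The table-and-diagonals trick can be sketched as follows. The function name is an assumption; the method is ordinary polynomial multiplication laid out as a grid.

```python
def diagonal_table(a, b):
    """Build the multiply table of a and b, then sum its anti-diagonals."""
    table = [[x * y for y in b] for x in a]   # row i, column j holds a[i] * b[j]
    out = [0] * (len(a) + len(b) - 1)
    for i, row in enumerate(table):
        for j, product in enumerate(row):
            out[i + j] += product             # cells with equal i + j share a diagonal
    return table, out

table, result = diagonal_table([1, 2, 3], [4, 5, 6])
for row in table:
    print(row)
print(result)
```

Cells on the same anti-diagonal have the same `i + j`, which is the same as saying they contribute to the same power of x in the polynomial product.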
[1,2,3] * [4,5,6] → diagonals 4 · (5+8) · (6+10+12) · (15+12) · 18 → [4, 13, 28, 27, 18].
Go deeper: 3Blue1Brown — “But what is a convolution?” · Convolution (Wikipedia)
11. Image Convolution Kernels (blur / sharpen / edge)
A 2-D kernel is a tiny weight grid swept over every pixel. At each spot, multiply the 3×3 neighborhood by the weights and add them up — that sum is the new center pixel. Pick a kernel and hover any output cell to see its math.
Edge kernel on a flat patch where every pixel is 100: (−1·100)·4 + (4·100) = −400 + 400 = 0 → flat areas vanish. The sharpen kernel on the same patch: −400 + 500 = 100 → the pixel survives (5 in the center instead of 4).
Read the weights to predict the effect. When the weights sum to 1, a flat region keeps its brightness: the kernel is just a weighted average (blur). When they sum to 0, flat regions cancel to black and only places where neighbours differ (edges) survive. Sharpening is "original + a dose of the edges," which is why its centre weight is bumped to 5 instead of 4.
Two practical notes: at the image border the 3×3 window hangs off the edge, so you pad (repeat or zero the missing pixels) — here we repeat the edge. And the Gaussian blur is separable: a 2-D blur equals a 1-D blur across rows followed by a 1-D blur down columns, which is far cheaper to compute than the full grid.
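A small sketch of the 3×3 sweep with edge repetition. The exact kernel grids below are assumptions: the text only fixes a centre of 4 or 5 with four −1 neighbours, and this is one common layout that matches it. Because both kernels are symmetric, the code skips the flip.

```python
EDGE = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]      # weights sum to 0
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]   # weights sum to 1

def apply_kernel(image, kernel):
    """Weighted 3x3 neighbourhood sum at every pixel, repeating edge pixels at the border."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)   # clamp row index = repeat the edge
                    cc = min(max(c + dc, 0), cols - 1)   # clamp column index
                    total += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = total
    return out

flat = [[100] * 4 for _ in range(4)]
step_image = [[0, 0, 100, 100] for _ in range(4)]   # dark left half, bright right half

print(apply_kernel(flat, EDGE)[0])        # flat region cancels to 0
print(apply_kernel(flat, SHARPEN)[0])     # flat region keeps its 100
print(apply_kernel(step_image, EDGE)[0])  # non-zero only next to the boundary
```

Real pipelines would also clip the results back into the valid pixel range; that step is left out here so the raw sums stay visible.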
Go deeper: Explained Visually — Image Kernels · Stanford CS231n — Convolutional Networks
12. Neuron = Weighted Sum
A single neuron does exactly what a convolution tap (Section 10) does: multiply each input by a learned weight, add them up, add a bias. Stack these and you get a neural network; slide a tiny one over an image and you get a convolutional network. This is the link between classic DSIP and modern AI image processing: the hand-designed kernels of Section 11 become weights the network learns for itself.
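The weighted sum itself is one line of code. The sketch below follows the description above (multiply, add, add a bias) and leaves out the activation function that practical networks usually apply afterwards; the weights are made-up numbers.

```python
def neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add the products, add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

print(neuron([1, 2, 3], [0.5, -1, 2], bias=1))

# A flattened 3x3 patch with a flattened edge-style kernel as the weights:
patch = [100] * 9
edge_weights = [0, -1, 0, -1, 4, -1, 0, -1, 0]
print(neuron(patch, edge_weights))  # a flat patch cancels, just as in Section 11
```

Sliding the second call across every 3×3 patch of an image is what a convolutional layer does, with the weights learned instead of written by hand.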
Storage & Data Rates
Putting sampling and bit depth together to answer the practical exam question: how many bytes does this audio or image actually cost?
13. Audio Data-Rate Calculator
Storage per second = samples/sec × bytes/sample. Slide the controls to see the cost.
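The formula translates directly into code. The `channels` parameter is an addition for this sketch (the formula above is the single-channel case), and the function names are illustrative.

```python
def audio_bytes_per_second(sample_rate_hz, bytes_per_sample, channels=1):
    """Storage per second = samples/sec x bytes/sample (x channels)."""
    return sample_rate_hz * bytes_per_sample * channels

def audio_bytes(sample_rate_hz, bytes_per_sample, seconds, channels=1):
    """Total storage for a clip of the given length."""
    return audio_bytes_per_second(sample_rate_hz, bytes_per_sample, channels) * seconds

print(audio_bytes_per_second(44_100, 2))              # 16-bit mono at the CD sample rate
print(audio_bytes_per_second(44_100, 2, channels=2))  # the same in stereo
print(audio_bytes(44_100, 2, seconds=60, channels=2)) # one minute of stereo
```

Remember to convert bit depth to bytes first: 16 bits per sample is 2 bytes per sample.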
14. Image Size / Bit-Budget Calculator
An RGB image needs width × height × 3 values. Given a fixed byte budget, how many bits per value can you afford?
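Both directions of the calculation can be sketched like this. It assumes uncompressed storage with no header, and the function names are hypothetical.

```python
def image_bytes(width, height, bits_per_value, channels=3):
    """Uncompressed size: width x height x channels values, bits_per_value each."""
    total_bits = width * height * channels * bits_per_value
    return (total_bits + 7) // 8          # round up to whole bytes

def max_bits_per_value(width, height, budget_bytes, channels=3):
    """Largest whole number of bits per value that fits the byte budget."""
    return (budget_bytes * 8) // (width * height * channels)

print(image_bytes(4, 4, 8))                  # the 4x4 RGB grid at 8 bits per value
print(image_bytes(4, 4, 8, channels=1))      # the same grid in grayscale
print(max_bits_per_value(100, 100, 15_000))  # bits affordable in a 15,000-byte budget
```

Setting `channels=1` gives the grayscale size, which is one third of the RGB result, matching the 3× rule from Section 5.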
Image Intensity & Display
Remapping pixel brightness — point operations, normalization and gamma — and where every topic sits on the camera-to-eye pipeline.
15. Intensity Transform S = T(r)
These are point operations: each pixel's new value depends only on its own old intensity r, run through a transfer function T — neighbours are ignored (unlike the kernels of Section 11). The shape of the curve is the operation. Pick a preset and read the shape.
Beyond the linear presets below, two non-linear curves are exam favourites. A log transform S = c·log(1 + r) stretches the dark end and compresses the bright end, pulling faint detail out of shadows. A power-law (gamma) transform S = c·r^γ does the same job with a tunable strength when γ < 1, and the opposite (it stretches the bright end and darkens the image) when γ > 1; it is exactly the gamma curve of Section 17. Stretching a narrow band of inputs across the full output range is called contrast stretching.
| Preset | T(r) | Effect |
|---|---|---|
| Brighten | S = r + k | shifts curve up |
| Contrast | S = (r − mid)·g + mid | steeper slope |
| Negative | S = max − r | flips the curve |
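The presets and the two non-linear curves can be sketched as plain functions of one pixel value. Several details are assumptions of this example rather than anything stated above: an 8-bit range (max = 255), mid = 128, clipping out-of-range results, and choosing c so that the brightest input maps to the brightest output.

```python
import math

MAX = 255  # assumed 8-bit intensity range

def clip(value):
    """Round and clamp a result back into 0..MAX."""
    return max(0, min(MAX, round(value)))

def brighten(r, k):
    return clip(r + k)                      # S = r + k

def contrast(r, g, mid=128):
    return clip((r - mid) * g + mid)        # S = (r - mid)*g + mid

def negative(r):
    return MAX - r                          # S = max - r

def log_transform(r):
    c = MAX / math.log(1 + MAX)             # scale so that MAX maps to MAX
    return clip(c * math.log(1 + r))        # S = c*log(1 + r)

def gamma_transform(r, gamma):
    return clip(MAX * (r / MAX) ** gamma)   # S = c*r^gamma on a 0..1 scale

for r in (0, 50, 128, 255):
    print(r, brighten(r, 20), contrast(r, 2), negative(r), log_transform(r))
```

Each function looks at `r` alone and never at a neighbour, which is what makes these point operations.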
Go deeper: Histogram equalization (Wikipedia) · Gonzalez & Woods — Digital Image Processing (ch.3)
16. Normalization / Requantization
Stretch any range of raw values into a clean 0 … (2^bit − 1) range — used to fit data into a fixed bit depth.
This is min–max contrast stretching: find the darkest and brightest values actually present, map the darkest to 0 and the brightest to the maximum, and spread everything in between. A washed-out image whose values only span 90–160 gets re-stretched across the full 0–255, restoring punch. It's also how you re-fit data into a chosen bit depth (Section 4) — the 2^bit − 1 term sets the new ceiling. The "min" and "max" usually come from reading the image's histogram.
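Written as a formula, the stretch is out = (value − min) / (max − min) × (2^bit − 1). A minimal sketch follows; rounding to the nearest integer is an assumption of the example, as is the function name.

```python
def normalize(value, raw_min, raw_max, bits=8):
    """Min-max stretch of `value` from [raw_min, raw_max] to 0 .. 2**bits - 1."""
    if raw_max == raw_min:
        raise ValueError("raw_min and raw_max must differ")
    ceiling = 2 ** bits - 1
    return round((value - raw_min) / (raw_max - raw_min) * ceiling)

# The washed-out example from the text: values spanning only 90..160.
print(normalize(90, 90, 160))            # darkest value maps to 0
print(normalize(160, 90, 160))           # brightest value maps to 255
print(normalize(160, 90, 160, bits=3))   # same pixel re-fitted into 3 bits
```

Changing `bits` moves only the ceiling; the position of the pixel within its original range is preserved.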
Drag a raw value and the output range; see where it lands.
Go deeper: Image normalization (Wikipedia)
17. Gamma Correction
Displays and eyes are non-linear, so intensities are reshaped by a power curve before display. Drag γ and watch both the curve and the gradient strip change.
The real reason gamma exists. Human vision is far more sensitive to changes in dark tones than bright ones — doubling a dim light is very noticeable; doubling an already-bright one barely registers. If you stored brightness with evenly-spaced steps (linear), you'd waste most of your codes on highlights the eye can't tell apart, while the shadows — where the eye is fussy — would show ugly banding.
So images are saved with an encoding gamma of about 1/2.2 (≈ 0.45), which packs more code values into the dark region where they're needed; the display then applies the inverse decoding gamma of about 2.2 to recover true light output. The two cancel, and the limited 8 bits are spent where your eye actually cares. The common sRGB standard follows roughly a 2.2 curve (with a small linear segment near black). That's why the exponent is 1/γ, not γ — you're pre-distorting to undo the display.
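The encode and decode pair can be sketched with a pure power law on values scaled to 0..1. This deliberately ignores the small linear segment that sRGB adds near black, and the function names are illustrative.

```python
def encode_gamma(linear, gamma=2.2):
    """Pre-distort linear light (0..1) with exponent 1/gamma before storing."""
    return linear ** (1 / gamma)

def decode_gamma(encoded, gamma=2.2):
    """What the display does: raise the stored value to gamma to recover linear light."""
    return encoded ** gamma

dark = 0.2
stored = encode_gamma(dark)
print(stored)                 # a dark input lands a bit under 0.5 of the code range
print(decode_gamma(stored))   # decoding returns (approximately) the original 0.2
```

A linear value of 0.2 is pushed to nearly the middle of the code range, so the dark fifth of the brightness scale gets almost half of the available codes.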
Go deeper: Cambridge in Colour — Understanding Gamma · Gamma correction (Wikipedia)
18. The Imaging Pipeline
Every concept above lives somewhere on this path from scene to perception. Use it as a one-glance revision map: if you can place each topic on the pipeline and say why it happens there, you understand the course.
Go Deeper — Curated Study Links
Reputable, mostly-free sources to take each topic further.
Sampling, Nyquist & Aliasing
- The Scientist & Engineer's Guide to DSP — ch.3 · free book, Steven W. Smith
- Nyquist–Shannon sampling theorem · Wikipedia
- Sampling & Aliasing, explained visually · DSPRelated
Signals & Systems
- MIT OpenCourseWare 6.003 — Signals & Systems · full course
- Impulse / Dirac delta function · Wikipedia
- Geometric series · Wikipedia
Convolution & Kernels
- 3Blue1Brown — “But what is a convolution?” · video
- Stanford CS231n — Convolutional Networks · course notes
- Explained Visually — Image Kernels · interactive
- Kernel (image processing) · Wikipedia
Image Intensity & Gamma
- Understanding Gamma Correction · Cambridge in Colour
- Gamma correction · Wikipedia
- Gonzalez & Woods — Digital Image Processing · textbook companion