© 2026 Greg T. Chism · MIT License

Gradient Attribution — Interactive Explorer

See which input features drive model predictions — explore saliency maps, integrated gradients, and attribution comparison


Input Pattern
Click cells on the grid to paint a custom pattern
Model Settings
Target Class
Baseline (for IG)
Black = zeros, White = ones, Noise = random
Integration
Steps (m)
m 50
More steps = more accurate but slower
Playback
Step 0 / 50
Speed
What's happening?
Select an input pattern and press Play to step through gradient attribution. Each step shows how the gradient at that interpolated input contributes to the final attribution map.
Key Concepts
Saliency maps: compute ∂output/∂input — shows which pixels, if changed slightly, would most change the prediction. Fast but sensitive to noise and saturation near ReLU dead zones or sigmoid tails.
Integrated gradients: accumulate gradients along a straight path from a baseline to the input — IG(x) = (x−x′) × ∫₀¹ ∂f(x′+α(x−x′))/∂x dα. Captures contributions at all activation levels, not just the endpoint. Satisfies the completeness axiom.
Completeness axiom: attributions must sum to f(input) − f(baseline) — every unit of prediction difference is accounted for. Vanilla gradients do NOT satisfy this; integrated gradients do by construction.
SmoothGrad: average gradients over N noisy copies of the input x + ε, ε ~ N(0, σ²). Reduces visual noise and sharpens the attribution map without changing the fundamental gradient method.
GradCAM vs pixel attribution: GradCAM uses gradients of the class score with respect to feature map activations — gives coarser but more spatially coherent explanations than pixel-level gradients. Best for spatial localization in CNNs.
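The saliency idea above can be sketched with a tiny stand-in model. Everything in this snippet is an illustrative assumption, not the explorer's actual network: a single sigmoid unit over the flattened 8×8 grid, with random weights, for which ∂f/∂x has the closed form f(1−f)·W.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in for the explorer's model: one linear unit + sigmoid.
# Weights are illustrative, not the page's trained network.
rng = np.random.default_rng(0)
W = rng.normal(size=64)            # one weight per pixel of the 8x8 grid
b = 0.1

def f(x):
    """Scalar class score for a flattened 8x8 input."""
    return sigmoid(W @ x + b)

def saliency(x):
    """Vanilla gradient df/dx; for this model it equals f(1-f) * W."""
    p = f(x)
    return p * (1.0 - p) * W

x = np.zeros(64)
x[::9] = 1.0                       # diagonal "edge" pattern on the grid
heatmap = saliency(x).reshape(8, 8)
print(heatmap.shape)
```

Positive entries in `heatmap` mark pixels whose increase would raise the class score; negative entries would lower it, matching the red/blue legend in the panels below.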
Saliency Maps — Edge pattern · Neural Net · Class 0
Input Pattern
8 × 8 pixel grid
8×8 input pattern rendered by D3
pixel = 0 pixel = 1
Gradient Heatmap
∂f/∂x at this input
Saliency map rendered by D3
− (toward 0) 0 + (toward 1)
Values normalized to [−1, +1] for display
The saliency map shows ∂f/∂x — red pixels are those where increasing the pixel value would push the prediction toward class 1, blue pixels toward class 0. Magnitude shows sensitivity strength.
Integrated Gradients — step-by-step accumulation
Baseline x′
Black (all zeros)
Baseline input rendered by D3
Interpolated x′ + α(x − x′)
α = 0.50 (step 25 / 50)
Interpolated input rendered by D3
Accumulated Attribution
∑ gradients so far
IG attribution map builds step by step
0 +
Values normalized to [−1, +1] for display
Integration progress step 0 / 50
α = 0.00
Each step α ∈ [0, 1] evaluates ∂f/∂x at x′ + α(x − x′). After all steps the accumulated gradients are scaled by (x − x′)/m — the result satisfies the completeness axiom: attributions sum to f(x) − f(x′).
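The step loop described here can be sketched as a Riemann sum. The single-sigmoid model below is an illustrative assumption (not the page's network); the loop mirrors the interpolation x′ + α(x − x′) and the final (x − x′)/m scaling, and the residual checks completeness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative one-unit model (assumption: not the explorer's actual net).
rng = np.random.default_rng(0)
W = rng.normal(size=64)
b = 0.1

def f(x):
    return sigmoid(W @ x + b)

def grad_f(x):
    p = f(x)
    return p * (1.0 - p) * W

def integrated_gradients(x, baseline, m=50):
    """Right-endpoint Riemann sum for IG along the straight path."""
    total = np.zeros_like(x)
    for k in range(1, m + 1):
        alpha = k / m                                   # alpha in (0, 1]
        total += grad_f(baseline + alpha * (x - baseline))
    return (x - baseline) * total / m                   # scale by (x - x')/m

x = np.zeros(64)
x[::9] = 1.0                       # diagonal pattern
baseline = np.zeros(64)            # black (all-zeros) baseline
ig = integrated_gradients(x, baseline, m=200)
residual = ig.sum() - (f(x) - f(baseline))   # completeness: should be ~ 0
print(abs(residual))
```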
Attribution Comparison — Vanilla Gradients · Integrated Gradients · SmoothGrad
Vanilla Gradients
Fast but can be noisy — gradient at input only
Vanilla gradient map rendered by D3
SNR: —
Integrated Gradients
Faithful — accumulates gradients from baseline to input
Integrated gradient map rendered by D3
Σ attr − Δf = —
SmoothGrad
Denoised — averages gradients over noisy samples
SmoothGrad map rendered by D3
SNR: —
Negative Neutral Positive
Values normalized to [−1, +1] for display
Total Attribution Magnitude per Method
Attribution magnitude comparison rendered by D3
Vanilla gradients can highlight irrelevant pixels near saturation. Integrated gradients satisfy the completeness axiom — the attribution sum equals the prediction gap. GradCAM is coarser but more spatially stable.
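SmoothGrad's averaging over noisy copies can be sketched the same way. The one-unit model and the choices of n and σ here are illustrative assumptions, not the explorer's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same illustrative one-unit model as a stand-in (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=64)
b = 0.1

def f(x):
    return sigmoid(W @ x + b)

def grad_f(x):
    p = f(x)
    return p * (1.0 - p) * W

def smoothgrad(x, n=50, sigma=0.15, seed=1):
    """Average vanilla gradients over n noisy copies x + eps, eps ~ N(0, sigma^2)."""
    noise_rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n):
        g += grad_f(x + noise_rng.normal(0.0, sigma, size=x.shape))
    return g / n

x = np.zeros(64)
x[::9] = 1.0
sg = smoothgrad(x)
print(sg.shape)
```

Averaging leaves the underlying gradient method unchanged; it only suppresses sample-to-sample noise in the map.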
Prediction
Class
f(x)
f(x′)
Δf
Top Attributed Pixels
(—, —)
(—, —)
(—, —)
(—, —)
row, col · attribution value
Completeness (IG)
Σ IG(x) − Δf
Should be ≈ 0 when integration is complete. Nonzero = more steps needed.
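The "more steps needed" behavior can be demonstrated directly: the Riemann-sum residual shrinks roughly like 1/m. The model is the same illustrative one-unit assumption used elsewhere on this page's sketches.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative model (assumption) to show the residual shrinking with m.
rng = np.random.default_rng(0)
W = rng.normal(size=64)
b = 0.1

def f(x):
    return sigmoid(W @ x + b)

def grad_f(x):
    p = f(x)
    return p * (1.0 - p) * W

def ig_residual(x, baseline, m):
    """|sum(IG) - (f(x) - f(x'))| for an m-step right-endpoint sum."""
    total = np.zeros_like(x)
    for k in range(1, m + 1):
        total += grad_f(baseline + (k / m) * (x - baseline))
    ig = (x - baseline) * total / m
    return abs(ig.sum() - (f(x) - f(baseline)))

x = np.zeros(64); x[::9] = 1.0
baseline = np.zeros(64)
residuals = {m: ig_residual(x, baseline, m) for m in (5, 50, 500)}
print(residuals)
```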
Attribution Quality
Signal-to-noise ratio
High SNR = attribution signal concentrated on few pixels. Low SNR = diffuse, noisy map.
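One plausible way to score this concentration is the mass of the strongest pixels over the remaining mass. This is an assumption for illustration; the explorer may define its SNR differently.

```python
import numpy as np

def snr(attr, k=8):
    """Attribution mass of the k strongest pixels over the rest
    (one possible SNR proxy; assumption, not the page's exact formula)."""
    a = np.sort(np.abs(np.ravel(attr)))
    return a[-k:].sum() / (a[:-k].sum() + 1e-12)

concentrated = np.zeros(64); concentrated[:4] = 1.0   # sharp, sparse map
diffuse = np.full(64, 0.2)                            # spread-out map
print(snr(concentrated), snr(diffuse))
```

A map with all its mass on a few pixels scores far higher than a uniform one, matching the high-vs-low SNR description above.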