© 2026 Greg T. Chism · MIT License

Dropout Regularization — Interactive Explorer

Watch nodes randomly drop during training, then see the full network activate at inference, with weight scaling.


Dropout Rate p
p 0.50
50% — each hidden node drops independently
Network Architecture
input – hidden – output nodes
Simulation
Speed Med
Inverted Dropout
ĥ = (h ⊙ Bern(1−p)) / (1−p)
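The formula maps directly to code. Below is a minimal NumPy sketch of inverted dropout; the function name `inverted_dropout` and the `rng` argument are illustrative, not part of the explorer:

```python
import numpy as np

def inverted_dropout(h, p, rng=None):
    """Drop each unit with probability p, then scale survivors
    by 1/(1 - p) so the expected output matches h."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, 1.0 - p, size=h.shape)  # Bern(1 - p) keep-mask
    return h * mask / (1.0 - p)
```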
Key Concepts
What's happening?
Set the dropout rate p and press Forward Pass. Each pass samples a fresh random mask; active nodes are scaled by 1/(1−p) so the expected output stays constant regardless of how many nodes survive.
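One way to convince yourself the scaling works is to average many masked passes; the unit count, pass count, and seed below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.ones(8)   # eight hidden activations, all 1.0
p = 0.5          # dropout rate from the slider

# 10,000 independent forward passes, each with a fresh random mask
passes = np.stack([
    h * rng.binomial(1, 1 - p, size=h.shape) / (1 - p)
    for _ in range(10_000)
])
print(passes.mean(axis=0))  # each entry is close to 1.0: E[ĥ] = h
```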
Pass 0
Dropped
Active
Training mode — thinned subnetwork (p = 0.50)
Neural network renders here. Press Forward Pass to begin.
Active node · Dropped node · Input node · Output node
Drop Statistics
Pass # 0
Dropped
Active
Avg Drop %
Current p 0.50
Hidden node activity (Active / Dropped)
Loss over Passes (chart renders here)
Weight Scaling
Training
Active units scaled by 1/(1−p).
Expected output unchanged.
Inference
All units active, no scaling needed.
The full network approximates an average over all thinned subnetworks.
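A layer that switches between the two modes might look like the sketch below; `dropout_layer` and its `training` flag are hypothetical names, not the explorer's internals:

```python
import numpy as np

def dropout_layer(h, p, training, rng=None):
    """Inverted dropout: thin and rescale while training;
    pass activations through untouched at inference."""
    if not training:
        # All units active, no scaling: it already happened in training.
        return h
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, 1.0 - p, size=h.shape)
    return h * mask / (1.0 - p)
```

Because the rescaling happens during training, the single full network used at inference matches, in expectation, the average of the many thinned subnetworks sampled during training.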