© 2026 Greg T. Chism · MIT License

Dropout Regularization — Interactive Explorer

Watch nodes randomly drop during training, then see the full network activate at inference, with weight scaling.


Dropout Rate p
p 0.50
50% — each hidden node drops independently
Network Architecture
input – hidden – output nodes
Simulation
Speed Med
Inverted Dropout
ĥ = (h ⊙ Bern(1−p)) / (1−p)
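The formula maps directly to code. Below is a minimal NumPy sketch of inverted dropout; the function name `inverted_dropout` and the `rng` argument are illustrative, not part of the explorer:

```python
import numpy as np

def inverted_dropout(h, p, rng=None):
    """Drop each unit with probability p, then scale survivors
    by 1/(1 - p) so the expected output matches h."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, 1.0 - p, size=h.shape)  # Bern(1 - p) keep-mask
    return h * mask / (1.0 - p)
```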
Key Concepts
What's happening?
Set the dropout rate p and press Forward Pass. Each pass samples a fresh random mask; active nodes are scaled by 1/(1−p) so the expected output stays constant regardless of how many nodes survive.
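One way to convince yourself the scaling works is to average many masked passes; the unit count, pass count, and seed below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.ones(8)   # eight hidden activations, all 1.0
p = 0.5          # dropout rate from the slider

# 10,000 independent forward passes, each with a fresh random mask
passes = np.stack([
    h * rng.binomial(1, 1 - p, size=h.shape) / (1 - p)
    for _ in range(10_000)
])
print(passes.mean(axis=0))  # each entry is close to 1.0: E[ĥ] = h
```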
Pass 0
Dropped
Active
Training mode — thinned subnetwork (p = 0.50)
Neural network renders here. Press Forward Pass to begin.
Active node · Dropped node · Input node · Output node
Drop Statistics
Pass # 0
Dropped
Active
Avg Drop %
Current p 0.50
Hidden node activity (Active / Dropped)
Loss over Passes (chart renders here)
Weight Scaling
Training
Active units scaled by 1/(1−p).
Expected output unchanged.
Inference
All units active, no scaling needed.
The full network approximates an average over all thinned subnetworks.
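A layer that switches between the two modes might look like the sketch below; `dropout_layer` and its `training` flag are hypothetical names, not the explorer's internals:

```python
import numpy as np

def dropout_layer(h, p, training, rng=None):
    """Inverted dropout: thin and rescale while training;
    pass activations through untouched at inference."""
    if not training:
        # All units active, no scaling: it already happened in training.
        return h
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, 1.0 - p, size=h.shape)
    return h * mask / (1.0 - p)
```

Because the rescaling happens during training, the single full network used at inference matches, in expectation, the average of the many thinned subnetworks sampled during training.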