© 2026 Greg T. Chism · MIT License

Learning Rate Schedules — Interactive Explorer

Watch how learning rate schedules shape training — compare constant, step decay, cosine annealing, warmup, and cyclical strategies


Schedule
Fixed LR throughout training — simple but rarely optimal
Hyperparameters
Initial Learning Rate
η₀ 0.010
Log scale: 10^slider
Epochs
T 100
Playback
Speed
What's happening?
Select a schedule and press Play to animate training. Watch how each strategy shapes the learning rate over time — and how it affects convergence speed and final accuracy.
Key Concepts
Why learning rate matters: Too high and training overshoots the minimum and diverges, so the loss explodes. Too low and training converges too slowly or gets stuck in a suboptimal region. LR is often the single most impactful hyperparameter in gradient-based training.
Step decay: Drops LR by a fixed factor every N epochs. Simple and effective, but the abrupt drops can cause brief instability when the LR is suddenly cut to a half or a third of its value. The staircase pattern in the loss curve is its signature.
Cosine annealing: Smoothly reduces LR along a cosine curve from η₀ to near zero. Widely used in modern deep learning (ResNets, ViTs). It often reaches a lower final loss than constant or step decay and tends to find flatter, more generalizable minima.
Warmup: Start with a very small LR and linearly increase it to η₀ over the first few epochs. This prevents large, unstable updates while the weights are still at their random initialization. Especially important for Transformers, where attention gradients can explode early in training.
Cyclical LR: Periodically increase and decrease LR — the rises help escape local minima and saddle points. Used in super-convergence training; the optimizer explores more of the loss landscape and can settle in a wider, flatter minimum with better generalization.
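The schedules described above can be sketched as plain functions of the epoch t. This is a minimal sketch, not the page's own implementation: the function names, the step-decay factor and interval, the exponential rate, the warmup length, and the triangular shape of the cyclical schedule are all illustrative assumptions.

```python
import math

def constant(t, eta0):
    """Fixed LR: eta(t) = eta0."""
    return eta0

def step_decay(t, eta0, drop=0.5, every=25):
    """Multiply the LR by `drop` once every `every` epochs (assumed values)."""
    return eta0 * drop ** (t // every)

def exp_decay(t, eta0, r=0.90):
    """Multiply the LR by r each epoch."""
    return eta0 * r ** t

def cosine(t, eta0, total, eta_min=0.0):
    """Half-cosine from eta0 at t=0 down to eta_min at t=total."""
    return eta_min + 0.5 * (eta0 - eta_min) * (1 + math.cos(math.pi * t / total))

def warmup_cosine(t, eta0, total, warmup=5, start_frac=0.01):
    """Linear ramp from start_frac*eta0 to eta0, then cosine annealing."""
    if t < warmup:
        return eta0 * (start_frac + (1 - start_frac) * t / warmup)
    return cosine(t - warmup, eta0, total - warmup)

def cyclical(t, eta0, base_frac=0.1, half_cycle=10):
    """Triangular cycle between base_frac*eta0 and eta0 (assumed shape)."""
    base = base_frac * eta0
    pos = (t % (2 * half_cycle)) / half_cycle   # position in [0, 2)
    tri = 1 - abs(pos - 1)                      # 0 -> 1 -> 0 over one cycle
    return base + (eta0 - base) * tri

if __name__ == "__main__":
    eta0, total = 0.01, 100
    for t in (0, 10, 25, 50, 75, 100):
        print(t, step_decay(t, eta0), cosine(t, eta0, total),
              warmup_cosine(t, eta0, total), cyclical(t, eta0))
```

Every function takes the same η₀, so the curves are directly comparable when plotted against the epoch.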
Schedule Comparison — All 6 Strategies · Epoch 0 of 100
LR vs Epoch (D3 rendered) 6 schedules · click any line to highlight · vertical line = current epoch
Constant · Step Decay · Exp. Decay · Cosine · Warmup+Cos · Cyclical
All 6 schedules start from the same initial LR. Press Play to step through epochs — watch how each strategy evolves differently. Click any line to highlight it and see its formula.
Training Effect — Cosine vs Constant LR
Training Loss
Loss over epochs (D3 rendered) selected schedule vs constant LR baseline
Selected schedule Constant LR
Validation Accuracy
Accuracy over epochs (D3 rendered) selected schedule vs constant LR baseline
Selected schedule Constant LR
Press Play to animate training epoch by epoch. Green = selected schedule, orange dashed = constant LR baseline.
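The simulation behind these loss curves is not shown on the page. As a rough stand-in, the sketch below runs plain gradient descent on a one-dimensional quadratic loss, which is enough to see how the LR alone changes the shape of a loss curve. The loss function, the function name, and the three LR values are assumptions chosen for illustration.

```python
def gradient_descent_losses(lr, curvature=1.0, w0=1.0, steps=50):
    """Loss trajectory of gradient descent on L(w) = 0.5 * curvature * w**2.

    Each update is w <- w - lr * curvature * w, so the iterate shrinks only
    when |1 - lr * curvature| < 1, i.e. when 0 < lr < 2 / curvature.
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        w -= lr * curvature * w          # gradient of L is curvature * w
    losses.append(0.5 * curvature * w * w)
    return losses

if __name__ == "__main__":
    for lr in (2.5, 0.5, 0.001):         # too high, reasonable, too low
        curve = gradient_descent_losses(lr)
        print(f"lr={lr}: first loss {curve[0]:.3g}, last loss {curve[-1]:.3g}")
```

With the largest LR the loss grows at every step, with the smallest it barely moves in 50 steps, and the middle value converges quickly. A schedule moves the LR between these regimes over the course of training.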
Schedule Builder — Custom Three-Phase Schedule
1 Warmup linear ramp-up
Duration (epochs)
T₁ 5
Starting LR fraction
f 1%
Fraction of η₀ at epoch 0; ramps to η₀ by end of warmup
2 Main Decay primary training
Decay type
Decay rate / shape
r 0.90
For cosine: unused (shape is fixed). For exp: multiply by r each epoch.
3 Fine-tuning Floor stabilise near minimum
Minimum LR (fraction of η₀)
ηmin 1%
Activate at epoch (% of total)
t 80%
Hold LR at floor for remaining epochs
Live Preview
Custom schedule preview updates in real time · D3 rendered
Phase 1: Warmup · Phase 2: Decay · Phase 3: Floor
Apply → Tab 2 simulation · Save → adds teal line to Tab 1
Adjust the three phases to design your custom LR schedule. Phase 1 warms up from near-zero to prevent early instability. Phase 2 is where most training happens. Phase 3 holds a small constant LR to fine-tune near the minimum without overshooting.
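The three phases can be combined into one function of the epoch. The sketch below uses the default values shown in the builder (T₁ = 5, f = 1%, r = 0.90, ηmin = 1%, t = 80%). The function and parameter names are hypothetical. Two behaviours are assumptions rather than something the page states: the decay never drops below the floor before Phase 3 starts, and the cosine option anneals from η₀ to the floor over Phase 2.

```python
import math

def custom_schedule(t, eta0=0.01, total=100, warmup=5, start_frac=0.01,
                    decay="exp", r=0.90, floor_frac=0.01, floor_at=0.80):
    """Three-phase LR: linear warmup, main decay, constant floor."""
    floor_lr = floor_frac * eta0
    floor_epoch = round(floor_at * total)

    # Phase 1: linear ramp from start_frac * eta0 up to eta0.
    if t < warmup:
        return eta0 * (start_frac + (1 - start_frac) * t / warmup)

    # Phase 3: hold the floor for the remaining epochs.
    if t >= floor_epoch:
        return floor_lr

    # Phase 2: main decay, measured from the end of warmup.
    k = t - warmup
    if decay == "exp":
        lr = eta0 * r ** k                       # multiply by r each epoch
    elif decay == "cosine":
        span = floor_epoch - warmup              # r is unused for cosine
        lr = floor_lr + 0.5 * (eta0 - floor_lr) * (1 + math.cos(math.pi * k / span))
    else:
        raise ValueError(f"unknown decay type: {decay!r}")

    # Assumption: never go below the floor before Phase 3 begins.
    return max(lr, floor_lr)

if __name__ == "__main__":
    for t in (0, 5, 6, 25, 50, 80, 99):
        print(t, custom_schedule(t), custom_schedule(t, decay="cosine"))
```

With r = 0.90 the exponential decay reaches the 1% floor well before epoch 80, so under these defaults the clamp takes effect earlier than the Phase 3 activation point.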
Current State
LR · Epoch 0
Epoch 0
% Complete 0%
Schedule Constant
Formula
Constant
η(t) = η₀
Fixed learning rate throughout all epochs — simple but often suboptimal.
LR at Key Epochs
Epoch LR
10
25
50
75
100
Convergence Estimate
Est. conv. epoch
LR at conv.
Epoch where the loss improvement stays below 0.1% for 5 consecutive steps
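The convergence rule above can be sketched as a scan over a recorded loss curve. This is one possible reading, not the page's own code: it assumes the 0.1% is a relative improvement over the previous value, and that the reported epoch is the first one in the run of 5 flat steps. The function name and parameters are hypothetical.

```python
def estimate_convergence_epoch(losses, tol=1e-3, patience=5):
    """Return the first epoch of a run of `patience` consecutive steps whose
    relative loss improvement is below `tol`, or None if no such run exists.

    losses[i] is the loss recorded at epoch i.
    """
    streak = 0
    for epoch in range(1, len(losses)):
        prev, cur = losses[epoch - 1], losses[epoch]
        # Relative improvement; treat a zero previous loss as no improvement.
        improvement = (prev - cur) / abs(prev) if prev != 0 else 0.0
        if improvement < tol:
            streak += 1
            if streak == patience:
                return epoch - patience + 1      # start of the flat run
        else:
            streak = 0
    return None

if __name__ == "__main__":
    curve = [1.0, 0.5, 0.25, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
    epoch = estimate_convergence_epoch(curve)
    print("estimated convergence epoch:", epoch)
```

The LR at convergence then follows by evaluating the chosen schedule at the returned epoch.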