Learning Rate Schedules — Interactive Explorer

Watch how learning rate schedules shape training — compare constant, step decay, cosine annealing, warmup, and cyclical strategies

Schedule

Fixed LR throughout training — simple but rarely optimal

Hyperparameters

Initial Learning Rate

η₀ 0.010

Log scale: 10^slider
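
As a rough sketch of the log-scale mapping, the slider position can be read as an exponent of 10. The function name `slider_to_lr` is hypothetical, and the slider's actual range is not stated on the page.

```python
def slider_to_lr(slider_value):
    """Map a log-scale slider position to a learning rate: eta0 = 10 ** slider."""
    return 10.0 ** slider_value


# A slider position of -2 corresponds to the displayed default of 0.010.
default_lr = slider_to_lr(-2)
```

A log scale is a natural fit because useful learning rates span several orders of magnitude.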

Epochs

T 100

Playback

Speed 3×

What's happening?

Select a schedule and press Play to animate training. Watch how each strategy shapes the learning rate over time — and how it affects convergence speed and final accuracy.

Key Concepts

Why learning rate matters: Too high and training overshoots the minimum and diverges, so the loss explodes. Too low and training converges too slowly or gets stuck in a suboptimal region. LR is often the single most impactful hyperparameter for a gradient-based optimizer.
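
A toy illustration of the too-high and too-low cases, assuming plain gradient descent on f(w) = w². The function name and the three learning rates are chosen for illustration only and are not taken from the explorer's simulation.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # each step multiplies w by (1 - 2 * lr)
    return abs(w)


too_high = gradient_descent(lr=1.1)    # |1 - 2.2| = 1.2 per step: |w| grows
too_low = gradient_descent(lr=0.001)   # 0.998 per step: |w| barely shrinks
moderate = gradient_descent(lr=0.3)    # 0.4 per step: |w| shrinks quickly
```

On this quadratic each update scales w by (1 − 2·lr), so any lr above 1 diverges and a very small lr makes almost no progress in 20 steps.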

Step decay: Drops LR by a fixed factor every N epochs. Simple and effective, but the abrupt drops can cause brief instability when the LR is suddenly cut to a half or a third of its value. The staircase pattern in the loss curve is its signature.
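
A minimal sketch of step decay, assuming the common form η(t) = η₀ · γ^⌊t/N⌋. The function name, the drop factor of 0.5, and the interval of 25 epochs are illustrative assumptions, not values read from the explorer.

```python
def step_decay(epoch, lr0=0.01, drop=0.5, every=25):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return lr0 * drop ** (epoch // every)


# The LR stays flat within each block of 25 epochs, then halves.
schedule = [step_decay(e) for e in range(100)]
```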

Cosine annealing: Smoothly reduces LR along a cosine curve from η₀ to near zero. Widely used in modern deep learning (ResNets, ViTs). It often reaches a lower final loss than constant or step decay and tends to find flatter, more generalizable minima.
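
A minimal sketch of cosine annealing, assuming the standard form η(t) = η_min + ½(η₀ − η_min)(1 + cos(πt/T)). The function name and the default η_min of 0 are assumptions for illustration.

```python
import math


def cosine_annealing(epoch, total_epochs=100, lr0=0.01, lr_min=0.0):
    """Anneal from lr0 at epoch 0 down to lr_min at total_epochs along a half cosine."""
    progress = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The curve is nearly flat at both ends, so the LR changes slowly at the start and finish and fastest around the midpoint.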

Warmup: Start with a very small LR and increase it linearly to η₀ over the first few epochs. This prevents large, unstable updates while the weights are still randomly initialized. Especially important for Transformers, where attention gradients can explode early in training.
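
A minimal sketch of linear warmup, assuming the LR starts at a fraction f of η₀ and ramps linearly to η₀ over T₁ epochs. The function name is hypothetical; the defaults of 5 epochs and 1% mirror the values shown in the Schedule Builder.

```python
def linear_warmup(epoch, lr0=0.01, warmup_epochs=5, start_fraction=0.01):
    """Ramp linearly from start_fraction * lr0 at epoch 0 to lr0 at warmup_epochs."""
    if epoch >= warmup_epochs:
        return lr0  # warmup finished; hand over to the main schedule
    frac = start_fraction + (1.0 - start_fraction) * epoch / warmup_epochs
    return lr0 * frac
```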

Cyclical LR: Periodically increase and decrease LR. The rises can help the optimizer escape sharp local minima and saddle points. Used in super-convergence training; the optimizer explores more of the loss landscape and can settle in a wider, flatter minimum with better generalization.
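
A minimal sketch of one cyclical policy, the triangular wave, in which the LR rises linearly from a base value to a peak and back. The function name, the base and peak values, and the half-cycle length of 10 epochs are assumptions; the explorer may use a different waveform.

```python
import math


def triangular_clr(epoch, base_lr=0.001, max_lr=0.01, half_cycle=10):
    """Triangular cyclical LR: rise for half_cycle epochs, then fall for half_cycle."""
    cycle = math.floor(1 + epoch / (2 * half_cycle))   # 1-based index of the current cycle
    x = abs(epoch / half_cycle - 2 * cycle + 1)        # 1 at the base, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```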

Schedule Comparison — All 6 Strategies · Epoch 0 of 100

Constant · Step Decay · Exp. Decay · Cosine · Warmup+Cos · Cyclical

Log y-axis

All 6 schedules start from the same initial LR. Press Play to step through epochs — watch how each strategy evolves differently. Click any line to highlight it and see its formula.

Training Effect — Cosine vs Constant LR

Training Loss

Selected schedule · Constant LR

Validation Accuracy

Selected schedule · Constant LR

Press Play to animate training epoch by epoch. Green = selected schedule, orange dashed = constant LR baseline.

Schedule Builder — Custom Three-Phase Schedule

1. Warmup: linear ramp-up

Duration (epochs)

T₁ 5

Starting LR fraction

f 1%

Fraction of η₀ at epoch 0; ramps to η₀ by end of warmup

2. Main Decay: primary training

Decay type

Decay rate / shape

r 0.90

For cosine: unused (shape is fixed). For exp: multiply by r each epoch.

3. Fine-tuning Floor: stabilise near minimum

Minimum LR (fraction of η₀)

η_min 1%

Activate at epoch (% of total)

t 80%

Hold LR at floor for remaining epochs

Live Preview

Phase 1: Warmup · Phase 2: Decay · Phase 3: Floor

Apply → Tab 2 simulation · Save → adds teal line to Tab 1

Adjust the three phases to design your custom LR schedule. Phase 1 warms up from near-zero to prevent early instability. Phase 2 is where most training happens. Phase 3 holds a small constant LR to fine-tune near the minimum without overshooting.
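
One way the three phases might be combined into a single function is sketched below. The page does not say how the builder stitches the phases together, so the composition here is an assumption: decay is measured from the end of warmup, the cosine variant anneals toward the floor over the span up to the floor activation epoch, and the result is clamped so it never drops below the floor. The function and parameter names are hypothetical; the defaults mirror the values shown in the builder.

```python
import math


def three_phase_lr(epoch, total_epochs=100, lr0=0.01,
                   warmup_epochs=5, start_fraction=0.01,
                   decay="cosine", rate=0.90,
                   floor_fraction=0.01, floor_start=0.80):
    """Warmup, then decay, then a constant floor (assumed composition)."""
    floor_lr = lr0 * floor_fraction
    floor_epoch = round(floor_start * total_epochs)

    # Phase 3: hold the LR at the floor for the remaining epochs.
    if epoch >= floor_epoch:
        return floor_lr

    # Phase 1: linear ramp from start_fraction * lr0 up to lr0.
    if epoch < warmup_epochs:
        frac = start_fraction + (1.0 - start_fraction) * epoch / warmup_epochs
        return lr0 * frac

    # Phase 2: main decay, with t counted from the end of warmup.
    t = epoch - warmup_epochs
    if decay == "exp":
        lr = lr0 * rate ** t                      # multiply by r each epoch
    else:
        span = max(1, floor_epoch - warmup_epochs)
        lr = floor_lr + 0.5 * (lr0 - floor_lr) * (1.0 + math.cos(math.pi * t / span))
    return max(lr, floor_lr)                      # assumption: never dip below the floor


custom_schedule = [three_phase_lr(e) for e in range(100)]
```

Clamping matters most for the exponential variant: with r = 0.90 the LR would fall below 1% of η₀ well before epoch 80.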

Current State

—

LR · Epoch 0

Epoch 0

% Complete 0%

Schedule Constant

Formula

Constant

η(t) = η₀

Fixed learning rate throughout all epochs — simple but often suboptimal.

LR at Key Epochs

Epoch	LR
10	—
25	—
50	—
75	—
100	—

Convergence Estimate

Est. conv. epoch —

LR at conv. —

First epoch at which the loss improves by less than 0.1% for 5 consecutive epochs
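
A rough sketch of how such an estimate could be computed from a list of per-epoch losses. It assumes "improvement" means the relative decrease from the previous epoch and reports the first epoch of the flat stretch; the explorer's exact rule, and the function name used here, are assumptions.

```python
def estimate_convergence_epoch(losses, tol=0.001, patience=5):
    """Return the first epoch of `patience` consecutive improvements below `tol`, or None."""
    streak = 0
    for epoch in range(1, len(losses)):
        prev, curr = losses[epoch - 1], losses[epoch]
        rel_improvement = (prev - curr) / abs(prev) if prev != 0 else 0.0
        streak = streak + 1 if rel_improvement < tol else 0
        if streak >= patience:
            return epoch - patience + 1   # first epoch of the flat stretch
    return None                           # never plateaued within the run


plateau = estimate_convergence_epoch([1.0, 0.5, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25])
still_improving = estimate_convergence_epoch([1.0, 0.9, 0.8, 0.7])
```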