What is a loss function? A measure of how wrong the model's prediction is — the optimizer minimizes this value by adjusting weights via gradient descent. The choice of loss function shapes what the model learns to optimize.
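A minimal sketch of this loop for a hypothetical one-parameter model (all names and numbers here are illustrative, not from any library):

```python
def mse(y_true, y_pred):
    # squared error: the value the optimizer drives toward zero
    return (y_true - y_pred) ** 2

w = 0.0          # model: y_pred = w * x
x, y = 2.0, 6.0  # single training point; the true weight would be 3
lr = 0.1         # learning rate

for _ in range(50):
    y_pred = w * x
    grad = 2 * (y_pred - y) * x   # dL/dw by the chain rule
    w -= lr * grad                # gradient descent: step against the gradient

# w converges toward 3.0, the weight that minimizes the loss
```

Each step moves `w` in whatever direction lowers the loss, which is all "learning" means at this level.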
MSE vs MAE: MSE squares the error, so large mistakes are penalized much more than small ones — sensitive to outliers. MAE uses absolute error — robust to outliers, but its gradient has constant magnitude (±1) regardless of error size, so fixed-step updates can oscillate around the minimum instead of settling into it.
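A quick numeric comparison of the outlier sensitivity (the error values are made up for illustration):

```python
def mse(errors):
    # mean squared error over a list of residuals
    return sum(e ** 2 for e in errors) / len(errors)

def mae(errors):
    # mean absolute error over the same residuals
    return sum(abs(e) for e in errors) / len(errors)

clean   = [1.0, -1.0, 0.5, -0.5]
outlier = [1.0, -1.0, 0.5, -0.5, 10.0]  # one large mistake added

# Adding the single outlier multiplies MSE by ~33x but MAE by only ~3.5x,
# because squaring amplifies the large residual.
```

This is why MAE is often preferred when the training data contains mislabeled or noisy targets.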
Why cross-entropy for classification? It penalizes confident wrong predictions extremely heavily — the loss is −log(p), which → ∞ as the probability p assigned to the true class → 0. This creates strong gradients that push the model away from confident mistakes, making it ideal for probability outputs.
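A small sketch showing how sharply the penalty grows (the function name is illustrative; this is the per-example negative log-likelihood, not a library API):

```python
import math

def cross_entropy(p_true_class):
    # negative log of the probability the model assigned to the correct class
    return -math.log(p_true_class)

confident_wrong = cross_entropy(0.01)  # model gave the true class only 1%
mildly_wrong    = cross_entropy(0.4)   # model gave the true class 40%

# confident_wrong ≈ 4.61, mildly_wrong ≈ 0.92: the confident mistake
# costs ~5x more, and its gradient (-1/p = -100 vs -2.5) is ~40x larger.
```

The steep −1/p gradient is what yanks the model away from confident mistakes fastest.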
What does the gradient tell us? The gradient ∂L/∂ŷ is the slope of the loss curve at the current prediction — it tells the optimizer which direction to move and how strongly. For a convex loss like MSE, large gradient = far from minimum, small gradient = near minimum.
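A sketch of this for MSE, whose gradient with respect to the prediction is proportional to the error (names are illustrative):

```python
def mse_grad(y_pred, y_true):
    # dL/dŷ for L = (ŷ − y)²
    return 2 * (y_pred - y_true)

far  = mse_grad(10.0, 0.0)  # prediction far off  → gradient of 20.0
near = mse_grad(0.1, 0.0)   # prediction close    → gradient of 0.2

# The optimizer takes big steps when far from the target and
# automatically smaller steps as the prediction approaches it.
```

This error-proportional gradient is exactly the property MAE lacks (its gradient stays ±1), which is why MSE settles more smoothly near the minimum.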
Huber loss: combines MSE (for small errors) and MAE (for large errors) using a threshold δ. Smooth gradients near the minimum, robust to outliers far away. Best of both worlds for noisy regression problems.
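A minimal per-element implementation of the idea (using the common 0.5-scaled form so the two pieces meet smoothly at δ; names are illustrative):

```python
def huber(error, delta=1.0):
    # quadratic (MSE-like) inside the threshold: smooth gradient near zero
    if abs(error) <= delta:
        return 0.5 * error ** 2
    # linear (MAE-like) outside the threshold: outliers grow the loss
    # only linearly, not quadratically
    return delta * (abs(error) - 0.5 * delta)

small = huber(0.5)   # 0.125 — same as 0.5 * MSE here
large = huber(10.0)  # 9.5   — vs 50.0 for the 0.5-scaled squared error
```

The δ threshold is a tunable knob: it decides which errors count as "inliers" to fit precisely and which count as outliers to merely tolerate.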