What early stopping is: A regularization technique that halts training when the model begins to overfit — validation loss stops improving while training loss keeps falling. The weights at the best validation epoch are restored.
What patience means: The number of consecutive epochs without validation improvement allowed before training stops. Higher patience tolerates longer plateaus; lower patience halts sooner and risks underfitting. The sketch below shows how patience drives the stopping decision.
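A minimal sketch of the loop in plain Python. The `train_one_epoch` and `evaluate` callables, the `patience` default, and the PyTorch-style `state_dict`/`load_state_dict` calls are assumptions standing in for whatever framework you use:

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, evaluate,
                            max_epochs=100, patience=5):
    """Train until validation loss fails to improve for `patience`
    consecutive epochs, then restore the best-epoch weights."""
    best_loss = float("inf")
    best_weights = None
    epochs_since_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)      # one pass over the training set
        val_loss = evaluate(model)  # loss on the held-out validation set

        if val_loss < best_loss:
            best_loss = val_loss
            # Snapshot the best weights (PyTorch-style state dict, assumed).
            best_weights = copy.deepcopy(model.state_dict())
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break  # patience exhausted: stop training

    if best_weights is not None:
        model.load_state_dict(best_weights)  # roll back to the best epoch
    return model, best_loss
```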
Why val loss diverges: Once a model starts memorizing noise in the training set, further updates stop improving how it generalizes. Training loss keeps falling (it is fitting that noise), but validation loss rises. The widening gap between the two curves is the generalization gap.
When to stop: At the epoch where validation loss is lowest — marked by the vertical line. More patience gives training more chances to escape a temporary plateau before stopping for good.
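Most deep-learning libraries package this policy as a callback. For instance, a sketch using Keras's built-in EarlyStopping; the `patience` value is illustrative, and the commented-out fit call assumes a compiled `model` with a held-out validation set:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",         # watch validation loss
    patience=5,                 # tolerate 5 epochs without improvement
    restore_best_weights=True,  # roll back to the best validation epoch
)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```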