What is convolution? A filter slides across the input computing dot products — detecting the same pattern everywhere in the image (translation equivariance). One filter produces one feature map.
What does stride do? Stride controls how many pixels the filter jumps at each step — larger stride = smaller output = faster computation but coarser spatial resolution.
What does padding do? Padding adds zeros around the input border. "Same" padding preserves spatial dimensions; "Valid" (no padding) shrinks the output by (k−1) pixels per side.
What does pooling do? Pooling reduces spatial dimensions by summarizing regions — max pooling keeps the strongest activation, making the network more robust to small translations in the input.
Why share weights? The same filter weights are used at every position — this drastically reduces parameters compared to a fully-connected layer and encodes the assumption that features can appear anywhere.