Artificial Intelligence Difficulty 55/100

Batch Normalization

Keeping baby blobs perfectly balanced.

⚡ The 5-second answer

Batch Normalization speeds up and stabilizes training of deep neural networks by normalizing each layer's inputs to have zero mean and unit variance.

Explain like I'm five

Imagine you're teaching a puppy to fetch. If you sometimes throw the ball far, sometimes close, the puppy gets confused. Batch Normalization is like always throwing the ball from the same spot—it makes learning consistent and faster by keeping the inputs to each step balanced.

Why it matters

It allows you to use higher learning rates and reduces sensitivity to initialization, which means models train faster and more reliably. You encounter it in nearly all modern deep learning architectures, from image classifiers to language models.

Common misconception

Many think Batch Normalization normalizes the output of a layer, but it actually normalizes the input to the next layer. Also, people assume it works the same for small batch sizes, but it becomes unstable when batches are tiny (e.g., batch size 1).

Formal definition

Batch Normalization is a technique that normalizes the activations of a layer across the mini-batch dimension by subtracting the batch mean and dividing by the batch standard deviation, then applies learnable scale and shift parameters (gamma and beta) to restore representational power. It is typically applied before the activation function in a neural network layer.