
Explain like I'm five
Imagine a dimmer switch that smoothly moves a light between off (0) and fully on (1), but never goes beyond those limits. No matter how hard you twist the knob, the light only gets closer and closer to fully off or fully on without ever quite getting there. That's how the sigmoid function works: it takes any number, however large or small, and gently squeezes it into the range between 0 and 1.

Why it matters
It's the go-to tool for turning a model's raw scores into probabilities, for example when deciding whether an email is spam or not spam. You encounter it as the output step of logistic regression, as an activation function in neural networks, and anywhere you need a smooth output bounded between 0 and 1 that can be thresholded into a yes/no decision.
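As a minimal sketch of the spam example, assuming a hand-written logistic-regression-style model, the code below combines a few email features into a raw score and passes that score through the sigmoid. The feature names, weights, and bias (`WEIGHTS`, `BIAS`) and the helpers `spam_probability` and `is_spam` are hypothetical illustration values, not a trained model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid; both branches avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical model parameters, chosen by hand for illustration (not learned).
WEIGHTS = {"num_links": 0.8, "has_free_offer": 2.0, "from_known_contact": -3.0}
BIAS = -1.0

def spam_probability(features: dict[str, float]) -> float:
    # Raw score (logit): bias plus weighted feature sum; can be any real number.
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    # The sigmoid squeezes the unbounded score into (0, 1).
    return sigmoid(z)

def is_spam(features: dict[str, float], threshold: float = 0.5) -> bool:
    # Turn the probability into a yes/no decision.
    return spam_probability(features) >= threshold

suspicious = {"num_links": 3, "has_free_offer": 1, "from_known_contact": 0}
friendly = {"num_links": 1, "has_free_offer": 0, "from_known_contact": 1}
print(spam_probability(suspicious), is_spam(suspicious))  # score 3.4 -> about 0.968
print(spam_probability(friendly), is_spam(friendly))      # score -3.2 -> about 0.039
```

Mathematically, σ(z) ≥ 0.5 exactly when z ≥ 0, so thresholding at 0.5 is the same as checking the sign of the raw score. The sigmoid earns its place when you need the probability itself, for example to rank emails or to pick a stricter threshold.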

Common misconception
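Many people think the sigmoid function outputs exactly 0 or 1 for extreme inputs, but it only approaches those values asymptotically and never actually reaches them. People also often assume it's the only activation function. In deep networks, however, its derivative is at most 0.25 and nearly 0 in the flat tails, so gradients shrink as they are passed back through many sigmoid layers (the vanishing gradient problem). This is why alternatives such as ReLU are usually preferred in hidden layers.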
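A small sketch can make both points concrete, assuming plain Python floats (IEEE 754 double precision). `sigmoid_grad` is a helper name introduced here; it uses the standard identity σ'(x) = σ(x)(1 − σ(x)). The depth loop is a deliberate simplification: it multiplies only the sigmoid derivatives and ignores the layer weights, which a real network would also multiply in.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function: exp() is only ever called on a non-positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x: float) -> float:
    # Derivative of the sigmoid, s * (1 - s); it peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

# Mathematically sigmoid(x) < 1 for every finite x, and the gap to 1 keeps shrinking...
for x in (2.0, 5.0, 10.0):
    print(x, sigmoid(x), 1.0 - sigmoid(x))

# ...but in float64, 1 + e^{-40} rounds to 1, so the computed result is exactly 1.0.
print(sigmoid(40.0) == 1.0)  # True

# In the flat tails the local gradient is almost zero.
print(sigmoid_grad(10.0))

# Simplified backprop: multiply one sigmoid derivative per layer (weights ignored).
# Even the best case, 0.25 per layer, shrinks quickly with depth.
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)
```

The lower tail behaves differently in float64: `sigmoid(-40.0)` is a tiny positive number rather than 0.0, because values near 0 can be represented far more finely than values near 1.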

Formal definition
The sigmoid (logistic) function, σ(x) = 1 / (1 + e^{-x}), maps the entire real line to the open interval (0, 1). It is strictly increasing, differentiable everywhere with σ'(x) = σ(x)(1 − σ(x)), and S-shaped: it is steepest at x = 0, where σ(0) = 0.5 and σ'(0) = 1/4, and it flattens out in both tails. Its output can be interpreted as a probability in binary classification tasks.
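The definition translates directly into code, with one practical wrinkle: evaluating 1 / (1 + e^{-x}) literally for x = -1000 calls `math.exp(1000)`, which raises `OverflowError` in Python. A minimal sketch, assuming plain Python floats, rewrites the negative branch algebraically and then spot-checks the stated properties: outputs inside (0, 1), the symmetry σ(−x) = 1 − σ(x), the derivative σ(x)(1 − σ(x)) against a central finite difference, and the log-odds function as the inverse. The helper names `sigmoid_derivative` and `logit` are introduced here for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """sigma(x) = 1 / (1 + e^{-x}), evaluated without overflow.

    For x >= 0, e^{-x} <= 1, so the textbook form is safe.
    For x < 0, multiply top and bottom by e^{x} to get e^{x} / (1 + e^{x}),
    so math.exp never sees a large positive argument.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_derivative(x: float) -> float:
    # Closed form sigma'(x) = sigma(x) * (1 - sigma(x)); largest (1/4) at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def logit(p: float) -> float:
    # Log-odds ln(p / (1 - p)): the inverse of the sigmoid on (0, 1).
    return math.log(p / (1.0 - p))

h = 1e-5  # step size for the central finite difference
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    s = sigmoid(x)
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(f"x={x:+.1f}  sigma={s:.6f}  1-sigma(-x)={1.0 - sigmoid(-x):.6f}  "
          f"closed-form slope={sigmoid_derivative(x):.6f}  numeric slope={numeric:.6f}")

print(sigmoid(-1000.0))  # no OverflowError; the tiny true value underflows to 0.0 in float64
```

Numerical libraries usually ship an equivalent stable function (for example SciPy's `scipy.special.expit`, with `scipy.special.logit` as its inverse), so in practice you would typically call one of those rather than hand-rolling it.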