
Explain like I'm five
Imagine you're trying to guess someone's age. First you guess 30, and you're told you're 5 years too low. You nudge your guess up to 33, and you're told you're now 2 years too low. You keep making small corrections until you're close enough. Gradient Boosting does this with many simple models, each one correcting part of the mistake left by the ones before it, until the overall guess is very accurate.

Why it matters
Gradient Boosting powers many real-world systems, such as search ranking, recommendation, and fraud detection, because it is often among the most accurate methods for structured (tabular) data. It appears frequently in winning machine learning competition entries and copes well with datasets that mix different kinds of features.

Common misconception
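Many think Gradient Boosting is one big model, but it is an ensemble of many small, weak models (usually shallow decision trees) trained one after another. Another common mistake is confusing it with AdaBoost. AdaBoost reweights the training samples so that later learners concentrate on the examples earlier learners got wrong, while Gradient Boosting fits each new learner to the negative gradient of the loss (the "pseudo-residuals") at the current predictions.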
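
To make the "gradients" part concrete, here is a minimal sketch of the targets a new learner would be trained on under two common losses. The function names and the toy numbers are illustrative assumptions, not part of any library.

```python
def negative_gradient_squared(y_true, y_pred):
    # Loss L = 0.5 * (y - F)^2, so -dL/dF = y - F: the plain residual.
    return [y - f for y, f in zip(y_true, y_pred)]


def negative_gradient_absolute(y_true, y_pred):
    # Loss L = |y - F|, so -dL/dF = sign(y - F): only the direction of the error.
    return [(y > f) - (y < f) for y, f in zip(y_true, y_pred)]


# Hypothetical true ages and current model guesses.
y_true = [35.0, 20.0, 50.0]
y_pred = [30.0, 30.0, 50.0]

print(negative_gradient_squared(y_true, y_pred))   # [5.0, -10.0, 0.0]
print(negative_gradient_absolute(y_true, y_pred))  # [1, -1, 0]
```

With squared error the next learner is asked to predict how far off each guess is. With absolute error it is only asked which way each guess is off. No sample weights are involved in either case, which is the difference from AdaBoost.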

Formal definition
Gradient Boosting is an ensemble machine learning technique that builds a predictive model in a stage-wise fashion by optimizing a differentiable loss function. At each iteration, a weak learner (typically a shallow decision tree) is fit to the negative gradient of the loss function with respect to the current model's predictions. For squared-error loss this negative gradient is exactly the ordinary residual, so each new learner targets the error that remains. The final model is the sum of an initial constant prediction and these weak learners, each scaled by a step size (the learning rate).
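
The definition can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: squared-error loss, a single numeric feature, and one-split "stumps" as the weak learners. The names fit_stump, fit_gradient_boosting, n_rounds, and learning_rate are hypothetical, and production libraries add much more (deeper trees, regularization, subsampling).

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump to the targets by least squares."""
    best = None
    values = sorted(set(xs))
    # Try a threshold halfway between each pair of adjacent feature values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Stage-wise boosting for squared-error loss."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction the stump suggests.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for illustration): a curved relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x * x for x in xs]

baseline = lambda x: sum(ys) / len(ys)
model = fit_gradient_boosting(xs, ys)

mse_baseline = mean_squared_error(baseline, xs, ys)
mse_boosted = mean_squared_error(model, xs, ys)
print(mse_baseline, mse_boosted)
```

On this toy data the boosted model's training error comes out well below that of the constant baseline, even though every individual stump is a very weak predictor. A small learning_rate makes each correction partial, which usually requires more rounds but tends to generalize better.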