Machine Learning
Ensemble Methods Explained in Plain English: Bagging
Understand the intuition behind bagging with examples in Python
In this article, I will go over a popular homogeneous model ensemble method: bagging. Homogeneous ensembles combine a large number of base estimators, or weak learners, of the same algorithm.
The principle behind homogeneous ensembles is the idea of the “wisdom of the crowd”: the collective predictions of many diverse models are better than any set of predictions made by a single model. There are three requirements to achieve this:
- The models must be independent;
- Each model performs slightly better than random guessing;
- All individual models have similar performance on their own.
When these three requirements are satisfied, adding more models should improve the performance of your ensemble.
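To see why this works, here is a quick back-of-the-envelope sketch in Python. The 60% accuracy figure is a hypothetical assumption rather than a measurement; the point is that the accuracy of a majority vote over independent models grows rapidly as more models are added.

```python
from math import comb

def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a majority vote of n independent models is correct,
    assuming each model is correct with probability p (a toy assumption)."""
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )

# Odd numbers of models, to avoid ties in the vote.
for n in (1, 5, 25, 101):
    print(f"{n:>3} models -> majority vote accuracy: {majority_vote_accuracy(n, 0.6):.3f}")
```

Of course, this calculation assumes the models are fully independent, which real models never quite are; bagging is one way of pushing them in that direction.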
Ensemble methods help to reduce variance and combat overfitting to your training data, allowing your model to learn generalized patterns rather than the noise in the training set.
Bagging
How Bagging Works
In bagging, a large number of independent weak models are combined to learn the same task. The term “bagging” comes from bootstrap + aggregating: each weak learner is trained on a random subsample of the data drawn with replacement (bootstrapping), and the models’ predictions are then aggregated.
Bootstrapping provides the diversity and (approximate) independence we need: because each subsample is drawn separately with replacement, every base estimator ends up training on a different subset of the data.
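As a quick illustration, here is what bootstrapped subsamples look like in NumPy, using a toy set of ten row indices rather than a real dataset:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_samples = 10  # size of a toy dataset

# Each bootstrap sample has the same size as the original data and is drawn
# with replacement, so some rows repeat while others are left out entirely.
for i in range(3):
    indices = rng.choice(n_samples, size=n_samples, replace=True)
    print(f"bootstrap sample {i}: {np.sort(indices)}")
```

Because sampling is with replacement, some rows appear several times and others not at all; for a reasonably large dataset, each bootstrap sample contains roughly 63% of the unique original rows on average.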
The base estimators are weak learners that perform only slightly better than random guessing. An example of such a model is a shallow decision tree limited to a maximum depth of three. The predictions from these models are then combined, either by averaging or by voting.
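Putting bootstrapping and aggregation together, here is a minimal from-scratch sketch of bagging for regression. The synthetic dataset, the 50 estimators, and the depth-3 trees are illustrative choices, not a prescription:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data, for illustration only.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 50
test_predictions = []

for _ in range(n_estimators):
    # Bootstrapping: sample training rows with replacement.
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    # Weak learner: a shallow tree limited to a maximum depth of three.
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X_train[idx], y_train[idx])
    test_predictions.append(tree.predict(X_test))

# Aggregating: average the weak learners' predictions.
bagged_prediction = np.mean(test_predictions, axis=0)
print("bagged R^2 on the test set:", round(r2_score(y_test, bagged_prediction), 3))
```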
Bagging can be applied to both classification and regression problems. For regression, the final prediction is the average of the base estimators’ predictions. For classification, the final prediction is either the majority vote of the predicted classes (hard voting) or the class with the highest average predicted probability (soft voting).
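In practice, you rarely need to write the loop yourself: scikit-learn provides BaggingClassifier and BaggingRegressor, which handle the bootstrapping and aggregation for you. The sketch below (again with an illustrative synthetic dataset and hyperparameters) also compares a single weak learner against the bagged ensemble:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic classification data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single shallow tree versus a bagged ensemble of the same weak learner.
single_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
single_tree.fit(X_train, y_train)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # named base_estimator in scikit-learn < 1.2
    n_estimators=100,  # number of bootstrapped weak learners
    bootstrap=True,    # sample rows with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)

print("single tree test accuracy:", round(single_tree.score(X_test, y_test), 3))
print("bagged trees test accuracy:", round(bagging.score(X_test, y_test), 3))
```

Note that BaggingClassifier averages the base estimators’ predicted class probabilities when they implement predict_proba and only falls back to a plain majority vote when they do not, while BaggingRegressor simply averages the predictions.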