A Beginner’s Guide to Converting Numerical Data to Categorical: Binning and Binarization
Imagine sifting through rows of data in a spreadsheet packed with numbers that look impressive at first glance. But when you try to analyze them, the digits feel like a maze, hard to interpret and even harder to draw conclusions from. Now, picture the same dataset, but this time, the numerical values have been grouped into tidy categories, making patterns jump out at you. It’s like watching a blurry image come into focus. Sounds better, right?

That’s exactly what converting numerical data into categorical data can do for you! In today’s post, we’ll dive into two handy techniques: Binning and Binarization. They shine on datasets like the Google Play Store data, where a category (say, a download range) is often more telling than the raw number.
By the end, you’ll know how to wrangle numerical data into meaningful categories with easy-to-follow code examples. Let’s get started, shall we?
Why Convert Numerical Data to Categorical?
First, let’s understand why you’d want to turn your perfectly good numerical data into categorical values.
Let’s take an example from the Google Play Store dataset. You have a column that tells you the number of times an app has been downloaded: 5,000, 100,000, 1,000,000, and so on. While the raw numbers are useful, grouping the apps into categories like “Low downloads,” “Moderate downloads,” and “High downloads” often paints a clearer picture of app performance. It also helps you:
- Handle Outliers: Extreme values can skew analysis, but once binned, an outlier is just another member of the top or bottom category, so its influence is limited.
- Simplify Representation: Grouping data into categories makes it easier to visualize and interpret.
- Improve Data Spread: Raw values are often heavily skewed, with most observations crowded at one end. Bins, especially quantile-based ones, can spread the observations more evenly across groups.
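To make the idea concrete, here is a minimal sketch of the grouping described above using pandas. The install counts and the cut points (10,000 and 1,000,000) are made up for illustration and are not taken from the actual dataset, so treat them as assumptions and pick thresholds that suit your own data.

```python
import pandas as pd

# Hypothetical install counts for illustration only (not real Play Store rows).
installs = pd.Series([5_000, 100_000, 1_000_000, 50, 500_000_000], name="installs")

# Assumed cut points: below 10k is "Low", 10k up to 1M is "Moderate", 1M and above is "High".
edges = [0, 10_000, 1_000_000, float("inf")]
labels = ["Low downloads", "Moderate downloads", "High downloads"]

# right=False makes each interval include its left edge and exclude its right edge,
# so an app with exactly 1,000,000 installs lands in "High downloads".
download_group = pd.cut(installs, bins=edges, labels=labels, right=False)

# Show each raw value next to the category it was assigned.
print(pd.DataFrame({"installs": installs, "download_group": download_group}))
```

Notice that the app with 500,000,000 installs simply lands in “High downloads” alongside the app with 1,000,000, which is the outlier-handling benefit from the list above in action.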
Now that we’ve got that cleared up, let’s break down the two techniques: Binning and Binarization.
Binning (aka Discretization)
Binning is like taking a long, windy road of numbers and breaking it into smaller, more manageable…