Towards AI

The leading AI community and content platform focused on making AI accessible to all. Check out our new course platform: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev

A Beginner’s Guide to Converting Numerical Data to Categorical: Binning and Binarization

Souradip Pal
Published in Towards AI
6 min read · Sep 8, 2024

Imagine sifting through rows of data in a spreadsheet packed with numbers that look impressive at first glance. But when you try to analyze them, the digits feel like a maze, hard to interpret and even harder to draw conclusions from. Now, picture the same dataset, but this time, the numerical values have been grouped into tidy categories, making patterns jump out at you. It’s like watching a blurry image come into focus. Sounds better, right?

That’s exactly what converting numerical data into categorical data can do for you! In today’s post, we’ll dive into two useful techniques: Binning and Binarization. They are a good fit for datasets such as the Google Playstore data, where categories (for example, tiers of app downloads) are more telling than raw numbers.

By the end, you’ll know how to wrangle numerical data into meaningful categories with easy-to-follow code examples. Let’s get started, shall we?

Why Convert Numerical Data to Categorical?

First, let’s understand why you’d want to turn your perfectly good numerical data into categorical values.

Let’s take an example from the Google Playstore dataset. You have a column that tells you the number of times an app has been downloaded: 5,000, 100,000, 1,000,000, and so on. While the raw numbers are useful, grouping the apps into categories like “Low downloads,” “Moderate downloads,” and “High downloads” often paints a clearer picture of app performance. It also helps:

  • Handle Outliers: Extreme values can skew analysis. Once binned, an extreme value simply falls into the top or bottom category, so its magnitude no longer dominates.
  • Simplify Representation: Grouping data into categories makes it easier to visualize and interpret.
  • Improve Data Spread: Raw values are often unevenly distributed. Bins chosen so that each holds roughly the same number of observations give a more even spread; equal-width bins do not guarantee this.
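
To make the download example concrete, here is a minimal sketch in plain Python. The download counts and the two cut points (10,000 and 1,000,000) are hypothetical values chosen for illustration; they are not taken from the Playstore dataset, and the helper name `to_category` is our own.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical download counts, including one extreme value.
downloads = [500, 5_000, 12_000, 100_000, 450_000, 1_000_000, 500_000_000]

# Assumed cut points (illustrative only): below 10,000 is "Low",
# from 10,000 up to but not including 1,000,000 is "Moderate",
# and 1,000,000 or more is "High".
edges = [10_000, 1_000_000]
labels = ["Low downloads", "Moderate downloads", "High downloads"]


def to_category(value, edges, labels):
    """Return the label of the bin that contains value."""
    # bisect_right counts how many edges are <= value; that count is the bin index.
    return labels[bisect_right(edges, value)]


categories = [to_category(d, edges, labels) for d in downloads]
counts = Counter(categories)

for d, c in zip(downloads, categories):
    print(f"{d:>11,} -> {c}")
print(counts)
```

Note that the app with 500,000,000 downloads lands in the same category as the one with 1,000,000, so the outlier no longer stretches the scale. In practice the same grouping is usually done on a whole column with a library function such as `pandas.cut`, which by default treats values that sit exactly on a bin edge differently from this sketch.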

Now that we’ve got that cleared up, let’s break down the two techniques: Binning and Binarization.

Binning (aka Discretization)

Binning is like taking a long, windy road of numbers and breaking it into smaller, more manageable…
