Curse of Dimensionality

Prajwal Ballal
Published in Towards AI
Oct 17, 2020

The Curse of Dimensionality is a catchy term coined by mathematician Richard Bellman in his 1957 book “Dynamic Programming”. It refers to the fact that many problems become much harder to solve in high-dimensional settings.

Let us start with a question. What is a Dimension?

In very simple terms, a dimension is an attribute or feature of a given dataset.

This sounds pretty simple. So why associate such a negative word as “curse” with dimensions? What exactly is the curse here?

Let us learn the curse of dimensionality with a general example.

If anybody asks me, “What is a machine learning model?”

Speaking in layman’s terms: we feed a dataset into the training phase, and the output of that training phase is the model.

Suppose we have 7 models, each trained on a different number of dimensions, with the objective of the model kept the same across all 7.

What we observe here is that the number of features we feed into the training phase to generate each model increases exponentially from Model 1 to Model 7.

So the question arises: what is the relation between the number of dimensions and the model?

Can we say that more features will result in a better model?

The answer is yes, but… oh yes, there is a “but” here.

We can say that more features result in a better model, but this is true only up to a certain extent; let’s call that extent the threshold.

Assume that, in our example, Model 5 is the threshold.

Model 5 has 14 attributes, and its error rate is the lowest among Models 1 to 7.

Why does this happen? Shouldn’t Model 7 be the best model?

The reason is that in Model 6 and Model 7, the majority of the features are redundant or sparse, which means that beyond a certain number of features, the additional ones no longer add useful information.
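To make this concrete, here is a small hypothetical sketch (not from the original example; it uses scikit-learn’s make_classification with invented numbers) that trains the same classifier on a growing slice of features, where only the first 10 columns carry signal and the rest are pure noise. Accuracy typically improves up to a point and then stalls or degrades:

```python
# Hypothetical illustration: the same model trained on more and more features,
# where only the first 10 features carry signal and the rest are random noise.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the informative columns first, so slicing X[:, :k]
# adds the useful features before the noise features.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           n_redundant=0, shuffle=False, random_state=0)

for k in [2, 5, 10, 14, 25, 50, 100]:
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, :k], y, cv=5).mean()
    print(f"{k:3d} features -> mean CV accuracy = {acc:.3f}")
```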

Okay, but what is sparsity? And why is it an issue?

Consider 10 evenly spaced points that cover a one-dimensional unit line.

In a two-dimensional 1×1 plane, 100 evenly spaced sample points are needed to cover the whole area.

When it comes to a three-dimensional unit cube, 1,000 points are needed to cover the whole volume; you can imagine that putting only 100 points into that three-dimensional space leaves them looking quite sparse.

The number of data points needed to cover the entire volume increases exponentially with the number of dimensions. The volume of the space grows so rapidly that the data cannot keep up and thus becomes sparse. Structure and correlations found in low dimensions do not apply or generalize well in a high-dimensional space.
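As a rough Python illustration (the spacing of 0.1 per axis matches the 10-points-per-axis example above), the number of grid points needed to cover a unit hypercube grows as 10 to the power of the dimension:

```python
# Grid points needed to cover a d-dimensional unit hypercube with
# 10 evenly spaced points per axis (spacing 0.1) grow as 10**d.
for d in (1, 2, 3, 5, 10):
    print(f"{d}-D unit hypercube: {10 ** d:,} points at 0.1 spacing")
```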

Sounds complicated! 2 dimensions, 3 dimensions, n dimensions? What is the intuition for multiple dimensions?

We are all used to two or at most three dimensions; beyond that, we fail to visualize, so multidimensional thinking is out of our reach when it comes to picturing things. Let us instead try to understand the curse of dimensionality mathematically, using probability.

Take a classification problem, for example. In a classification problem, the goal is to define a distinct boundary between the classes.

Pick a random point on a line of unit length. The probability of the point lying near the edges, say less than 0.001 from a border, is 1 − (1 − 2×0.001) = 0.002, i.e., about 0.2%.

Similarly, in a two-dimensional space, consider a unit square. The probability of a point lying near the edges is 1 − (1 − 2×0.001)² ≈ 0.004, i.e., about 0.4%.

In a three-dimensional space, in a unit cube, the probability of a point lying near the edges is 1 − (1 − 2×0.001)³ ≈ 0.006, i.e., about 0.6%.

Similarly, in an n-dimensional space, the probability of a point lying near the edges is 1 − (1 − 2×0.001)ⁿ.

The point I am trying to make here is that as n increases, the probability of a data point lying near the edges increases rapidly; the short computation below gives a feel for the same.
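Here is a quick Python check of that formula, keeping the same 0.001 border margin as above (for n = 10,000 the probability is already greater than 99.999999%):

```python
# Probability that a uniformly random point in an n-dimensional unit hypercube
# lies within 0.001 of at least one border: 1 - (1 - 2 * 0.001) ** n.
eps = 0.001
for n in (1, 2, 3, 10, 100, 1000, 10000):
    p_edge = 1 - (1 - 2 * eps) ** n
    print(f"n = {n:>5}: P(point within {eps} of a border) = {p_edge:.8f}")
```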

Can we escape this curse?

Oh yes!!! Dimensionality Reduction.

Dimensionality reduction is the process of converting high-dimensional variables into a lower-dimensional representation without losing much of the information carried by the features.

There are basically two components of dimensionality reduction. They will not be covered in depth in this article, since dimensionality reduction is itself a vast and very interesting topic, but let us at least define them.

Feature Selection: This technique selects the most relevant variables from the original dataset, typically via one of three approaches: filter, wrapper, or embedded methods.

Feature Extraction: This technique transforms the high-dimensional data into a lower-dimensional space, building new features from combinations of the original ones; a minimal sketch follows below.
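As a minimal sketch of feature extraction (using scikit-learn’s PCA on invented data; PCA is just one of many possible techniques):

```python
# Project a hypothetical 50-feature dataset onto its directions of highest
# variance, keeping enough principal components to explain ~95% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # 500 samples, 50 features
X[:, :5] *= 30                   # only a handful of directions dominate the variance

pca = PCA(n_components=0.95)     # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # far fewer columns, little variance lost
```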

By reducing the dimensionality of our data, we drastically reduce the computational workload, remove dimensional redundancy, and obtain much more effective distance metrics.

Distance metrics? How are distances affected in high dimensions?

High-dimensional data affects distance metrics, especially Euclidean distance. Mathematically speaking, the Euclidean distance between two points Xi and Xj is

dist(Xi, Xj) = √((Xi1 − Xj1)² + (Xi2 − Xj2)² + … + (Xin − Xjn)²)

such that,

n is the number of dimensions.

Xi is an n-dimensional vector of real numbers.

X is the vector space to which X₁, X₂, …, Xi, …, Xₖ belong.

distmax(Xi) -> maximum distance between Xi and Xj, where Xj belongs to {X₁, …, Xₖ} and Xj ≠ Xi.

distmin(Xi) -> minimum distance between Xi and Xj, where Xj belongs to {X₁, …, Xₖ} and Xj ≠ Xi.

In lower dimensions, distmax(Xi) is much larger than distmin(Xi), so the ratio (distmax(Xi) − distmin(Xi)) / distmin(Xi) stays well above zero.

However, in higher dimensions distmax(Xi) and distmin(Xi) become nearly equal, so the ratio approaches zero, which means that in high dimensions all the points are almost equidistant from one another.

Hence, in high dimensions, since all the points are nearly equidistant, there is little point in applying Euclidean distance. This is one of the major problems in machine learning when calculating distances in high dimensions. This behavior can be checked with a short simulation in Python, which verifies our statements regarding Euclidean distance and high dimensions.

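A minimal sketch of such a simulation (assuming points drawn uniformly at random from the unit hypercube; the exact setup of the original simulation is not shown here):

```python
# For random points in the unit hypercube, watch the contrast between the
# farthest and nearest neighbour of a reference point vanish as dimension grows.
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000

for dim in (1, 2, 3, 10, 100, 1000):
    X = rng.random((n_points, dim))                # uniform points in [0, 1]^dim
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # Euclidean distances to X[0]
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"dim = {dim:>4}: (distmax - distmin) / distmin = {ratio:.3f}")
```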

Do we have a conclusion?

In problems with a large number of features, a very high proportion of the data points lie near the edges of the feature space. This is the curse of dimensionality, and it is the reason why feature engineering is essential.

In this article, the nature of the Curse of Dimensionality was studied using simple examples. It is evident that higher dimensions come at a cost. The explosive growth of spatial volume is the foremost reason for the curse of dimensionality. We observed that its effects become clearly pronounced with as few as a few tens of dimensions.

As for distances, increasing the dimension makes Euclidean distance lose its meaning, because in high dimensions all the vectors are almost equidistant.

Despite the curse, many of the greatest successes of machine learning come from high-dimensional domains.

References:

1. Richard E. Bellman (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ.

2. Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.

3. High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction.

4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining.

