The Power of Click Probability Prediction

Priyansh Soni
Published in Towards AI · 10 min read · Oct 11, 2023


Modern recommendation systems can almost tell what you would like to watch or eat next. In this article, we will discuss the engine that fuels such algorithms.

Imagine being able to anticipate which shopping item a user is most likely to buy, what food they are most likely to order, and which article will capture their attention next. That is the power Click Probability Prediction (CPP) provides. At its core, CPP is about harnessing the predictive power of data to understand and forecast user behavior. Let’s understand this in more detail.

OUTLINE —

  1. What is CPP?
  2. Understanding how CPP works
  3. Applications of CPP
  4. Steps for implementing CPP
  5. CPP with an example
  6. Data Collection and Preparation
  7. Feature Engineering
  8. Model Selection and State-of-the-art Models
  9. Modeling CPP
  10. Conclusion

1. What is CPP?

Click Probability Prediction (CPP) is a fundamental concept in data science that revolves around predicting the likelihood of a user clicking on a particular element within a digital interface. This element could be an advertisement, a hyperlink, or a product recommendation.
In simple terms, CPP determines how likely it is for a user A to click on an item P, given that we have the historical data for user-item interactions.

2. Understanding how CPP works

Let’s consider an example.
We have a recommendation engine that recommends the latest content to users on a platform. The recommendations are produced by algorithms such as content-based filtering, collaborative filtering, and model-based approaches. Users click on some of the content we send and ignore the rest. Collected over a period of time, this gives us historical user-item interaction data, the item being the content we send to the user.
Recommendation engines make recommendations by using both user features and content features. Taking this into account, our historical user-item interaction data contains user features, content features, and whether the content sent to the user was clicked or not. Such a dataset can be visualized as a table with one row per user-item interaction.

Now, by applying machine learning models to this data, we can learn from it and make predictions on unseen data. Under the hood, the model learns various patterns in the user features, content features, and so on during training. It then uses the learned information to predict whether an unseen piece of content P will be clicked by user A.
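As a minimal sketch of this idea, the snippet below builds a tiny interaction table and fits a logistic regression to estimate click probability for a new user-item pair. All column names and values are illustrative, not from a real dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy user-item interaction log (columns are illustrative)
data = pd.DataFrame({
    "user_age":     [25, 34, 25, 41, 34, 19],
    "user_avg_ctr": [0.10, 0.32, 0.10, 0.05, 0.32, 0.22],
    "item_price":   [9.9, 4.5, 19.9, 4.5, 9.9, 2.5],
    "item_rating":  [4.2, 3.8, 4.9, 3.8, 4.2, 4.0],
    "clicked":      [0, 1, 0, 0, 1, 1],   # the label we want to predict
})

X, y = data.drop(columns="clicked"), data["clicked"]
model = LogisticRegression().fit(X, y)

# Estimated probability that a new (user, item) pair results in a click
p_click = model.predict_proba([[30, 0.25, 5.0, 4.1]])[0, 1]
print(round(p_click, 3))
```

In a real system the table would have millions of rows and many more features, but the shape of the problem is the same: features in, click probability out.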

3. Applications of CPP

CPP is a powerful tool used across industries for a variety of purposes. A very common use for CPP models is to increase the click-through rate (CTR) of a platform. Industries that rely heavily on CPP include advertising, e-commerce, and recommendation engines.

  • Advertisers use CPP to strategically place ads on websites and determine which ad content is most likely to resonate with their target audience, ultimately optimizing campaign efficiency and budget allocation.
  • E-commerce platforms enhance customer experience and drive conversions by utilizing data on product click likelihood to optimize placements, suggest related items, and fine-tune search results, ultimately fostering satisfaction and loyalty.
  • Streaming services, news aggregators, and social media platforms utilize CPP to offer personalized content recommendations based on user behavior and preferences, enhancing user engagement and encouraging prolonged platform usage.

Using CPP, metrics like CTR, Conversion Rate (CVR), Cost Per Acquisition (CPA), Cost Per Click (CPC), and Customer Lifetime Value (CLTV) can be highly optimized in the digital marketing and advertising spaces.

4. Steps for implementing CPP

Click Probability Prediction models can be implemented using appropriate Machine Learning algorithms and proper feature selection. Some of the crucial steps to consider before modeling CPP are:

  1. Data Collection — the core of CPP relies on historical click data. For accurate predictions, it is essential to have a dataset that encompasses user interactions like clicks, along with associated attributes such as timestamps, session details, and the specific items that were clicked.
  2. Data Formatting — the data should be organized in a structured format that is conducive to analysis and modeling. Preferred formats include CSV, TSV, and JSON, which can be loaded into virtually any analysis environment, such as Jupyter notebooks.
  3. Feature Dimensions — a general approach towards modeling CPP is to utilize features along three dimensions:
    - User Features
    - Item Features
    - Context Features
    This approach allows us to capture a wide range of information that can influence a user’s click behavior. Let’s take a deeper look:

User-features

User-features encompass characteristics and behaviors of the user interacting with the platform or content. These can be:

  • Demographic Information (e.g., age, gender, location)
  • Behavior History (e.g., past clicks, purchases)
  • Engagement Metrics (e.g., time spent on site, pages viewed, avg CTR per month, etc.)

Item-features

Item features pertain to attributes or features of the content or products being presented to the user. These can be:

  • Product Category or Type
  • Price
  • Ratings and Reviews
  • Content Keywords or Tags

Product descriptions can also be used in the form of text embeddings, which capture the information carried in free-text fields as numeric vectors.
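As a simple stand-in for learned embeddings, TF-IDF can turn descriptions into numeric item features; the descriptions below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = [
    "wireless noise cancelling headphones",
    "stainless steel water bottle",
    "noise cancelling earbuds with mic",
]

# Each description becomes a sparse numeric vector usable as item features
vec = TfidfVectorizer()
item_text_features = vec.fit_transform(descriptions)
print(item_text_features.shape)  # (num_items, vocabulary_size)
```

In production, pretrained or learned embeddings usually replace TF-IDF, but the role in the feature matrix is the same.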

Context-features

Contextual features encompass the environment or context in which the user interaction occurs with the product or content. These can be:

  • Time of Day
  • Device Type (e.g., mobile, desktop)
  • Location or Geo-IP Information
  • Referral Source (e.g., social media, search engine)

The following points can be kept in mind while implementing CPP models:

1. Try to make the data feature-rich, taking into account the relevance of each feature.
2. Encode categorical variables, and prefer normalizing/standardizing the data.
3. Make sure the data is balanced — an imbalance in click data can lead to poor predictions.

After the data is prepared, suitable machine learning models can be used for training and making predictions, post which performance metrics can be compared to get to the best model.
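The encoding and scaling steps above can be sketched with a scikit-learn pipeline; the tiny dataset here is made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "device":  ["mobile", "desktop", "mobile", "tablet"],
    "price":   [9.9, 49.0, 19.9, 5.0],
    "age":     [21, 35, 28, 42],
    "clicked": [1, 0, 1, 0],
})

# One-hot encode categoricals, standardize numericals, then fit a model
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["device"]),
    ("num", StandardScaler(), ["price", "age"]),
])
clf = Pipeline([("prep", pre), ("model", LogisticRegression())])
clf.fit(df.drop(columns="clicked"), df["clicked"])

probs = clf.predict_proba(df.drop(columns="clicked"))[:, 1]
print(len(probs))
```

Bundling preprocessing and model in one pipeline ensures the same transformations are applied at training and prediction time.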

5. CPP with an example

In this section, I will demonstrate the implementation of a CPP model for a movie recommendation platform. The final goal is to optimize the recommendations being sent to the user for better engagement and retention. I will follow the above-mentioned steps, starting from data collection to feature selection and modeling. The data handling can be done in any suitable framework; I have chosen Pandas.

6. Data Collection and Preparation

I have created a demo dataset that contains the movies presented to users, along with the meta information of the user and item. Since the end goal of the CPP model is to optimize the recommendations sent to users, we need the historical interaction data: user, item, and click.

Post data collection, data cleaning, and preprocessing steps need to be performed to ensure that the model receives clean data that can be used for training and prediction.

Data cleaning and preprocessing steps include — handling missing values, handling duplicates, outlier detection and treatment, handling data imbalance, and data normalization and standardization.

Handling data imbalance is a crucial step for CPP, since imbalanced click data can lead to poor predictions. Methods like resampling, synthetic sampling, and ensemble techniques can be used to handle it.
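Two common remedies can be sketched as follows: computing balanced class weights to pass to the model, and randomly oversampling the minority (clicked) class. The 90/10 split below is synthetic, chosen only to mimic typical click skew.

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Click logs are usually heavily skewed toward "no click"
y = pd.Series([0] * 90 + [1] * 10)

# Option 1: balanced class weights (pass to a model's class_weight arg)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
# the minority (click) class gets the larger weight

# Option 2: random oversampling of the minority class with pandas
df = pd.DataFrame({"clicked": y})
minority = df[df["clicked"] == 1]
oversampled = pd.concat([df, minority.sample(80, replace=True, random_state=0)])
print(oversampled["clicked"].value_counts().to_dict())
```

Synthetic sampling methods such as SMOTE follow the same idea but generate new minority examples instead of duplicating existing ones.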

7. Feature Engineering

As I mentioned earlier, the more feature-rich the data is, the easier it is for the model to capture the hidden patterns in user-item interactions. One way of generating more features is to utilize the existing features at finer granularity.

  1. Consider the “timestamp” feature: it can be broken down into several dimensions — day of the week, quarter of the day, hour of the day, and so on. These features capture more detail about the interaction context.
  2. Similarly, the “duration” feature can be bucketed, e.g., into <2 hours and >2 hours. Such features can help determine user preferences.

Other features, like the user’s average CTR aggregated over months or weeks or both, can be really helpful for analyzing user engagement patterns.
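These derived features can be sketched in Pandas as below; the three-row log and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2023-10-01 09:15", "2023-10-02 21:40", "2023-10-01 13:05"]),
    "duration_hours": [1.5, 2.5, 0.8],
    "clicked": [1, 0, 1],
})

# Break the timestamp into finer-grained context features
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["hour_of_day"] = df["timestamp"].dt.hour
df["quarter_of_day"] = df["hour_of_day"] // 6   # 0..3

# Bucket the duration feature (>2 hours vs not)
df["long_content"] = (df["duration_hours"] > 2).astype(int)

# User's average CTR aggregated over this window
df["user_avg_ctr"] = df.groupby("user_id")["clicked"].transform("mean")
```

`groupby(...).transform` keeps the aggregate aligned with the original rows, so the new column drops straight into the feature matrix.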

8. Model Selection and State-of-the-art Models

After feature engineering and data preparation, the next step is to select an appropriate model. Model selection is a crucial step: a model should be chosen only after a proper understanding of its architecture and how it works.
Industry-standard models for CPP include Logistic Regression, Random Forest, Gradient-Boosted Trees (GBT), and Neural Networks.

  • Logistic Regression is favored for its interpretability, efficiency, low risk of overfitting, and ability to handle binary classification tasks effectively.
  • Random Forest can effectively handle a mix of categorical and numerical features, making it suitable for datasets with diverse types of information related to user behavior and content attributes.
  • Gradient-Boosted Trees can handle complex, non-linear relationships between features and the probability of a click. They are capable of learning intricate patterns in the data, which is crucial for accurately predicting click probabilities in scenarios with a diverse range of features and interactions.
  • Neural Networks can capture complex, non-linear relationships in the data, which is essential for modeling user behavior in dynamic online environments.

The state-of-the-art model for CPP is DeepFM (Deep Factorization Machine), which integrates factorization machines with deep neural networks to model both low- and high-order feature interactions. In DeepFM models, embedding layers play a crucial role: the feature embeddings are shared between the FM component and the deep neural network, and training can be tuned with different loss functions and additional tweaks like class weights.
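The FM half of DeepFM scores all pairwise feature interactions through learned embeddings. A minimal NumPy sketch of that second-order term, using the standard FM identity to compute it in O(n·k) instead of looping over all pairs (random values stand in for learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 6, 4                   # k = embedding size
x = rng.random(n_features)             # one dense feature vector
V = rng.normal(size=(n_features, k))   # per-feature embedding vectors

# FM second-order term: sum over feature pairs of <v_i, v_j> * x_i * x_j,
# computed via 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
xv = x @ V                                        # shape (k,)
fm_pairwise = 0.5 * np.sum(xv**2 - (x**2) @ (V**2))

# Brute-force check over all pairs confirms the identity
brute = sum(
    (V[i] @ V[j]) * x[i] * x[j]
    for i in range(n_features) for j in range(i + 1, n_features)
)
print(np.isclose(fm_pairwise, brute))  # True
```

In DeepFM, this term is added to a linear part and to the output of a DNN that consumes the same embeddings.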

A vanilla MLP can be used for the first part of the modeling process, where a simple neural network with 2–3 layers and an appropriate loss function can be implemented and trained for making predictions.

I will demonstrate the implementation of CPP using a Vanilla MLP.

9. Modeling CPP

  • Encode variables and split data — define categorical variables and numerical variables for proper encoding and split data into train, validation, and test sets. Remove identifier variables like “user_id”, “context_id”, etc., as these do not provide any information about the attributes of the respective entities.
  • Define model — define the input shape, input layer, activation, and initialization parameters, layers of the model (I have chosen 3 layers), and an appropriate optimizer. We can add batch normalization at every layer to speed up convergence and reduce overfitting.
  • Checkpoint and Callbacks — define model checkpoints and callbacks for saving and loading the model. Make sure to give the appropriate path_name so that you can save and load the results for various modeling combinations.
  • Train model and make predictions — experiment with different numbers of epochs, batch_size, and pass the above-defined callbacks. Post training, make predictions on the test data.
  • Evaluation and Performance Metrics — select appropriate evaluation metrics like Accuracy, Precision, Recall, F1, etc., based on the problem and optimization goal. For CPP models, precision is usually given more importance, since we want a high proportion of the predicted clicks to be actual clicks (true positives among all predicted positives). In other words, when the model predicts that a user will click on a particular piece of content, we want that prediction to be very reliable.
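The steps above can be sketched end to end. Here scikit-learn's MLPClassifier stands in for a deep-learning framework (its API has no batch normalization or checkpoint callbacks), and the feature matrix is synthetic, standing in for the prepared, encoded data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the prepared (encoded, numeric) feature matrix
X = rng.random((500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A 3-hidden-layer MLP, scaled inputs, trained on the click labels
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16, 8),
                  max_iter=500, random_state=0),
)
clf.fit(X_train, y_train)

# Predicted click probabilities on the held-out test set
p_click = clf.predict_proba(X_test)[:, 1]
print(p_click.shape)
```

With a Keras/TensorFlow model the same flow would additionally use BatchNormalization layers, ModelCheckpoint callbacks, and explicit epoch/batch-size settings as described above.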

After evaluating your model, you can compare different variations by incorporating different loss functions and changing the number of hidden layers, the neurons per layer, the epochs, and the batch size. Try several iterations, tweaking different parameters.

10. Conclusion

Click Probability Prediction (CPP) is a complex problem to solve, subject to various factors such as the nature of the data, the context in which it’s being applied, and the level of granularity required. Achieving a good result in CPP modeling is often challenging due to several reasons such as inherent noisiness, high-dimensional feature space, imbalanced data, dynamic environments, interactions, and context. While achieving extremely good results in CPP modeling can be difficult, it’s possible to get close to it with careful data preprocessing, thoughtful feature engineering, and the selection of appropriate modeling techniques.

I hope this article helps in understanding the power that CPP holds in the real world and how one can implement it using various ML techniques.


Bet my articles are the best easy explanation you could get anywhere. I am a product enthusiast nurturing technology to transform change.