Building a Content-Based Recommender System
A simple web application for movie recommendations

Introduction
Recommender systems (RSs) are everywhere. Amazon, Netflix, Spotify, YouTube, and many other services and apps we use every day run some sort of recommendation engine in the backend.
RSs help users find items they are interested in, which can increase engagement: if a platform suggests items of interest, users will spend more time on it.
I just completed the RS course at the Free University of Bolzano, and I realized how broad this field is. There are many techniques and strategies for building modern, powerful RSs tailored to specific application domains (music, travel, movies, …).
In this article, I will build a simple web application for movie recommendations. In particular, I will adopt a content-based approach.
1. Content-Based RSs
There are two main types of RSs: Content-Based RSs and Collaborative Filtering RSs. On top of these, more advanced techniques can be applied to improve the quality of the recommendations (context awareness, session awareness, reranking, and many more).
In pure Content-Based RSs, items are represented as a set of features, generally expressed as natural-language metadata (title, description, keywords, tags, …). The name comes from the fact that we use the “content” of each item as its “description”.
This set of metadata is then converted into a vector (vectorization), and all the vectors are compared to each other, forming a “similarity matrix”. By looking up this matrix, recommendations for a given item can be obtained.
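To make the vectorization and lookup steps concrete, here is a minimal sketch in plain Python. It is not the implementation used in this article: the toy catalog, the helper names (vectorize, cosine, recommend), and the choice of raw term counts with cosine similarity are all assumptions made for illustration. A real application would more likely rely on a library vectorizer and a weighting scheme such as TF-IDF.

```python
import math
from collections import Counter

# Hypothetical toy catalog (title -> metadata text), invented for illustration.
movies = {
    "Space Quest": "space adventure alien crew spaceship",
    "Galaxy Wars": "space battle alien empire spaceship",
    "Love in Paris": "romance paris love couple",
}


def vectorize(text, vocabulary):
    """Turn a metadata string into a term-count vector over the vocabulary."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


titles = list(movies)
vocabulary = sorted({term for text in movies.values() for term in text.split()})
vectors = [vectorize(movies[title], vocabulary) for title in titles]

# similarity[i][j] holds the similarity between movie i and movie j.
similarity = [[cosine(u, v) for v in vectors] for u in vectors]


def recommend(title, k=2):
    """Return the k titles most similar to `title`, excluding the title itself."""
    i = titles.index(title)
    ranked = sorted(
        (j for j in range(len(titles)) if j != i),
        key=lambda j: similarity[i][j],
        reverse=True,
    )
    return [titles[j] for j in ranked[:k]]


print(recommend("Space Quest", k=1))
```

The matrix is computed once up front, so each recommendation request only needs to read one row and sort it.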
When building an RS there are many things to consider, and ethical aspects are among them. An RS should be fair: it should recommend “good items” to all users and cover the whole “item space”. For instance, we don’t want only a few users to receive good recommendations, and we’d like all available items to have the same probability of being retrieved.
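One simple way to check how much of the “item space” is actually reached is to measure the fraction of the catalog that appears in at least one recommendation list. The sketch below assumes a hypothetical catalog and hypothetical per-user recommendation lists. The function name catalog_coverage is made up for this example, and the metric is only one of several ways to look at coverage.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    if not catalog:
        return 0.0
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


# Hypothetical data, invented for illustration.
catalog = ["A", "B", "C", "D"]
recommendations = {
    "user1": ["A", "B"],
    "user2": ["A", "C"],
    "user3": ["B", "A"],
}

coverage = catalog_coverage(recommendations, catalog)
print(coverage)
```

A value well below 1 suggests that some items are never recommended to anyone, which is the situation described above.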
Another important point is evaluation. Evaluating an RS is not trivial and requires knowledge and experiments due to the sparsity of data…