MLOps Notes 3.2: Error Analysis for Machine Learning Models
Hello everyone!
This is Akhil Theerthala. Another article in the MLOps series has arrived, and I hope you enjoy it. So far, we’ve examined the phases of a Machine Learning project, taken a high-level look at deployment best practices, and started diving into modeling best practices. If you missed the previous article on modeling (3.1), you can read it here.
Following up on that discussion, this article covers error analysis for a model, along with the difficulties and best practices that come with it.

Why do we even need error analysis?
Before diving in, let us take a step back and ask ourselves ‘why’. Let’s say we have trained a machine-learning model. Now, we need to evaluate its performance. This evaluation is generally done with traditional metrics like accuracy, which give us a first indication of whether the model is useful.
We rarely get the best performance the first time we train a model. More often than not, we end up with a model whose performance is bad or merely average. So how do we tweak the original architecture to reach the performance we want and build something useful?
Error analysis helps us break down the model’s performance into groups that are easier to analyze, and it highlights the most frequent errors along with their characteristics.
To look at the standard practices involved, let us go back to the speech transcription model we discussed in our previous articles. In the speech transcription model, we have seen noise from different areas like vehicles, people, etc. How do we analyze the model performance and find areas of improvement?
One way to do this is to go through the examples manually, tag them under different categories in a spreadsheet, and find the category with the most room for improvement. In this speech transcription project, we take the labels and predictions of our model and try to recognize what kind of noise confused the model.
For simplicity, let us tag only 2 kinds of noise: the noise made by cars and the noise made by surrounding people. Then, we…
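To make the spreadsheet idea concrete, here is a minimal Python sketch of the same tallying. Everything in it is an assumption made for illustration: the example rows, the tag names `car_noise` and `people_noise`, and the helper `tag_error_summary` are all made up, and an exact string match between label and prediction stands in for whatever error criterion the project actually uses.

```python
from collections import Counter

# Hypothetical spreadsheet rows, one per audio clip. The transcripts and
# tag names are invented for illustration, not taken from a real dataset.
rows = [
    {"label": "turn left at the signal", "prediction": "turn left at the single", "tags": {"car_noise"}},
    {"label": "play some music", "prediction": "play some music", "tags": {"car_noise"}},
    {"label": "call my sister", "prediction": "call my mister", "tags": {"people_noise"}},
    {"label": "set a timer", "prediction": "said a timer", "tags": {"people_noise"}},
    {"label": "open the window", "prediction": "open the winter", "tags": {"car_noise", "people_noise"}},
    {"label": "good morning", "prediction": "good morning", "tags": set()},
]


def tag_error_summary(rows):
    """For each tag, count the tagged clips and how many of them were mistranscribed."""
    tagged = Counter()
    errors = Counter()
    for row in rows:
        # Assumption: a clip counts as an error if the transcript differs at all.
        is_error = row["label"] != row["prediction"]
        for tag in row["tags"]:
            tagged[tag] += 1
            if is_error:
                errors[tag] += 1
    return {
        tag: {
            "tagged": tagged[tag],
            "errors": errors[tag],
            "error_rate": errors[tag] / tagged[tag],
        }
        for tag in tagged
    }


summary = tag_error_summary(rows)
for tag, stats in sorted(summary.items()):
    print(f"{tag}: {stats['errors']} errors out of {stats['tagged']} tagged clips")
```

A clip can carry more than one tag, so the per-tag counts may add up to more than the number of clips. That is fine here, since the goal is only to see which kind of noise shows up most often among the errors.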