From Detection to Correction: How to Keep Your Production Data Clean and Reliable

Youssef Hosni
Published in Towards AI
8 min read · Apr 24, 2023

In production ML, data quality is everything. No matter how good your models or algorithms are, if you feed them garbage data, you’ll get garbage results. But how can you tell whether your data is good or bad? That’s what we’ll explore in this article.

We’ll start by discussing the importance of validating data and detecting data issues in production. Specifically, we’ll focus on two types of data issues: drift (data drift and concept drift) and skew (schema skew and distribution skew). These issues can be…
