Machine Learning Systems Pt. 2: Data Pipelines with TensorFlow Extended
Building all the data pipeline components for production ML with TFX
In Part 1, I gave an overview of MLOps and some of its primary challenges. Deploying models at scale is difficult because the data, the business requirements, and the code all keep changing.
In this part, I’ll show how you can build data pipeline components using TensorFlow Extended (TFX). It follows the work and skills taught in the Machine Learning Engineering for Production (MLOps) Specialization by DeepLearning.AI, specifically the second course, Machine Learning Data Lifecycle in Production. I’ll walk through that course’s final assignment, but applied to a different dataset: the Stroke Prediction Dataset from Kaggle.
Table of Contents
- Data Ingestion
- Feature Selection
- Data Validation and Pipeline
- Feature Engineering