3. Context and Scopeο
This section describes the technical framework and the dataset used in this project. The developed MLOps pipeline ensures that machine learning models can be efficiently trained, managed, and deployed.
3.1 Adult Income Contextο
The Adult Income Dataset contains demographic and income-related information. The goal is to build a classification model that predicts whether an individualβs income is above or below 50.000$.
To better understand the dataset, several Jupyter Notebook analyses are available:
π Click to Expand: Environment for Exploration
π Click to Expand: Data Exploration & Validation
π Click to Expand: Data Preparation
3.2 Technical Contextο
π₯οΈ Streamlit User Interface:
A simple web frontend where users can enter their data.
π FastAPI (Backend):
Receives user inputs and processes them.
Communicates with the Data Processor to generate predictions.
Returns the prediction result to the user.
π§ Data Processor (Core Component):
Trains and stores ML models.
Performs predictions.
Stores all data processing steps in MinIO and manages model versions with MLflow.
π¦ MLflow (Model Management):
Manages different model versions.
Stores model metrics and artifacts.
ποΈ MinIO (Data Versioning):
Stores different dataset versions.
Ensures reproducibility of the pipeline.
Why is this Context Important?
This MLOps pipeline is designed to ensure that machine learning models are not just trained once but can be continuously improved and efficiently deployed.