1. Introduction and Goals

We use arc42-template as a standard documentation layout (ARC42). This documentation is especially for new developers and all new contributors of this project. This section will describe the goal and give an small overview.

1.1 Task Description

What is FH-SWF MLOPS?

  • MLOPs-Project is a study project for “Fachhochschule Südwestfalen” to learn the concepts of MLOPs.

  • Uses Adult-Income-Dataset within the MLOPs-Pipeline. The target is to make prediction for income (>50.000$/a or <50.000$/a). So a solution for a binary classification problem.

  • The goal is to create a End-to-End MLOPs solution, using best practices.

Essential Features

  • Data preparation (exploration, cleaning and versioning).

  • Model: Feature-Engineering, evaluation and hyperparameter-tuning.

  • API: FastAPI as a middleman to handle different jobs.

  • Uses CI/CD pipeline for fast and continuous deployment.

  • Frontend: Streamlit as a frontend-solution, a input-form to make predictions.

  • Quality: Tests, Monitoring concepts and documentation.

Technical and theoretical Requirements

  • Git for versioncontrol of all services.

  • MLflow for tracking, model versioning and artifact handling.

  • FastAPI for backend.

  • Docker for Containerisation.

  • Streamlit for frontend.

  • Python as a programming language.

  • Machine Learning for modeling.

1.2 Quality-Target

Quality Target

Motivation and Description

Maintainability

The pipeline should be modular and well-documented to allow easy modifications and extensions (arc42 comes in place here :)).

Reproducibility

The entire ML workflow, from data ingestion to model deployment, should be reproducible using version-controlled code and Docker containers.

Automation

CI/CD pipelines should automate testing, deployment, and monitoring to minimize manual interventions.

Performance

Model training and inference should be optimized to provide quick responses, especially in real-time applications.

User-Friendliness

The API and frontend should be intuitive and accessible for non-technical users.

Reliability

The system should be robust and capable of handling edge cases without crashing.

Threshhold

The system should be able to retrigger the training, if the performance decreases and reaches a specific threshold.

1.3 Stakeholders

Who?

Interest and Relation

A new Developer

  • Needs a overview of this project.

  • Wants to develop new features.

  • Needs a easy access and a fast introduction to this project.

User

  • Uses the application.

  • Needs an intuitive UI.

  • Expects reliability.

  • Wants to predict the Income (less or more then 50.000$/a?).

  • Needs almost real-time response.

Students

  • Wants to understand the best-practices concept for MLOPs.

  • Needs an example (this one) to understand MLOPs concepts.