GitHub - nshutijean/DVC-Mlflow-pipeline: 📅 A demo about versioning data and tracking ML experiments using DVC and Mlflow respectively.

DVC and Mlflow pipeline

Objective

This project demonstrates how DVC and MLflow can be used together: DVC versions the data, and MLflow tracks the machine learning experiments.

Data

We will use the Car Evaluation dataset from the UCI Machine Learning Repository for this demo. The dataset will be used for a classification task, where we will predict the evaluation of a car based on its attributes.

The target variable or class label is the evaluation of the car, which is categorized into four values: unacc (unacceptable), acc (acceptable), good, and vgood (very good).

It can be found here: https://archive.ics.uci.edu/dataset/19/car+evaluation
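As a rough illustration of the data layout described above, the sketch below reads rows in the dataset's comma-separated format and checks the class label. The column names follow the UCI documentation; the assumption that the file has no header row, the helper name load_car_rows, and the two sample rows are illustrative and may not match how this repo stores or loads the data.

```python
import csv
import io

# Attribute names as documented on the UCI page (assumed: the raw file has no header row).
COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
CLASS_LABELS = {"unacc", "acc", "good", "vgood"}


def load_car_rows(handle):
    """Read header-less comma-separated rows into dicts keyed by COLUMNS."""
    rows = list(csv.DictReader(handle, fieldnames=COLUMNS))
    for row in rows:
        # Fail early if a row carries a label outside the four documented classes.
        if row["class"] not in CLASS_LABELS:
            raise ValueError(f"unexpected class label: {row['class']!r}")
    return rows


# Illustrative rows in the dataset's format; a real run would pass an open file instead.
SAMPLE = "vhigh,vhigh,2,2,small,low,unacc\nlow,low,5more,more,big,high,vgood\n"
rows = load_car_rows(io.StringIO(SAMPLE))
print(rows[0]["class"], rows[1]["class"])
```

Every attribute is categorical, so the values need to be encoded before they are passed to a scikit-learn model.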

Installation

To install the dependencies, you can use either pip or conda. Here are the steps:

Using pip:

python -m venv dvc-mlflow

source dvc-mlflow/bin/activate

pip install -r requirements.txt

Using conda:

conda create -n dvc-mlflow -y

conda activate dvc-mlflow

conda install --yes --file requirements.txt

Model

We use a DecisionTreeClassifier from scikit-learn for the classification task. The model is trained on the Car Evaluation dataset and evaluated using accuracy as the metric.
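A minimal sketch of this kind of setup is shown below. It is not the code from train.py: the data is a synthetic stand-in (every combination of the six attribute values, labelled by a toy rule), and the ordinal encoding, the 80/20 split, and the random seed are assumptions made for illustration.

```python
from itertools import product

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Category values for the six attributes, in the order
# buying, maint, doors, persons, lug_boot, safety.
LEVELS = [
    ["vhigh", "high", "med", "low"],
    ["vhigh", "high", "med", "low"],
    ["2", "3", "4", "5more"],
    ["2", "4", "more"],
    ["small", "med", "big"],
    ["low", "med", "high"],
]

# Synthetic stand-in data: every attribute combination, labelled by a toy rule
# (NOT the real dataset labels).
X = [list(combo) for combo in product(*LEVELS)]
y = ["unacc" if row[3] == "2" or row[5] == "low" else "acc" for row in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Encode the categorical strings as integers, then fit the decision tree.
model = make_pipeline(
    OrdinalEncoder(),
    DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=42),
)
model.fit(X_train, y_train)

# Accuracy on the held-out split is the evaluation metric.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

The criterion and max_depth values used here correspond to the two command-line arguments that train.py accepts.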

Usage

To run the workflow:

1. Clone this repo
2. Run dvc init to initialize DVC
3. Import the data using dvc pull
4. Execute the workflow and train the model using python train.py (this will also track the experiment with MLflow)
   - You can also pass arguments, for example python train.py gini 4, where gini is the criterion used for splitting and 4 is the maximum depth of the tree.
5. Run mlflow ui to view the experiment in the MLflow UI
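The training step above takes two positional arguments and records the run with MLflow. One way this could be wired up is sketched below; the helper names parse_args and log_to_mlflow, the default values, and the accepted criterion choices are hypothetical and may differ from the actual train.py.

```python
import argparse


def parse_args(argv=None):
    """Parse `python train.py [criterion] [max_depth]` (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Train a decision tree classifier.")
    parser.add_argument("criterion", nargs="?", default="gini",
                        choices=["gini", "entropy"],
                        help="criterion used for splitting")
    parser.add_argument("max_depth", nargs="?", type=int, default=None,
                        help="maximum depth of the tree")
    return parser.parse_args(argv)


def log_to_mlflow(params, accuracy):
    """Record one run's parameters and accuracy with the MLflow tracking API."""
    import mlflow  # imported here so argument parsing works without MLflow installed

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", accuracy)


# Equivalent to running: python train.py gini 4
args = parse_args(["gini", "4"])
params = {"criterion": args.criterion, "max_depth": args.max_depth}
print(params)
```

After training, a call such as log_to_mlflow(params, accuracy) would record the run so that it shows up in mlflow ui.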

Blog post

For more detail, you can read this blog post, which walks through the DVC and MLflow workflow with code snippets.

