This repository contains a project that was showcased in the PyData Brasília talk about MLFlow. The project demonstrates the use of MLFlow Projects to execute a series of code scripts in a specific order while maintaining comprehensive logging of each run. The goal of this project is to provide an efficient and organized way to manage and monitor your machine learning workflows.
- Code Execution Order: The project leverages MLFlow Projects to run a series of code scripts in a defined order. This is particularly useful when you have multiple interdependent scripts that need to be executed in a specific sequence.
- Logging and Tracking: MLFlow's logging capabilities allow you to keep track of important metrics, parameters, and artifacts produced during each run. This ensures that you have a comprehensive record of the entire workflow.
- Reproducibility: By using MLFlow Projects, you can ensure that your code runs consistently across different environments. This greatly aids in reproducing results and collaborating with other team members.
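Each step is packaged as its own MLFlow Project: a folder with an `MLProject` file, a conda environment, and a script. As a rough sketch, an `MLProject` file for a step can look like the following (the parameter name and default shown here are illustrative assumptions, not copied from this repository):

```
name: make_dataset

conda_env: env.yml

entry_points:
  main:
    parameters:
      input_path: {type: str, default: data/raw}  # hypothetical parameter
    command: "python make_dataset.py --input_path {input_path}"
```

The parent project's own `MLProject` file at the repository root plays the same role for `main.py`, which then dispatches to the step projects under `src`.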
- Clone Repository: Clone this repository to your local machine:
  `git clone https://github.com/nasserboan/mlflow-pydata-talk`
  `cd mlflow-pydata-talk`
- Create Environment: Create a conda environment with the project's dependencies:
  `conda env create -f conda.yml`
- Define Steps: Open `main.py` and choose which steps should be run by editing the `run_steps` list.
- Run Project: Execute the project using MLFlow:
  `mlflow run . --experiment-name <your-experiment-name>`
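The orchestration in `main.py` can be pictured as a list of step names that gates which sub-projects get launched, always in pipeline order. A minimal sketch of that pattern (the step names match the folders under `src`; the `launch` callable standing in for `mlflow.run` is an assumption, not the repository's actual code):

```python
# Ordered pipeline steps; each maps to a sub-project folder under src/.
ALL_STEPS = ["make_dataset", "split", "train_model"]

# Edit this list to control which steps should be executed.
run_steps = ["make_dataset", "train_model"]

def execute(run_steps, launch):
    """Run the selected steps in pipeline order via the given launcher."""
    executed = []
    for step in ALL_STEPS:
        if step in run_steps:
            launch(f"src/{step}")  # e.g. mlflow.run(uri=f"src/{step}")
            executed.append(step)
    return executed
```

Note that steps are filtered against `ALL_STEPS`, so the execution order is fixed by the pipeline, not by the order in which you list the steps.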
- View Results: Check the MLFlow UI to view the logged metrics, parameters, and artifacts from each run:
  `mlflow ui`
- MLFlow
- PyTorch
- Hydra
- Scikit-Learn
- Argparse
- Pandas
├── LICENSE
├── README.md <- The top-level README.
├── data
│ ├── indexes <- Indexes of the images that will be used for training and testing
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── notebooks <- Jupyter notebooks.
│
├── mlruns <- Metadata from MLFlow experiments.
│
├── src <- Source code for use in this project.
│ │
│ ├── make_dataset <- Scripts to generate data.
│ │ │
│ │ ├── env.yml
│ │ ├── MLProject
│ │ └── make_dataset.py
│ │
│ ├── split <- Scripts to split and prepare data.
│ │ │
│ │ ├── env.yml
│ │ ├── MLProject
│ │ └── split_and_prepare.py
│ │
│ └── train_model <- Scripts to train a model.
│ │
│ ├── env.yml
│ ├── MLProject
│ └── train_model.py
│
│
├── conda.yml <- Conda environment for the root project.
├── config.yaml <- Config file with parameters to be imported by Hydra.
├── main.py <- Parent project to run all the other projects inside src.
└── MLProject <- MLFlow project definition.