This repository contains a project that was showcased in the PyData Brasília talk about MLFlow. The project demonstrates the use of MLFlow Projects to execute a series of code scripts in a specific order while maintaining comprehensive logging of each run. The goal of this project is to provide an efficient and organized way to manage and monitor your machine learning workflows.
- Code Execution Order: The project leverages MLFlow Projects to run a series of code scripts in a defined order. This is particularly useful when you have multiple interdependent scripts that need to be executed in a specific sequence.
- Logging and Tracking: MLFlow's logging capabilities allow you to keep track of important metrics, parameters, and artifacts produced during each run. This ensures that you have a comprehensive record of the entire workflow.
- Reproducibility: By using MLFlow Projects, you can ensure that your code runs consistently across different environments. This greatly aids in reproducing results and collaborating with other team members.
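Each step is packaged as its own MLFlow Project: a folder with an `MLProject` file, a conda environment, and a script. As a rough sketch, an `MLProject` file for a step can look like the following (the parameter name and default shown here are illustrative assumptions, not copied from this repository):

```
name: make_dataset

conda_env: env.yml

entry_points:
  main:
    parameters:
      input_path: {type: str, default: data/raw}  # hypothetical parameter
    command: "python make_dataset.py --input_path {input_path}"
```

The parent project's own `MLProject` file at the repository root plays the same role for `main.py`, which then dispatches to the step projects under `src`.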
- Clone Repository: Clone this repository to your local machine:
  `git clone https://github.com/nasserboan/mlflow-pydata-talk`
  `cd mlflow-pydata-talk`
- Create Environment: Create a conda environment with the project's dependencies:
  `conda env create -f conda.yml`
- Define Steps: Open `main.py` and choose which steps should be run by editing the `run_steps` list.
- Run Project: Execute the project using MLFlow:
  `mlflow run . --experiment-name <your-experiment-name>`
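The orchestration in `main.py` can be pictured as a list of step names that gates which sub-projects get launched, always in pipeline order. A minimal sketch of that pattern (the step names match the folders under `src`; the `launch` callable standing in for `mlflow.run` is an assumption, not the repository's actual code):

```python
# Ordered pipeline steps; each maps to a sub-project folder under src/.
ALL_STEPS = ["make_dataset", "split", "train_model"]

# Edit this list to control which steps should be executed.
run_steps = ["make_dataset", "train_model"]

def execute(run_steps, launch):
    """Run the selected steps in pipeline order via the given launcher."""
    executed = []
    for step in ALL_STEPS:
        if step in run_steps:
            launch(f"src/{step}")  # e.g. mlflow.run(uri=f"src/{step}")
            executed.append(step)
    return executed
```

Note that steps are filtered against `ALL_STEPS`, so the execution order is fixed by the pipeline, not by the order in which you list the steps.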
- View Results: Check the MLFlow UI to view the logged metrics, parameters, and artifacts from each run:
  `mlflow ui`
- MLFlow
- PyTorch
- Hydra
- Scikit-Learn
- Argparse
- Pandas
├── LICENSE
├── README.md <- The top-level README.
├── data
│ ├── indexes <- Indexes of the images that will be used for training and testing
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── notebooks <- Jupyter notebooks.
│
├── mlruns <- Metadata from MLFlow experiments.
│
├── src <- Source code for use in this project.
│ │
│ ├── make_dataset <- Scripts to generate data.
│ │ │
│ │ ├── env.yml
│ │ ├── MLProject
│ │ └── make_dataset.py
│ │
│ ├── split <- Scripts to split and prepare data.
│ │ │
│ │ ├── env.yml
│ │ ├── MLProject
│ │ └── split_and_prepare.py
│ │
│ └── train_model <- Scripts to train a model.
│ │
│ ├── env.yml
│ ├── MLProject
│ └── train_model.py
│
│
├── conda.yml <- Conda environment for the root project.
├── config.yaml <- Config file with parameters to be imported by Hydra.
├── main.py <- Parent project to run all the other projects inside src.
└── MLProject <- MLFlow project definition.