This project applies modern ML and MLOps tooling to build a solution that lets the end user predict the price of a particular diamond.
The repository has two branches: master contains the model-training code and the API, while streamlit holds the frontend.
If you want to navigate to a different branch, just run:
git checkout <branch_name>
After cloning the repository, you first have to set it up. Configure your Python interpreter to use Python 3.10; you can use either Conda or venv.
Install the required packages using:
pip install -r src/requirements.txt
After that, manually create a conf/local directory in the project root.
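If you prefer to script this step, the directory can be created with the standard library; a minimal sketch, assuming you run it from the repository root:

```python
from pathlib import Path

# Kedro keeps machine-local, gitignored configuration (credentials,
# local overrides) in conf/local; create it if the clone does not ship one.
local_conf = Path("conf") / "local"
local_conf.mkdir(parents=True, exist_ok=True)
```

`exist_ok=True` makes the step idempotent, so rerunning it on an already-configured clone is harmless.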
There are four distinct pipelines:
- data_prep - cleans, removes outliers, standardizes, and splits the data
- train_model - trains an AutoGluon predictor
- evaluate_metrics - produces DataFrame and pyplot summaries of regression metrics
- api_pipeline - automated inference pipeline using kedro-fastapi
You can run all Kedro pipelines at once using:
kedro run
or a selected pipeline:
kedro run --pipeline <pipeline_name>
You can visualize all pipelines using:
kedro viz run
or a selected pipeline using:
kedro viz run --pipeline <pipeline_name>
This plugin automatically creates an MLPredictor that can perform inference using the defined predict method. It then runs Uvicorn and exposes the API on port 8000 by default. The API creation is based on api.yml and is orchestrated by Kedro like the rest of the project (see the api_pipeline folder).
To start the API, run:
kedro fast-api run
If you want to test the API locally, first start it (with the command above) and then make a prediction. You can do that by opening http://0.0.0.0:8000 in your browser and entering the data manually in the Swagger docs.
Alternatively, pass the parameters in the URL itself, like so:
http://0.0.0.0:8000/diamond_price?carat=1&cut=Ideal&color=E&clarity=I1&depth=1&table=1&x=1&y=1&z=1
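For programmatic calls, the same query string can be built with the standard library. A minimal sketch; the diamond_price route and the feature names are taken from the example URL above:

```python
from urllib.parse import urlencode

# Feature values for one diamond, matching the query parameters
# in the example URL above.
features = {
    "carat": 1, "cut": "Ideal", "color": "E", "clarity": "I1",
    "depth": 1, "table": 1, "x": 1, "y": 1, "z": 1,
}

base = "http://0.0.0.0:8000/diamond_price"
# urlencode handles percent-escaping, so values with spaces or
# special characters stay valid in the query string.
url = f"{base}?{urlencode(features)}"
print(url)
```

With the API running, opening this URL (for example with urllib.request.urlopen) should return the predicted price.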
Since the app is meant to be shareable, it can be containerized into a Docker image after local development. How the project is packaged into an image is defined in the Dockerfile.
First, build the image:
docker build -t <image_name> .
Optionally, test the image locally:
docker run -p 8000:8000 -e PORT=8000 <image_name>
Then you are free to use the image to run the solution anywhere.
Our team has decided to share the app via Google Cloud's Cloud Run. To make that work, you need to push the Docker image you've built to Artifact Registry. Please refer to the official documentation.
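Before pushing, the image has to be tagged with Artifact Registry's naming convention, LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE. A minimal sketch of composing such a tag; the region, project, repository, and image names below are hypothetical placeholders:

```python
# All four values are placeholders; substitute your own
# GCP region, project ID, Artifact Registry repository, and image name.
region = "europe-west1"
project_id = "my-gcp-project"
repository = "diamond-repo"
image_name = "diamond-api"

# Artifact Registry image names follow
# <LOCATION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY>/<IMAGE>.
tag = f"{region}-docker.pkg.dev/{project_id}/{repository}/{image_name}"
print(tag)
```

You would then tag and push the local image under this name (docker tag, docker push) before deploying it to Cloud Run.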
To use the frontend, first check out the streamlit branch:
git checkout streamlit
Then install the required packages using:
pip install -r requirements.txt
Finally, run the frontend using:
streamlit run app.py