Census project: Predict whether income exceeds $50K/yr based on census data. The dataset is also known as the Adult dataset.
The data was downloaded from the Udacity nd0821-c3 project starter kit.
Information on the original dataset is available from the UC Irvine Machine Learning Repository.
- Create your conda environment:
$ conda create --name <your environment name> --file requirements.txt
$ conda env create --file conda.yaml
$ conda activate <your environment name>
Data cleaning can be performed with the Jupyter notebook "Census_Clean_Data.ipynb", which also provides a good overview of the data. The same cleaning is available as a Python file: "/ml/data_cleaning.py"
$ python -m tests.sanitycheck
When asked for a path, enter "tests.api_tests.py" as the test file to check that the functionality meets the course specifications.
$ python -m ml.train_model
After the model has been trained successfully, the following files will be saved:
Metrics will be written to "/artifacts/slice_output.txt"
Model will be saved to file "/artifacts/model.joblib"
Encoder will be saved to "/artifacts/encoder.joblib"
Label binarizer will be saved to "/artifacts/lb.joblib"
The output is shown on screen and is also saved to "/logs/census.log"
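Sending the same output to the screen and to a log file can be sketched with Python's standard logging module. This is a minimal illustration, not the project's actual setup: the function name make_logger, the logger name, and the message format are assumptions.

```python
import logging
import sys
from pathlib import Path


def make_logger(log_file, name="census"):
    """Return a logger that writes each record to stdout and to log_file.

    Hypothetical helper: the name and format are illustrative only.
    """
    log_path = Path(log_file)
    # Create the log directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Drop handlers left over from an earlier call so records are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

For example, `make_logger("logs/census.log").info("training finished")` prints the message and appends it to the log file.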
Start the uvicorn server with:
$ uvicorn main:app --reload
The server is then accessible via: "http://127.0.0.1:8000"
The interactive API documentation can be found here: "http://127.0.0.1:8000/docs"
FastAPI tests can be performed by using the Jupyter notebook "Census_Tests_API.ipynb".
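Outside the notebook, a request against the local server can be sketched with the standard library alone. The helpers build_request and send are hypothetical names. This README does not list the API's endpoints or input fields, so any POST path and payload passed in are placeholders to be checked against the documentation page.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address given in this README


def build_request(path, payload=None):
    """Build a GET request, or a JSON POST request when a payload is given.

    Hypothetical helper: the endpoint paths and payload keys are assumptions.
    """
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and return (status code, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the uvicorn server running, `send(build_request("/docs"))` fetches the documentation page. A prediction call would pass the real endpoint path and a dictionary of census fields as the payload.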
Pytest will run all tests in the tests folder and can be executed via:
$ pytest -vv
Detailed information about the model can be found in "model_card.md"
When changes are made to the repository, GitHub Actions is triggered.
The Heroku app can be tested by using the Jupyter notebook "Census_Test_Heroku.ipynb".
The app deployed on Heroku can be accessed at: "https://census-salaries-d3e2956470bf.herokuapp.com/"