Alix, Paramveer, Susannah, Zoe
This project analyzes and predicts wine quality from physicochemical properties. Using the UCI Wine Quality dataset, we preprocess the data, perform exploratory data analysis, and build machine learning models to predict the quality score. The dataset includes features such as acidity, alcohol content, and sugar levels, which the models use to predict each wine's quality score. Cross-validation and hyperparameter tuning are used to optimize model performance.
Dataset: The dataset was sourced from the UCI Machine Learning Repository.
Preprocessing: Standardization of numerical features. One-hot encoding for binary categorical features (e.g., color).
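The two preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's actual code: the toy rows, the column names, and the choice of `ColumnTransformer` with `drop="if_binary"` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows standing in for the wine data; names and values are illustrative only.
toy = pd.DataFrame(
    {
        "alcohol": [9.4, 10.5, 12.8, 11.0],
        "residual_sugar": [1.9, 6.1, 2.3, 14.0],
        "color": ["red", "white", "red", "white"],
    }
)

numeric_cols = ["alcohol", "residual_sugar"]
binary_cols = ["color"]

preprocessor = ColumnTransformer(
    transformers=[
        # Rescale each numeric column to zero mean and unit variance.
        ("num", StandardScaler(), numeric_cols),
        # drop="if_binary" keeps a single 0/1 column for a two-level feature.
        ("cat", OneHotEncoder(drop="if_binary"), binary_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Result has one column per numeric feature plus one indicator column for color.
X = preprocessor.fit_transform(toy)
```

Keeping both steps in one transformer means the same scaling and encoding can later be fitted on training folds only, which avoids leaking test-set statistics during cross-validation.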
Exploratory Data Analysis: Distribution of wine quality scores. Correlation heatmaps to identify relationships between features. Key insights on influential features.
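As a rough sketch of the exploratory steps above, the snippet below computes a quality-score distribution and a correlation matrix with pandas. The data are synthetic and generated only for illustration; the column names and the relationship between the columns are assumptions, not findings from the wine dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; it does not reproduce the real wine measurements.
rng = np.random.default_rng(0)
n = 200
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.4, 0.1, n)
noise = rng.normal(0.0, 0.5, n)
raw_score = 6 + 0.8 * (alcohol - 10.5) - 5 * (volatile_acidity - 0.4) + noise
quality = np.clip(np.round(raw_score), 3, 9).astype(int)

df = pd.DataFrame(
    {"alcohol": alcohol, "volatile_acidity": volatile_acidity, "quality": quality}
)

# Distribution of quality scores: how many samples fall on each score.
quality_counts = df["quality"].value_counts().sort_index()

# Pairwise Pearson correlations; this matrix is what a heatmap would display.
corr = df.corr()

# Correlation of each feature with the target, excluding the target itself.
quality_corr = corr["quality"].drop("quality")
```

The `corr` matrix can be passed to a plotting function such as `seaborn.heatmap` to draw the heatmap, and `quality_corr` gives a quick ranking of which features move with the quality score.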
Modeling: Logistic regression was used as the base model. RandomizedSearchCV was applied for hyperparameter optimization. The model was evaluated using metrics such as accuracy, precision, recall, and F1-score.
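The modeling steps above might look roughly like the sketch below. It is an illustration under stated assumptions: the data come from `make_classification` rather than the wine dataset, and the search space (a log-uniform range for `C`), the number of iterations, and the weighted averaging of the metrics are choices made for this example, not necessarily those used in the notebook.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class data standing in for the wine features and quality labels.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refitted on each training fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

search = RandomizedSearchCV(
    pipe,
    # Assumed search space: regularization strength sampled on a log scale.
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out test split.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
```

After fitting, `search.best_params_` holds the sampled value of `C` with the best mean cross-validation score, and `search.best_estimator_` is the pipeline refitted on the full training split.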
The first time you run the project, create the conda environment by running the following command from the root of the repository:
conda-lock install --name wine-quality-regressor conda-lock.yml
To run the analysis, start JupyterLab from the root of the repository:
jupyter lab
Open notebooks/wine-quality.ipynb in JupyterLab and run all cells using the new wine-quality-regressor kernel.
Dependencies:
- conda (version 24.9.1 or higher)
- conda-lock (version 2.5.7 or higher)
- Python package ucimlrepo (version 0.0.7)
- jupyterlab (version 4.2.0 or higher)
- nb_conda_kernels (version 2.5.1 or higher)
- Python and packages listed in environment.yml