A hands-on tutorial on explainable methods for machine learning with Python: applications to gender bias
This repository contains the material for the tutorial held at the EuADS Summer School on Data Science for Explainable and Trustworthy AI (6-9 June 2023).
This repository contains the following folders:
- code: Python notebooks and scripts to run the examples locally.
- data: Datasets in CSV format.
- gcolab: All-in-one notebooks to be used in Google Colab.
The slides used in the tutorial session are also available.
The code has been developed with Python 3.10.2 using Visual Studio Code with the Jupyter extension. The Jupyter Notebook Renderers extension is required to visualise the interactive plots generated by dalex. The machine learning models are built with scikit-learn (sklearn) and fairlearn.
If you want to run the examples on your machine, follow these steps:
- Clone this repository or download it as a ZIP archive.
- Create a virtual environment using Python's venv module or conda. For venv:
python -m venv <your-venv-path>
- Activate your virtual environment (the activate script is in the bin or Scripts folder of the environment, depending on your OS).
- Install the dependencies from the requirements file. For pip:
pip install -r requirements.txt
- Go to the code folder
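Once the dependencies are installed, a quick import check can confirm that the environment is ready before opening the notebooks. The following is a minimal sketch and is not part of the repository: the module list only reflects the libraries named in this README (requirements.txt remains the authoritative list), and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names of the libraries mentioned in this README (assumption:
# requirements.txt may list more packages than these three).
REQUIRED_MODULES = ["sklearn", "fairlearn", "dalex"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Run it with the virtual environment activated. If a module is reported as missing, installing the requirements file again inside the activated environment is the first thing to try.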
The examples of this tutorial use two datasets:
- Example 1: employee promotion. The original file is available on Kaggle.
- Example 2: Dutch census. A preprocessed file is available on GitHub.
Two notebooks are available for each example:
- Example 1: employee promotion
- Data analysis: exploration of the features of the dataset.
- Machine learning + XAI: comparison of classifiers using different subsets of data with XAI techniques (model inspection, local explanations and counterfactuals).
- Example 2: Dutch census
- Data analysis: exploration of the features of the dataset.
- Fair machine learning + XAI: comparison of classifiers using bias mitigation methods with XAI techniques (model inspection, local explanations and counterfactuals).
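To give an idea of what comparing classifiers for gender bias can involve, the sketch below computes the selection rate per group and the gap between groups, a quantity often called the demographic parity difference. It is an illustration only: the data are made up, the function names are hypothetical, and the notebooks may rely on different metrics or on the implementations provided by fairlearn.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive predictions (label 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and the lowest group selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Toy predictions (hypothetical, not taken from the tutorial datasets).
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
gender = ["m", "m", "m", "m", "f", "f", "f", "f"]

rates = selection_rates(y_pred, gender)
gap = demographic_parity_difference(y_pred, gender)
print(rates, gap)
```

A gap of 0 means that both groups receive positive predictions at the same rate. Larger values indicate a bigger disparity, which is the kind of difference bias mitigation methods aim to reduce.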
This tutorial is part of the GENIA project, funded by the Annual Research Plan (2022) of the University of Córdoba (Spain).