Authors: Paula Reyero Lobo (paula.reyero-lobo@open.ac.uk), Enrico Daga (enrico.daga@open.ac.uk), Harith Alani (harith.alani@open.ac.uk)
This repository supports the paper "Supporting Online Toxicity Detection with Knowledge Graphs" (link to paper), presented at ICWSM 2022. In this work, we address the problem of annotating toxic speech corpora, using semantic knowledge about gender and sexual orientation to identify missing target annotations for these groups. The workflow followed in this experiment is presented below:
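The core idea is to link mentions in the corpus to concepts from the Gender, Sex, and Sexual Orientation (GSSO) ontology. As a rough illustration of this step (not the code in this repository), the sketch below collects surface forms for GSSO concepts with rdflib; the local file name gsso.owl and the use of the oboInOwl exact-synonym property are assumptions.

# Rough sketch (not the repository's code): collect surface forms
# (labels and synonyms) for concepts in the GSSO ontology so they can
# later be matched against corpus text.
# Assumption: a local copy of the ontology saved as "gsso.owl".
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

g = Graph()
g.parse("gsso.owl")  # hypothetical local copy of the GSSO ontology

SYNONYM = URIRef("http://www.geneontology.org/formats/oboInOwl#hasExactSynonym")

surface_forms = {}  # concept IRI -> set of lower-cased textual forms
for prop in (RDFS.label, SYNONYM):
    for concept, form in g.subject_objects(prop):
        surface_forms.setdefault(str(concept), set()).add(str(form).lower())

print(f"Collected surface forms for {len(surface_forms)} GSSO concepts")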
The output of this code corresponds to the directory tree below. We release these files in the following open repository:
icwsm22-supporting-toxicity-with-KG
│   readme.md
│
├───data
│   │   all_data_splits.csv
│   │
│   ├───gsso_annotations
│   │       file11.csv
│   │
│   └───gsso_annotations_inferred
│           file21.csv
│           identity_data_splits.csv
│           readme.md
│
├───results
│   ├───1_freq_tables
│   ├───2_freq_plots
│   ├───3_freq_plots_category
│   ├───4_candidate_scores
│   └───saved_dict
│
└───scripts
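To give a sense of how the released files can be consumed, the hypothetical snippet below loads the data splits with pandas; the column name it mentions is an illustrative assumption, not the actual schema of the released files.

import pandas as pd

# Hypothetical example of loading the released splits.
splits = pd.read_csv("data/all_data_splits.csv")
print(splits.head())
# Frequency tables like those under results/1_freq_tables could then be
# derived, e.g. splits["target_group"].value_counts(), assuming a
# hypothetical "target_group" column.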
To set up the project using a virtual environment:
$ python -m venv <env_name>
$ source <env_name>/bin/activate
(<env_name>) $ python -m pip install -r requirements.txt
Example usage:
Run the following command from the project folder to detect gender and sexual orientation entities in the text:
(<env_name>) $ python scripts/gsso_annotate.py
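The script encapsulates the annotation step. As a hedged illustration of the kind of matching involved (an assumption about the approach, not the script's actual code), GSSO surface forms could be located in text with a spaCy PhraseMatcher:

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching

# "terms" would come from the GSSO ontology (see the sketch above);
# these few entries are hypothetical examples.
terms = ["gay", "lesbian", "transgender", "sexual orientation"]
matcher.add("GSSO", [nlp.make_doc(t) for t in terms])

doc = nlp.make_doc("Comments targeting transgender people are toxic.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text, (start, end))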