This is the code repository for the paper "A Kernel-based Approach for Irony and Sarcasm Detection in Italian", presented at IronITA @ EVALITA 2018.
The system ranked first and second in the sarcasm detection task, and sixth and seventh in the irony detection task.
If you use this code, please cite:
@inproceedings{Santilli2018AKA,
  title={A Kernel-based Approach for Irony and Sarcasm Detection in Italian},
  author={Andrea Santilli and Danilo Croce and Roberto Basili},
  booktitle={EVALITA@CLiC-it},
  year={2018}
}
This repository contains the Jupyter notebook GenerateKLPFile, used to model the features for the task as explained in the paper.
It also contains the .pickle files with the word frequencies extracted from the Irony Corpus.
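The .pickle files can be inspected with Python's standard pickle module. The snippet below is a minimal sketch, assuming each file holds a dictionary mapping a word to its frequency (the file name used here is illustrative, not one of the repository's actual file names):

```python
import pickle

def load_frequencies(path):
    """Load a pickled {word: frequency} dictionary, as assumed for the
    repository's .pickle files (actual file names may differ)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Self-contained demonstration with a toy frequency table.
with open("toy_freq.pickle", "wb") as f:
    pickle.dump({"ironia": 12, "sarcasmo": 7}, f)

freq = load_frequencies("toy_freq.pickle")
print(freq.get("ironia", 0))   # unseen words default to 0
```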
-
To use this code, first download all the files in this repository and the datasets for the task, and place them in the same folder.
-
Once you have downloaded these data, you need to preprocess them with a POS-tagger and a lemmatizer. For each downloaded dataset, generate a processed copy:
test_dataset.klp -> ../test_ironita2018_revnlt_processed.tsv
train_dataset.klp -> ../training_ironita2018_renlt_processed.tsv
and put them in the parent directory ../
. These two new copies of the datasets must have an extra column (text::text::S) where, for each line, the text has been preprocessed as text::lemma::POS.
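As an illustration of the expected column format, the sketch below joins the output of a POS-tagger/lemmatizer into the text::lemma::POS representation. The (token, lemma, POS) triples are a toy example; the actual tags depend on the tagger you use:

```python
def to_klp_column(tagged_tokens):
    """Join (token, lemma, POS) triples into the text::lemma::POS format
    expected in the extra column of the processed .tsv files."""
    return " ".join(f"{text}::{lemma}::{pos}" for text, lemma, pos in tagged_tokens)

# Toy output of a hypothetical Italian POS-tagger/lemmatizer.
tagged = [("Che", "che", "PRON"), ("sorpresa", "sorpresa", "NOUN"), ("!", "!", "PUNCT")]
print(to_klp_column(tagged))
# -> Che::che::PRON sorpresa::sorpresa::NOUN !::!::PUNCT
```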
-
Now you can use the Jupyter notebook GenerateKLPFile to generate the .klp file with all the features modelled as explained in the paper.
-
Finally, use KeLP to feed the modelled features to a kernel machine (a linear combination of kernels, as explained in the paper, or another type of combination/kernel). An example of KeLP classification can be found in the file
IroniTAClassifier.java
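For intuition, a linear combination of kernels is just a non-negatively weighted sum of the individual kernel matrices, which is itself a valid kernel. The sketch below illustrates the idea with a linear and an RBF kernel on toy data; the weights and kernels are illustrative, not those used in the paper or in IroniTAClassifier.java:

```python
import numpy as np

def linear_kernel(X):
    # Gram matrix of dot products between all pairs of rows.
    return X @ X.T

def rbf_kernel(X, gamma=1.0):
    # Squared Euclidean distances between all pairs of rows.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def combined_kernel(X, weights=(0.5, 0.5)):
    """Weighted sum of two kernels; valid because both weights are >= 0."""
    return weights[0] * linear_kernel(X) + weights[1] * rbf_kernel(X)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
K = combined_kernel(X)
print(K)  # symmetric 2x2 kernel matrix
```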