This repository contains code for running classification experiments on adversarial attacks.
First, install the dependencies, e.g.:

```
conda create -n react-detection python=3.8
source activate react-detection
pip install -r requirements.txt
```
Then add the `reactdetect` folder to your environment. E.g., if you are using conda:

```
conda develop <absolute path to reactdetect>
```
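If you are not using conda, a minimal alternative (a sketch, not a documented entry point of this repo) is to append the same path to `sys.path` before importing:

```python
import sys

# Hypothetical path -- use the same absolute path you would pass to
# `conda develop`.
sys.path.append("/absolute/path/to/reactdetect")

import reactdetect  # should now resolve
```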
Then, place an attacks folder (following the conventions of the dataset generation repo) into the `react-detect/attacks_dataset` folder, or use the pre-generated attack instances here. (Not required for the quickstart.)
The `reactdetect` folder acts as a small library providing the building blocks for the other modules in the repo. For a quick look at how to train a sklearn- or PyTorch-style model, see the various examples and annotated Jupyter notebooks in `./reactdetect/examples`.
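For orientation, here is a minimal, self-contained sketch of the sklearn-style workflow those examples cover. The feature matrix and labels below are random placeholders, not output of the repo's feature extractors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder features: in the real examples these would come from
# reactdetect's feature extraction, not random noise.
X = np.random.rand(200, 32)       # 200 samples, 32 features
y = np.random.randint(0, 2, 200)  # binary labels, e.g. clean vs. attacked

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```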
Since the feature extraction process can be reused across experiments, the repo also provides a workflow for storing pre-computed features and running experiments at scale:
- First, run `encode_main.py` to generate sentence-level feature representations at `./reprs/samplewise`. (Similarly, `encode_main_tw.py` generates token-level features at `./reprs/tokenwise`.)
- Then run `distribute_experiments.py` with the proper arguments, which creates `train.csv` and `test.csv` under the `./detection-experiments` folder.
- Then run `make_experiments.py`, which takes any directory containing train and test CSVs and turns them into joblib files using the cached representations in the `./reprs` folder. You can disable distribution of token-wise features via its command-line arguments.
- Finally, run `detection_sw.py`, which takes any directory containing pre-made train and test joblib files, trains an LR/LGB/RF model per its input arguments, and logs the models, outputs, and metrics in a unique subdirectory. A sketch of what this final step amounts to is shown after this list.
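To make the final step concrete, here is a minimal sketch of training one of the models from pre-made joblib files. The file names, the `(X, y)` tuple layout, and the experiment directory are illustrative assumptions, not the actual format produced by `make_experiments.py`:

```python
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Hypothetical file names and layout -- the joblib files produced by
# make_experiments.py may be structured differently.
X_train, y_train = joblib.load("detection-experiments/my_exp/train.joblib")
X_test, y_test = joblib.load("detection-experiments/my_exp/test.joblib")

# Train the LR variant; detection_sw.py also supports LGB/RF per its args.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# detection_sw.py additionally logs models, outputs, and metrics to a
# unique subdirectory; here we just persist the fitted model.
joblib.dump(model, "detection-experiments/my_exp/model.joblib")
```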
The `./reactdetect` folder contains the building blocks for the experiments; it is designed to work as a small library when added to `PYTHONPATH`.
```
reactdetect/
├── aggregation/          # modules that do "string to id/vector" handling, e.g. token/feature embedders
├── allennlp_wrappers/    # classes that override the original allennlp modules, e.g. a trainer that takes a Python list
├── data/                 # dataset readers, data loaders, etc.; the classes stay as close to the allennlp ones as possible
├── examples/             # examples
├── feature_extraction/   # feature extractor class and feature extraction functions
├── featu..._tokenwise/   # feature extractor class and feature extraction functions, at token level
├── utils/                # helpful file I/O, pandas operations, magic vars, etc.
└── models/               # base model classes; the neural base model stays as close to the original allennlp one as possible
```