asnq-challenging

ASNQ without easy negative answers.

Description

Original ASNQ dataset, proposed by TandA contains many negative pairs that should be really simple to be discovered. In the original paper, the authors state that there are three different types of negative pairs:

Sentences from the document that are in the long answer but do not contain the annotated short answers. It is possible that these sentences might contain the short answer.
Sentences from the document that are not in the long answer but contain the short answer string, that is, such occurrence is purely accidental.
Sentences from the document that are neither in the long answer nor contain the short answer.

We believe that the last category, which is the largest one, contains too many pairs with a very easily-recognizable negative answer. We do not consider those negative pairs.

Create the datasets

By default training and development set are available. In order to obtain both training, validation and test sets we suggest you to use the splits defined by The Cascade Transformers. In this case, the original development set will be split in both dev and test sets.

Install required libraries:

pip install -r requirements.txt

Create only training and validation sets with:

python create.py --split train --output_file output/asnq-challenging-train.tsv
python create.py --split validation --output_file output/asnq-challenging-dev.tsv

Create training, validation and test sets as in the Cascade Transformer with:

python create.py --split train --output_file output/asnq-challenging-train.tsv --filter input/filters/unique.train
python create.py --split validation --output_file output/asnq-challenging-dev.tsv --filter input/filters/unique.dev
python create.py --split validation --output_file output/asnq-challenging-test.tsv --filter input/filters/unique.test

where unique.train, unique.dev and unique.test are taken from here.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
create.py		create.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

asnq-challenging

Description

Create the datasets

About

Releases

Packages

Languages

License

lucadiliello/asnq-challenging

Folders and files

Latest commit

History

Repository files navigation

asnq-challenging

Description

Create the datasets

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages