Code and datasets for the WWW2022 paper KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction.
- Our paper KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction has been accepted by WWW2022.
To install requirements:
pip install -r requirements.txt
We provide all the datasets and prompts used in our experiments.
The expected structure of files is:
knowprompt
|-- dataset
| |-- semeval
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- temp.txt
| | |-- rel2id.json
| |-- dialogue
| | |-- train.json
| | |-- dev.json
| | |-- test.json
| | |-- rel2id.json
| |-- tacred
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- temp.txt
| | |-- rel2id.json
| |-- tacrev
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- temp.txt
| | |-- rel2id.json
| |-- retacred
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- temp.txt
| | |-- rel2id.json
|-- scripts
| |-- semeval.sh
| |-- dialogue.sh
| |-- ...
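The layout above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name missing_files and the default root "dataset" are assumptions made for illustration, and the file lists are copied from the tree above.

```python
from pathlib import Path

# Expected files per dataset, copied from the tree above.
TXT_FILES = ["train.txt", "dev.txt", "test.txt", "temp.txt", "rel2id.json"]
EXPECTED = {
    "semeval": TXT_FILES,
    "dialogue": ["train.json", "dev.json", "test.json", "rel2id.json"],
    "tacred": TXT_FILES,
    "tacrev": TXT_FILES,
    "retacred": TXT_FILES,
}


def missing_files(root="dataset"):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [
        str(root / name / fname)
        for name, fnames in EXPECTED.items()
        for fname in fnames
        if not (root / name / fname).is_file()
    ]


if __name__ == "__main__":
    # Run from the knowprompt directory; prints nothing if everything is in place.
    for path in missing_files():
        print("missing:", path)
```

An empty result means every file listed in the tree was found; datasets you do not plan to use can be removed from EXPECTED.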
Use the command below to generate the answer words used in training.
python get_label_word.py --model_name_or_path bert-large-uncased --dataset_name semeval
The {answer_words}.pt file will be saved in the dataset folder. You need to specify model_name_or_path and dataset_name for get_label_word.py.
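As a rough illustration of deriving candidate answer words from relation names, the sketch below splits each relation name into lowercase words. It is not the contents of get_label_word.py: the helper names are hypothetical, it assumes rel2id.json is a JSON object whose keys are relation names, and it does not tokenize with a pretrained model or write a .pt file.

```python
import json
import re


def relation_to_words(relation):
    """Split a relation name into lowercase alphabetic words (illustrative only)."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", relation)]


def load_relation_words(rel2id_path):
    """Map each relation name in rel2id.json to its list of words.

    Assumes the file is a JSON object keyed by relation name.
    """
    with open(rel2id_path, encoding="utf-8") as f:
        rel2id = json.load(f)
    return {rel: relation_to_words(rel) for rel in rel2id}


if __name__ == "__main__":
    print(relation_to_words("per:city_of_birth"))
```

In practice the words would still need to be mapped to vocabulary ids of the chosen model, which is why the real script takes model_name_or_path.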
Download the data first and put it in the dataset folder. Then run the commands below to generate the few-shot dataset.
python generate_k_shot.py --data_dir ./dataset --k 8 --dataset semeval
cd dataset
cd semeval
cp rel2id.json val.txt test.txt ./k-shot/8-1
Modify k and dataset to choose the number of shots and the dataset. By default, seeds 1, 2, 3, 4 and 5 are used to split each k-shot set; you can change this in generate_k_shot.py.
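The idea of seeded k-shot splitting can be sketched as follows. This is a generic illustration under stated assumptions, not the logic of generate_k_shot.py: the function name sample_k_shot is hypothetical, each example is assumed to be a dict with a "relation" key, and the "k-seed" labels in the demo only mirror the 8-1 directory name used in the commands above.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, seed):
    """Pick up to k examples per relation, reproducibly for a given seed."""
    # Group examples by their relation label (assumed key: "relation").
    by_relation = defaultdict(list)
    for ex in examples:
        by_relation[ex["relation"]].append(ex)

    # A dedicated generator keeps the split independent of global random state.
    rng = random.Random(seed)
    subset = []
    for relation in sorted(by_relation):
        group = by_relation[relation]
        subset.extend(rng.sample(group, min(k, len(group))))
    return subset


if __name__ == "__main__":
    # Toy data: three relations with 3, 4 and 1 examples.
    toy = [{"relation": r, "id": i} for i, r in enumerate("AAABBBBC")]
    for seed in (1, 2, 3, 4, 5):
        picked = sample_k_shot(toy, k=2, seed=seed)
        print(f"2-{seed}:", [ex["id"] for ex in picked])
```

Relations with fewer than k examples contribute everything they have, so the subset can be smaller than k times the number of relations.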
Our scripts automatically run the experiments in the 8-shot, 16-shot, 32-shot and standard supervised settings, covering training, evaluation and testing. The code uses random seed 1 as an example; you can run multiple experiments with different seeds.
Train the KnowPrompt model on SemEval with the following command:
>> bash scripts/semeval.sh # for roberta-large
Scripts for TACRED-Revisit, Re-TACRED and Wiki80, which are included in our paper, are also provided; run them in the same way as the example above.
Because the data format of DialogRE differs substantially from that of the other datasets, its processor class is also different. Train the KnowPrompt model on DialogRE with the following command:
>> bash scripts/dialogue.sh # for roberta-base
If you use the code, please cite the following paper:
@inproceedings{DBLP:conf/www/ChenZXDYTHSC22,
author = {Xiang Chen and
Ningyu Zhang and
Xin Xie and
Shumin Deng and
Yunzhi Yao and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
editor = {Fr{\'{e}}d{\'{e}}rique Laforest and
Rapha{\"{e}}l Troncy and
Elena Simperl and
Deepak Agarwal and
Aristides Gionis and
Ivan Herman and
Lionel M{\'{e}}dini},
title = {KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization
for Relation Extraction},
booktitle = {{WWW} '22: The {ACM} Web Conference 2022, Virtual Event, Lyon, France,
April 25 - 29, 2022},
pages = {2778--2788},
publisher = {{ACM}},
year = {2022},
url = {https://doi.org/10.1145/3485447.3511998},
doi = {10.1145/3485447.3511998},
timestamp = {Tue, 26 Apr 2022 16:02:09 +0200},
biburl = {https://dblp.org/rec/conf/www/ChenZXDYTHSC22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}