Dense-Screening-Feedback

This repository contains the code, data, and run files for the SIGIR 2024 paper Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation [arXiv].

Xinyu Mao, Shengyao Zhuang, Bevan Koopman, and Guido Zuccon. 2024. Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24). New York, USA. July 2024. 10.1145/3626772.3657921

Dependencies

The environment is based on Python 3.8. We use Tevatron (v1) to train the dense retrievers, and pyserini==0.21.0 with faiss-cpu==1.7.4 for the main retrieval task with feedback. We also use the goldilocks-reproduce repository for the active learning baselines. Refer to each project's installation guide for setup.

Data

We use the CLEF-TAR 2017-2019 collections for Subtask 2. The processed data for this paper are available on Zenodo.

Dense retrieval with explicit feedback

Run tevatron_pipe.py to train the dense retriever, encode the corpus and queries, and retrieve the initial ranking.

python tevatron_pipe.py --collection_split clef19_intervention \
                        --model_path ./model/clef19_intervention/biolinkbert_128_256_11 \
                        --q_max_len 128 \
                        --p_max_len 256 \
                        --train_n 11 \
                        --train_epoch 60
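tevatron_pipe.py performs the encoding and retrieval itself. As a rough illustration of what the initial ranking step computes, the sketch below scores each document by the inner product between its embedding and the query embedding and keeps the top k. It assumes plain inner-product scoring (as in a flat inner-product FAISS index); the toy vectors and the function names dot and rank_by_inner_product are hypothetical stand-ins for the encoder outputs and the actual search.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two embedding vectors (assumed to have equal length)."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_inner_product(query_emb: List[float],
                          doc_embs: Dict[str, List[float]],
                          k: int) -> List[Tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, dot(query_emb, emb)) for doc_id, emb in doc_embs.items()]
    # Descending score; ties broken by doc_id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical 3-dimensional embeddings standing in for real encoder outputs.
query = [0.2, 0.9, 0.1]
docs = {
    "d1": [0.1, 0.8, 0.0],
    "d2": [0.9, 0.1, 0.3],
    "d3": [0.3, 0.7, 0.2],
}
print(rank_by_inner_product(query, docs, k=2))
```

Here d1 (score 0.74) ranks ahead of d3 (0.71), while d2 (0.30) falls outside the top 2.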

Run dense_query_tar.py for dense retrieval with explicit feedback. The Rocchio settings, given as (alpha, beta, gamma), are (1, 1, 1), (1, 0.5, 0.5), (1, 0.8, 0.2), and (1, 1, 0).

python dense_query_tar.py --collection_split clef17_test \
                          --model ./models/biolinkbert_128_256 \
                          --n_iteration 20 --top_k 25 \
                          --output_path ./trained_results/a1_b5_c5/biolinkbert_128_256_2/clef17_test \
                          --alpha 1.0 \
                          --beta 0.5 \
                          --gamma 0.5
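The feedback loop can be sketched as follows, under stated assumptions. It applies the standard Rocchio update q' = alpha * q0 + beta * mean(relevant) - gamma * mean(non-relevant) over all documents judged so far, screens the top_k unscreened documents in each of n_iteration rounds, and simulates the reviewer with a set of relevant ids. All names are hypothetical, and the actual dense_query_tar.py may differ, for example in whether each update starts from the original or the previous query vector.

```python
from typing import Dict, List, Set

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of the vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rocchio(q0: Vector, rel: List[Vector], nonrel: List[Vector],
            alpha: float, beta: float, gamma: float) -> Vector:
    """Standard Rocchio: alpha * q0 + beta * mean(rel) - gamma * mean(nonrel)."""
    r = mean_vector(rel, len(q0))
    n = mean_vector(nonrel, len(q0))
    return [alpha * q + beta * ri - gamma * ni for q, ri, ni in zip(q0, r, n)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def screen_with_feedback(q0: Vector, doc_embs: Dict[str, Vector], qrels: Set[str],
                         n_iteration: int, top_k: int, alpha: float = 1.0,
                         beta: float = 0.5, gamma: float = 0.5) -> List[str]:
    """Return the screening order produced by iterative Rocchio feedback.

    `qrels` holds the ids of relevant documents and simulates the reviewer's labels.
    """
    query = list(q0)
    screened: List[str] = []
    for _ in range(n_iteration):
        seen = set(screened)
        remaining = [d for d in doc_embs if d not in seen]
        if not remaining:
            break
        # Rank only unscreened documents against the current (updated) query.
        remaining.sort(key=lambda d: (-dot(query, doc_embs[d]), d))
        screened.extend(remaining[:top_k])
        # Feedback uses every document judged so far, split by its label.
        rel = [doc_embs[d] for d in screened if d in qrels]
        nonrel = [doc_embs[d] for d in screened if d not in qrels]
        query = rocchio(q0, rel, nonrel, alpha, beta, gamma)
    return screened


# Hypothetical 2-dimensional embeddings; "a" and "b" are relevant.
docs = {"a": [0.9, 0.0], "b": [0.3, 0.9], "c": [0.5, -0.5], "d": [0.7, -0.6]}
qrels = {"a", "b"}
print(screen_with_feedback([1.0, 0.0], docs, qrels, n_iteration=2, top_k=2))
```

With (1, 0.5, 0.5) the first batch (a, d) moves the query to [1.1, 0.3], so the relevant document b is screened before c. Setting beta and gamma to 0 disables feedback and yields the order a, d, c, b.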

Baselines

BM25+RM3

Run bm25_baseline.py with the following command:

python bm25_baseline.py --collection_split clef19_intervention \
                        --baseline rm3
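For intuition about the RM3 expansion in this baseline, the sketch below builds a simplified RM3 query model: a relevance model estimated from the top feedback documents (each document's term distribution weighted by a retrieval weight), truncated to the fb_terms most probable terms and interpolated with the original query. The parameter names mirror those of Pyserini's RM3 setting, but the function rm3_expand and its inputs are hypothetical, and the Anserini/Pyserini implementation differs in details.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rm3_expand(query_terms: List[str],
               feedback: List[Tuple[Dict[str, int], float]],
               fb_terms: int = 10,
               fb_docs: int = 10,
               original_query_weight: float = 0.5) -> Dict[str, float]:
    """Return an expanded query as {term: weight}.

    `feedback` holds (term_counts, doc_weight) pairs for the top-ranked documents,
    best first; doc_weight is assumed to be a non-negative relevance weight.
    """
    # RM1: P(w|R) is proportional to sum_d weight(d) * P(w|d), with P(w|d) = tf / |d|.
    rm1: Counter = Counter()
    for counts, weight in feedback[:fb_docs]:
        length = sum(counts.values())
        if length == 0:
            continue
        for term, tf in counts.items():
            rm1[term] += weight * tf / length
    # Keep the fb_terms most probable terms and renormalise them to sum to 1.
    top = rm1.most_common(fb_terms)
    total = sum(p for _, p in top)
    rel_model = {t: p / total for t, p in top} if total > 0 else {}
    # Maximum-likelihood original query model: P(w|Q) = count(w) / |Q|.
    q_model = {t: c / len(query_terms) for t, c in Counter(query_terms).items()}
    # RM3: interpolate the original query model with the relevance model.
    lam = original_query_weight
    return {t: lam * q_model.get(t, 0.0) + (1 - lam) * rel_model.get(t, 0.0)
            for t in set(q_model) | set(rel_model)}


feedback = [
    ({"statin": 2, "cholesterol": 2}, 1.0),
    ({"statin": 1, "trial": 1, "placebo": 2}, 0.5),
]
expanded = rm3_expand(["statin", "trial"], feedback, fb_terms=3, fb_docs=2)
print(sorted(expanded.items(), key=lambda kv: -kv[1]))
```

The expanded query keeps both original terms and adds cholesterol and placebo from the feedback documents, with weights that sum to 1.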

CLEF Runs

We use the best-performing runs previously submitted to CLEF-TAR as baselines. See here for more details.

TAR with Active Learning

(Logistic Regression) Run goldilocks_lr.py as follows:

python goldilocks_lr.py --collection_split clef19_dta

(BioLinkBERT) Run goldilocks_screen.py as follows:

python goldilocks_screen.py --collection_split clef19_intervention
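Both baselines follow the active learning pattern of retraining a classifier on the documents screened so far and screening the highest-scored unscreened documents next. The sketch below shows a generic version of that loop with a small logistic regression trained by gradient descent on sparse bag-of-words features. It is not the Goldilocks implementation: the feature representation, seed set, batch size, and all function names are hypothetical, and the reviewer's labels are simulated from a set of relevant ids.

```python
import math
from typing import Dict, List, Set

Features = Dict[str, float]  # hypothetical sparse bag-of-words representation
BIAS = "__bias__"


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decision_value(w: Features, x: Features) -> float:
    return w.get(BIAS, 0.0) + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_logreg(X: List[Features], y: List[int], epochs: int = 200,
                 lr: float = 0.5, l2: float = 0.01) -> Features:
    """Fit L2-regularised logistic regression with full-batch gradient descent."""
    w: Features = {}
    for _ in range(epochs):
        grad: Features = {}
        for x, label in zip(X, y):
            err = sigmoid(decision_value(w, x)) - label  # d(log-loss)/dz
            grad[BIAS] = grad.get(BIAS, 0.0) + err
            for f, v in x.items():
                grad[f] = grad.get(f, 0.0) + err * v
        for f, g in grad.items():
            penalty = 0.0 if f == BIAS else l2 * w.get(f, 0.0)
            w[f] = w.get(f, 0.0) - lr * (g / len(X) + penalty)
    return w


def active_learning_screen(docs: Dict[str, Features], qrels: Set[str],
                           seed: List[str], batch_size: int) -> List[str]:
    """Relevance-feedback active learning: retrain, screen the top-scored batch, repeat."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    screened = list(seed)  # seed documents are assumed to be judged already
    while len(screened) < len(docs):
        X = [docs[d] for d in screened]
        y = [1 if d in qrels else 0 for d in screened]  # simulated reviewer labels
        w = train_logreg(X, y)
        seen = set(screened)
        remaining = [d for d in docs if d not in seen]
        remaining.sort(key=lambda d: (-decision_value(w, docs[d]), d))
        screened.extend(remaining[:batch_size])
    return screened


docs = {
    "s1": {"statin": 1.0, "trial": 1.0}, "s2": {"football": 1.0, "match": 1.0},
    "d1": {"statin": 1.0, "placebo": 1.0}, "d2": {"match": 1.0, "goal": 1.0},
    "d3": {"trial": 1.0, "placebo": 1.0}, "d4": {"goal": 1.0, "league": 1.0},
}
qrels = {"s1", "d1", "d3"}
print(active_learning_screen(docs, qrels, seed=["s1", "s2"], batch_size=2))
```

After the seed documents, the classifier pushes the two relevant documents (d1, d3) to the front of the next batch, and d2, which shares a term with the non-relevant seed, is screened last.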

Results

The run files and evaluation results for each dense retriever on the CLEF-TAR 2017-2019 test sets can be found here.
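This README does not list which measures the evaluated results report. As an illustration only, the sketch below computes two measures commonly used for screening prioritisation on a single ranking: recall within the top k% of the ranking, and the rank of the last relevant document. The function names, the toy ranking, and the choice to round the k% cutoff up are assumptions.

```python
import math
from typing import List, Set


def recall_at_percent(ranking: List[str], qrels: Set[str], percent: float) -> float:
    """Fraction of relevant documents found in the top `percent`% of the ranking."""
    if not qrels:
        return 0.0
    cutoff = math.ceil(len(ranking) * percent / 100)  # rounding up is an assumption
    found = sum(1 for d in ranking[:cutoff] if d in qrels)
    return found / len(qrels)


def last_relevant_rank(ranking: List[str], qrels: Set[str]) -> int:
    """1-based rank of the last relevant document; 0 if none is retrieved."""
    ranks = [i for i, d in enumerate(ranking, start=1) if d in qrels]
    return max(ranks) if ranks else 0


ranking = [f"d{i}" for i in range(1, 11)]  # hypothetical ranking of 10 documents
qrels = {"d1", "d4", "d7"}
print(recall_at_percent(ranking, qrels, 10), recall_at_percent(ranking, qrels, 50))
print(last_relevant_rank(ranking, qrels))
```

A lower last relevant rank means a reviewer would have found every relevant study after screening fewer documents.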

Citation

If you find this repo useful for your research, please cite the following paper:

@inproceedings{mao2024dense,
  title={Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation},
  author={Mao, Xinyu and Zhuang, Shengyao and Koopman, Bevan and Zuccon, Guido},
  booktitle={Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={2357--2362},
  year={2024}
}

Contact

If you have any questions, feel free to contact xinyu.mao [AT] uq.edu.au (replace [AT] with @).
