Query Summarized Attention for TREC-CT 2022
First, install the Elasticsearch package from your distribution's repository and start the service:
sudo systemctl start elasticsearch
python BM25.py
This saves BM25.txt in the input folder.
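BM25 ranks documents by term frequency, inverse document frequency, and length normalization. A minimal pure-Python sketch of the scoring function (independent of the Elasticsearch implementation, using the common defaults k1=1.2, b=0.75):

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with BM25.

    doc_freqs: term -> number of documents containing that term.
    """
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        # Smoothed IDF (the +1 keeps it positive for common terms)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

# Toy corpus: two clinical-trial-like snippets
docs = [
    "diabetes type 2 adult patients insulin".split(),
    "healthy adult volunteers exercise study".split(),
]
avg_len = sum(len(d) for d in docs) / len(docs)
dfs = {}
for d in docs:
    for t in set(d):
        dfs[t] = dfs.get(t, 0) + 1

query = "adult diabetes patients".split()
scores = [bm25_score(query, d, dfs, len(docs), avg_len) for d in docs]
```

The first document matches more query terms, so it scores higher.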
BERT_CAT Architecture
The query and document are passed together through the transformer, so every query token can attend to every document token when scoring semantic similarity.
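In this cross-encoder setup the two texts share one input sequence. A toy sketch of how the joint input is typically assembled (the token layout follows standard BERT conventions; this is illustrative, not the repo's actual preprocessing code):

```python
def make_cat_input(query, document, max_len=512):
    # monoBERT-style joint input: [CLS] query [SEP] document [SEP]
    tokens = ["[CLS]"] + query.split() + ["[SEP]"] + document.split() + ["[SEP]"]
    # Truncate if the sequence exceeds the model's length limit
    return tokens[:max_len]

pair = make_cat_input(
    "adult patients with type 2 diabetes",
    "This trial recruits adults diagnosed with type 2 diabetes. [SEP]".rsplit(" ", 1)[0],
)
```

The [CLS] position's final hidden state is then fed to a classification head that outputs the relevance score.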
To train the monoBERT model, run:
python monobert.py
BERT_DOT Architecture
The document and query embeddings are produced separately, and their dot product gives the relevance score.
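In this dual-encoder setup the expensive transformer pass happens once per text; only a cheap dot product is needed at query time. A toy sketch with hand-made vectors (real embeddings would come from the two encoders):

```python
def dot(u, v):
    # Relevance = dot product of query and document embeddings
    return sum(a * b for a, b in zip(u, v))

query_emb = [0.9, 0.1, 0.3]          # produced by the query encoder
doc_embs = {
    "NCT001": [0.8, 0.2, 0.1],       # produced offline by the document encoder
    "NCT002": [0.1, 0.9, 0.0],
}

# Rank documents by relevance score, highest first
ranking = sorted(doc_embs, key=lambda d: dot(query_emb, doc_embs[d]), reverse=True)
```

Because document embeddings can be precomputed and indexed, this scales to large collections far better than the cross-encoder.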
To train the ColBERT model, run:
python colbert.py
- SciBERT - Finetuned on scientific text/literature
- BlueBERT - Finetuned on PubMed dataset
- BioClinicalBERT - Finetuned on MIMIC-III dataset
Change the model name in the config file.
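Assuming the config exposes a model-name field (the exact key and file name depend on the repo), switching between the domain-adapted checkpoints might look like this; the Hugging Face model IDs shown are the commonly published ones, so verify them before use:

```yaml
# Pick one of the domain-adapted checkpoints
model_name: "allenai/scibert_scivocab_uncased"                    # SciBERT
# model_name: "bionlp/bluebert_pubmed_uncased_L-12_H-768_A-12"    # BlueBERT
# model_name: "emilyalsentzer/Bio_ClinicalBERT"                   # BioClinicalBERT
```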
After the BERT reranking, consolidate all the scores by running:
python output-consolidation.py
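One common way to consolidate scores from multiple rankers (a sketch of the general technique; the actual strategy in output-consolidation.py may differ) is to min-max normalize each run so the scales are comparable, then interpolate linearly:

```python
def minmax(scores):
    # Map raw scores to [0, 1] so runs on different scales can be mixed
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def consolidate(bm25, bert, alpha=0.5):
    # Linear interpolation of normalized BM25 and BERT scores
    nb, nr = minmax(bm25), minmax(bert)
    docs = set(nb) | set(nr)
    return {d: alpha * nb.get(d, 0.0) + (1 - alpha) * nr.get(d, 0.0) for d in docs}

bm25 = {"NCT001": 12.3, "NCT002": 8.1, "NCT003": 5.0}
bert = {"NCT001": 0.91, "NCT002": 0.97, "NCT003": 0.12}
final = consolidate(bm25, bert)
ranking = sorted(final, key=final.get, reverse=True)
```

The `alpha` weight controls how much the lexical (BM25) signal counts against the semantic (BERT) signal and is typically tuned on held-out topics.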
The final re-ranked output is stored in output/final.txt. Check the metrics by running:
trec_eval -m "trec_official" input/qrels2022.txt output/final.txt
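trec_eval expects the run file in the standard six-column TREC format (`query_id Q0 doc_id rank score run_tag`, whitespace-separated, one line per query-document pair). A sketch that writes consolidated scores in that format (the run tag name is arbitrary):

```python
def write_trec_run(scores, path, tag="my_run"):
    # scores: query_id -> {doc_id: score}
    with open(path, "w") as f:
        for qid, doc_scores in scores.items():
            # Sort by score, highest first; ranks start at 1
            ranked = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                f.write(f"{qid} Q0 {doc_id} {rank} {score:.4f} {tag}\n")

write_trec_run({"1": {"NCT001": 0.97, "NCT002": 0.12}}, "final.txt")
```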