
We employed pre-trained BERT models (DistilBERT, BioBERT, and SciBERT) for text classification of the titles and abstracts of clinical trials in medical psychology. The average AUC was 0.92. A stacked model was then built using the probability predicted by DistilBERT and keywords of the search domains as features. The AUC improved to 0.96 with…


Xiaowen-JI/Semi-automation-of-systematic-review-of-clinical-trials-in-medical-psychology-with-BERT-models


Methodology

We used the Bio.Entrez package in Python 3 to query, search, and fetch the metadata of RCT studies in PubMed (search period: 2010 to February 2020; the protocol of the systematic review has been published: https://www.sciencedirect.com/science/article/abs/pii/S1087079221000307). Three BERT models, DistilBERT, BioBERT, and SciBERT, were used to classify the titles and abstracts via PyTorch. We manually labelled the texts by reading the abstracts. After diagnosing the wrong predictions, a stacked model was built using the probability predicted by DistilBERT and keywords of the search domains (complementary and alternative medicine) as features. For the studies labelled as 1 (positive) based on the abstract, the full texts in PDF format were fetched from PubMed Central when available. A Haystack question-answering pipeline (https://github.com/deepset-ai/haystack/#tutorials) was then fine-tuned and applied to the preprocessed full texts to extract key information for further article screening.
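
A minimal sketch of the PubMed retrieval step with Bio.Entrez is shown below. The contact email and the query string are placeholders; the actual search terms are defined in the published protocol, not here.

```python
from Bio import Entrez

Entrez.email = "your.name@example.com"  # NCBI requires a contact email

# Hypothetical query string; the real search terms come from the review protocol.
query = '("complementary therapies"[MeSH Terms]) AND (randomized controlled trial[Publication Type])'

# Search PubMed for matching records published between 2010 and February 2020.
handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2010/01/01", maxdate="2020/02/29", retmax=10000)
pmids = Entrez.read(handle)["IdList"]
handle.close()

# Fetch the MEDLINE metadata (including titles and abstracts) for those PMIDs.
handle = Entrez.efetch(db="pubmed", id=",".join(pmids), rettype="medline", retmode="text")
medline_text = handle.read()
handle.close()
```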
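
For the classification step, the sketch below shows how a fine-tuned DistilBERT sequence classifier can be applied to title-plus-abstract text with the Hugging Face transformers library and PyTorch. The checkpoint name is the generic base model, not this repo's fine-tuned weights, and the training loop itself is omitted.

```python
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

# Generic base checkpoint as a placeholder; in practice a checkpoint fine-tuned on
# the manually labelled abstracts would be loaded here.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
model.eval()

texts = ["Example RCT title. Example abstract text ..."]  # title + abstract concatenated
enc = tokenizer(texts, truncation=True, padding=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits

# Probability of the positive (include) class, later reused as a stacking feature.
include_prob = torch.softmax(logits, dim=-1)[:, 1]
```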
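
The stacked model combines the DistilBERT probability with keyword features from the complementary-and-alternative-medicine search domains. This README does not spell out the exact keyword list or meta-classifier, so the sketch below assumes binary keyword indicators and a logistic-regression stacker as one plausible realisation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical CAM keywords; the actual list follows the review's search domains.
KEYWORDS = ["acupuncture", "yoga", "mindfulness", "herbal", "tai chi", "massage"]

def make_features(abstract: str, bert_prob: float) -> np.ndarray:
    """DistilBERT probability plus one binary indicator per keyword."""
    indicators = [float(kw in abstract.lower()) for kw in KEYWORDS]
    return np.array([bert_prob, *indicators])

# Toy placeholders for the manually labelled abstracts and their DistilBERT scores.
abstracts = ["A randomized trial of yoga for insomnia ...", "A drug trial of zolpidem ..."]
bert_probs = [0.81, 0.35]
labels = [1, 0]

X = np.vstack([make_features(a, p) for a, p in zip(abstracts, bert_probs)])
stacker = LogisticRegression(max_iter=1000).fit(X, labels)
print(stacker.predict_proba(X)[:, 1])
```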
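
Finally, the preprocessed full texts are queried with a Haystack extractive question-answering pipeline. Haystack's API has changed considerably across versions; the sketch below follows the Haystack 1.x interface (InMemoryDocumentStore, TfidfRetriever, FARMReader) and uses a generic SQuAD reader checkpoint rather than the fine-tuned one, so treat it as an illustration of the step rather than the exact setup.

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline

# Index the preprocessed full texts (one document per article).
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Full text of article 1 ...", "meta": {"pmid": "12345678"}},
])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")  # placeholder checkpoint
pipe = ExtractiveQAPipeline(reader, retriever)

# Example screening question; the actual questions target the review's eligibility criteria.
result = pipe.run(
    query="What intervention was evaluated in this trial?",
    params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 3}},
)
for answer in result["answers"]:
    print(answer.answer, answer.score)
```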

[Figure: pipeline]

[Figure: flowchart]

[Figure: Stacked Model Design (by Salash)]
