Parse PubMed abstracts of randomized controlled trials (RCTs) following the PICO framework and standardize the extracted PICO elements.
- Author: Tian Kang (tk2624@cumc.columbia.edu)
- Affiliation: Department of Biomedical Informatics, Columbia University (Dr. Chunhua Weng's lab)
- Citation: "Kang, T., Zou, S. and Weng, C., 2019. Pretraining to Recognize PICO Elements from Randomized Controlled Trial Literature. Studies in health technology and informatics, 264, p.188."
More modules coming soon for representing medical evidence information comprehensively from RCT abstracts.
Adapted from NCBI-NLP BlueBERT
- Install the dependencies listed in requirements.txt.
- If you want to use UMLS to standardize entities, install 'UMLS' and 'QuickUMLS' locally.
- Download the pretrained BlueBERT models for PICO element recognition (link in BERT ).
- Edit parser_config.py to customize your own directories and BERT configuration.
- Run the command below to start parsing. Specify your input directory in --data_dir and your output directory in --output_dir. In the input directory, each abstract is stored in its own text file, with its PMID as the file name. Example data is provided in the test folder.
python run_bluebert_ner_predict.py --data_dir= --output_dir=
To run examples:
python run_bluebert_ner_predict.py --data_dir=test/txt --output_dir=test/json
Input: test/txt
Parsing results: test/json
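The input layout described above (one text file per abstract, named by its PMID) can be prepared with a short script. This is a minimal sketch and not part of the repository: the function name `write_abstracts` is hypothetical, and the `.txt` extension is an assumption based on the test/txt example folder.

```python
from pathlib import Path


def write_abstracts(abstracts, data_dir):
    """Write each abstract to <data_dir>/<pmid>.txt and return the paths.

    `abstracts` maps a PMID string to the abstract text. The .txt
    extension is an assumption, not something the parser is known to require.
    """
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for pmid, text in abstracts.items():
        path = out_dir / f"{pmid}.txt"
        # Normalize surrounding whitespace and end the file with a newline.
        path.write_text(text.strip() + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

The resulting directory can then be passed to the parser through --data_dir.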
- Install the dependencies listed in requirements.txt.
- If you want to use UMLS to standardize entities, install 'UMLS' and 'QuickUMLS' locally.
- Edit parser_config.py to customize your own directories and installation.
- Run python Phase1_NER_predict.py to start parsing.
- Download the context vectors pretrained on all PubMed abstracts from 1990-2019 (download link in cluster/model/download.txt).
- Extract the 3 files and put them under cluster/model.
- TO BE CONTINUED
JSON
Input: example.txt, which contains more than 70 abstracts with methods sections
Parsing results: folder example_json_out
{
"pmid": "11264545",
"sentences": {
"sent_1": {
"Section": "METHODS",
"text": "METHODS AND RESULTS : To determine the relative power of radiographic heart measurements for predicting outcome in dilated cardiomyopathy , we retrospectively studied 88 adult patients with chest radiographs obtained within 35 days of echocardiography .",
"entities": {
"entity_1": {
"text": "radiographic heart measurements",
"class": "Outcome",
"negation": 0,
"UMLS": "C0018787:heart,C1306645:radiograph,",
"index": 1,
"start": 10
},
"entity_2": {
"text": "predicting outcome",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 2,
"start": 14
},
"entity_3": {
"text": "dilated cardiomyopathy",
"class": "Participant",
"negation": 0,
"UMLS": "C0007193:dilated cardiomyopathy,",
"index": 3,
"start": 17
},
"entity_4": {
"text": "chest radiographs",
"class": "Participant",
"negation": 0,
"UMLS": "C1306645:radiographs,C0817096:chest,",
"index": 4,
"start": 27
},
"entity_5": {
"text": "echocardiography",
"class": "Participant",
"negation": 0,
"UMLS": "C0013516:echocardiography,",
"index": 5,
"start": 34
}
},
"relations": {}
},
"sent_2": {
"Section": "METHODS",
"text": "Standard radiographic variables were measured for each patient , and the cardiothoracic ( CT ) ratio , frontal cardiac area , and volume were calculated .",
"entities": {
"entity_6": {
"text": "Standard radiographic variables",
"class": "Outcome",
"negation": 0,
"UMLS": "C0038137:Standard,C1306645:radiograph,",
"index": 1,
"start": 0
},
"entity_7": {
"text": "cardiothoracic ( CT ) ratio",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 2,
"start": 11
},
"entity_8": {
"text": "frontal cardiac area",
"class": "Outcome",
"negation": 0,
"UMLS": "C0018787:cardiac,",
"index": 3,
"start": 17
},
"entity_9": {
"text": "volume",
"class": "Outcome",
"negation": 0,
"UMLS": "",
"index": 4,
"start": 22
}
},
"relations": {}
}
}
}
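The JSON output above can be read with the standard library. The sketch below groups entities by PICO class and splits the "UMLS" field into (CUI, term) pairs. It rests on two assumptions drawn only from the example: that "start" is a whitespace-token offset into the sentence text, and that UMLS terms contain no commas. The helper names `parse_umls` and `entities_by_class` are hypothetical and not part of the repository.

```python
import json
from collections import defaultdict

# Trimmed copy of the example output above (one sentence, two entities).
JSON_SAMPLE = """
{
  "pmid": "11264545",
  "sentences": {
    "sent_1": {
      "Section": "METHODS",
      "text": "METHODS AND RESULTS : To determine the relative power of radiographic heart measurements for predicting outcome in dilated cardiomyopathy , we retrospectively studied 88 adult patients with chest radiographs obtained within 35 days of echocardiography .",
      "entities": {
        "entity_1": {"text": "radiographic heart measurements", "class": "Outcome",
                     "negation": 0, "UMLS": "C0018787:heart,C1306645:radiograph,",
                     "index": 1, "start": 10},
        "entity_3": {"text": "dilated cardiomyopathy", "class": "Participant",
                     "negation": 0, "UMLS": "C0007193:dilated cardiomyopathy,",
                     "index": 3, "start": 17}
      },
      "relations": {}
    }
  }
}
"""


def parse_umls(field):
    """Split a 'CUI:term,CUI:term,' string into (CUI, term) pairs."""
    pairs = []
    for item in field.split(","):
        if item:  # skip the empty piece left by the trailing comma
            cui, _, term = item.partition(":")
            pairs.append((cui, term))
    return pairs


def entities_by_class(doc):
    """Group (entity span, UMLS pairs) by PICO class for one parsed abstract."""
    grouped = defaultdict(list)
    for sent in doc["sentences"].values():
        tokens = sent["text"].split()
        for ent in sent["entities"].values():
            # Assumption: 'start' counts whitespace-separated tokens from 0.
            length = len(ent["text"].split())
            span = " ".join(tokens[ent["start"]:ent["start"] + length])
            grouped[ent["class"]].append((span, parse_umls(ent["UMLS"])))
    return dict(grouped)


doc = json.loads(JSON_SAMPLE)
result = entities_by_class(doc)
```

Rebuilding each span from "start" rather than copying the "text" field is a cheap way to check the token-offset assumption against your own output files.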
XML
Input: test.txt
Parsing results: temp.xml
A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension . 1 This study was designed to compare in a double-blind randomized crossover trial , atenolol , labetalol , metoprolol and pindolol . Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .
<abstract>
<sent>
<text>A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension .</text>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T1' start='5'> pindolol </entity>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T2' start='7'> metoprolol </entity>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T3' start='9'> atenolol </entity>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T4' start='11'> labetalol </entity>
<entity class='Participant' UMLS='C0020538:hypertension' index='T5' start='13'> mild to moderate hypertension </entity>
</sent>
<sent>
<text>1 This study was designed to compare in a double-blind randomized crossover trial , atenolol , labetalol , metoprolol and pindolol .</text>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T6' start='14'> atenolol </entity>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T7' start='16'> labetalol </entity>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T8' start='18'> metoprolol </entity>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T9' start='20'> pindolol </entity>
</sent>
<sent>
<text>Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .</text>
<attribute class='modifier' index='T10' start='1'> differences </attribute>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T11' start='5'> atenolol </entity>
<attribute class='measure' index='T12' start='6'> 138 +/- 13 mg daily </attribute>
<entity class='Intervention' UMLS='C0022860:labetalol' index='T13' start='12'> labetalol </entity>
<attribute class='measure' index='T14' start='13'> 308 +/- 34 mg daily </attribute>
<entity class='Intervention' UMLS='C0025859:metoprolol' index='T15' start='19'> metoprolol </entity>
<attribute class='measure' index='T16' start='20'> 234 +/- 22 mg daily </attribute>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T17' start='27'> pindolol </entity>
<attribute class='measure' index='T18' start='28'> 24 +/-2 mg daily </attribute>
<entity class='Outcome' UMLS='C0003364:antihypertensive' index='T19' start='37'> antihypertensive effects </entity>
</sent>
</abstract>
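The XML output above can be loaded with Python's built-in xml.etree.ElementTree. The following sketch collects every entity and attribute annotation into a flat list of dictionaries. It assumes the output file is well-formed XML (for example, no unescaped & or < inside the sentence text). The function name `read_annotations` is hypothetical and not part of the repository.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example output above (two sentences, four annotations).
XML_SAMPLE = """<abstract>
<sent>
<text>A double-blind crossover comparison of pindolol , metoprolol , atenolol and labetalol in mild to moderate hypertension .</text>
<entity class='Intervention' UMLS='C0031937:pindolol' index='T1' start='5'> pindolol </entity>
<entity class='Participant' UMLS='C0020538:hypertension' index='T5' start='13'> mild to moderate hypertension </entity>
</sent>
<sent>
<text>Considerable differences in dose ( atenolol 138 +/- 13 mg daily ; labetalol 308 +/- 34 mg daily ; metoprolol 234 +/- 22 mg daily ; and pindolol 24 +/-2 mg daily were required to produce similar antihypertensive effects .</text>
<entity class='Intervention' UMLS='C0004147:atenolol' index='T11' start='5'> atenolol </entity>
<attribute class='measure' index='T12' start='6'> 138 +/- 13 mg daily </attribute>
</sent>
</abstract>"""


def read_annotations(xml_text):
    """Return one dict per <entity> or <attribute> element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for sent in root.iter("sent"):
        for node in sent:
            if node.tag not in ("entity", "attribute"):
                continue  # skip the <text> element
            rows.append({
                "tag": node.tag,
                "class": node.get("class"),
                "index": node.get("index"),
                "start": int(node.get("start")),
                # Annotation text is padded with spaces in the output.
                "text": (node.text or "").strip(),
                # <attribute> elements carry no UMLS attribute.
                "UMLS": node.get("UMLS", ""),
            })
    return rows


rows = read_annotations(XML_SAMPLE)
```

Each row keeps the "tag" so that entities (PICO elements) and attributes (modifiers and measures) can be filtered separately downstream.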