affinity prediction using BGNN and transfer learning on re-docking dataset

Accepted on Journal of Computer-Aided Molecular Design. https://link.springer.com/article/10.1007/s10822-022-00448-3

Preparation(pose prediction, too)

conda environment should have rdkit/obfit(open babel)/sklearn/pandas/scipy/pytorch

1 PDBbind data (re-docking)

http://www.pdbbind.org.cn/
Download PDBbind data(v2019) both "general set" and "refined set" and merge all the files inside folder pdbbind_files.
run "utilities/pdbbind_redo.py" to create re-docking dataset from PDBbind data
The pdbbind_redo.py file contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
Assuming that it takes up to an hour to process 100 instances, the job will finish in less than 40 hours.

2 chembl_bace data (cross-docking, "bace_chembl_cd")

https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL4822/
One can download IC50 values of the target protein BACE (chembl id:CHEMBL4822) or use the "BACE_IC50.csv" file to start
run "make_data_smilecomp.py", "smile_comp2.py", "conformer_gen_BACE.py", and "alignedSubdir.py" in this order.
Some of the files contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
copy all the contents inside the "truth" folder into "alignedsubset"
copy "BACE_IC50.csv" into "alignedsubset"
copy "similar_pdbid_info_bace.tsv" into "alignedsubset"
move the "alignedsubset" directory to the "src2" and change the name to "chembl_bace"

3 BACE data (cross-docking, "bace_gc4_cd")

https://drugdesigndata.org/about/grand-challenge-4
Start with the "BACE_score_compounds_D3R_GC4_answers.csv" file
run "make_data_smilecomp.py", "smile_comp_bace.py", "conformer_generation_bace.py", and "alignedSubdir_bace.py" in this order.
Some of the files contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
copy all the contents inside the "truth" folder into "alignedBACEsubset"
copy "BACE_score_compounds_D3R_GC4_answers.csv" into "alignedBACEsubset"
copy "similar_pdbid_info2.tsv" into "alignedBACEsubset"
move the "alignedBACEsubset" directory to the "src2" and change the name to "BACE"

data and source code path required to run BGNN library

pdbbind_files ==> Preparatory work is required (too big to upload all files)
pdbbind_index ==> no preparation required
src2/chembl_bace ==> Preparatory work is required (too big to upload all files)
src2/BACE ==> Preparatory work is required (too big to upload all files)
src2/CATS ==> unzip the compressed file
src2/gc3_CATS ==> unzip the compressed file
The name "src2" means that the code is run on the third (0,1,2) gpu card

affinity prediction(BGNN)

run the "src2/train.py"
In the first run, modify the hyperparameter usePickledData to False.

ensemble

I used ensemble for the final result. the code can be found in the utilities folder

please ask me anything if you feel confused.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

affinity prediction using BGNN and transfer learning on re-docking dataset

Preparation(pose prediction, too)

data and source code path required to run BGNN library

affinity prediction(BGNN)

ensemble

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
bace_chembl_cd		bace_chembl_cd
bace_gc4_cd		bace_gc4_cd
pdbbind_index		pdbbind_index
src2		src2
utilities		utilities
LICENSE		LICENSE
README.md		README.md

License

arwhirang/affinity_prediction_BGNN

Folders and files

Latest commit

History

Repository files navigation

affinity prediction using BGNN and transfer learning on re-docking dataset

Preparation(pose prediction, too)

data and source code path required to run BGNN library

affinity prediction(BGNN)

ensemble

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages