Skip to content

arwhirang/affinity_prediction_BGNN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

affinity prediction using BGNN and transfer learning on re-docking dataset

Accepted on Journal of Computer-Aided Molecular Design. https://link.springer.com/article/10.1007/s10822-022-00448-3

Preparation(pose prediction, too)

conda environment should have rdkit/obfit(open babel)/sklearn/pandas/scipy/pytorch

1 PDBbind data (re-docking)

  • http://www.pdbbind.org.cn/
  • Download PDBbind data(v2019) both "general set" and "refined set" and merge all the files inside folder pdbbind_files.
  • run "utilities/pdbbind_redo.py" to create re-docking dataset from PDBbind data
  • The pdbbind_redo.py file contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
  • Assuming that it takes up to an hour to process 100 instances, the job will finish in less than 40 hours.

2 chembl_bace data (cross-docking, "bace_chembl_cd")

  • https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL4822/
  • One can download IC50 values of the target protein BACE (chembl id:CHEMBL4822) or use the "BACE_IC50.csv" file to start
  • run "make_data_smilecomp.py", "smile_comp2.py", "conformer_gen_BACE.py", and "alignedSubdir.py" in this order.
  • Some of the files contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
  • copy all the contents inside the "truth" folder into "alignedsubset"
  • copy "BACE_IC50.csv" into "alignedsubset"
  • copy "similar_pdbid_info_bace.tsv" into "alignedsubset"
  • move the "alignedsubset" directory to the "src2" and change the name to "chembl_bace"

3 BACE data (cross-docking, "bace_gc4_cd")

  • https://drugdesigndata.org/about/grand-challenge-4
  • Start with the "BACE_score_compounds_D3R_GC4_answers.csv" file
  • run "make_data_smilecomp.py", "smile_comp_bace.py", "conformer_generation_bace.py", and "alignedSubdir_bace.py" in this order.
  • Some of the files contains python path for the conda environment. Please modify the PATH_TO_PYTHON accordingly.
  • copy all the contents inside the "truth" folder into "alignedBACEsubset"
  • copy "BACE_score_compounds_D3R_GC4_answers.csv" into "alignedBACEsubset"
  • copy "similar_pdbid_info2.tsv" into "alignedBACEsubset"
  • move the "alignedBACEsubset" directory to the "src2" and change the name to "BACE"

data and source code path required to run BGNN library

  • pdbbind_files ==> Preparatory work is required (too big to upload all files)
  • pdbbind_index ==> no preparation required
  • src2/chembl_bace ==> Preparatory work is required (too big to upload all files)
  • src2/BACE ==> Preparatory work is required (too big to upload all files)
  • src2/CATS ==> unzip the compressed file
  • src2/gc3_CATS ==> unzip the compressed file
  • The name "src2" means that the code is run on the third (0,1,2) gpu card

affinity prediction(BGNN)

  • run the "src2/train.py"
  • In the first run, modify the hyperparameter usePickledData to False.

ensemble

  • I used ensemble for the final result. the code can be found in the utilities folder

please ask me anything if you feel confused.

About

BGNN model for blind affinity prediction

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages