Pocket2Drug

Pocket2Drug is an encoder-decoder deep neural network that predicts binding drugs given protein binding sites (pockets). The pocket graphs are generated using Graphsite. The encoder is a graph neural network, and the decoder is a recurrent neural network. The SELFIES molecule representation is used as the tokenization scheme instead of SMILES. The pipeline of Pocket2Drug is illustrated below:

If you find Pocket2Drug helpful, please cite our paper in your work :)
Pocket2Drug: An encoder-decoder deep neural network for the target-based drug design
Wentao Shi, Manali Singha, Gopal Srivastava, Limeng Pu, J. Ramanujam, and Michal Brylinski
Frontiers in Pharmacology: 587

Usage

Dependency installation

Install PyTorch:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Install PyTorch Geometric:

conda install pyg -c pyg

Install BioPandas:

conda install biopandas -c conda-forge

Install selfies:

pip install selfies

Install RDKit:

conda install rdkit -c conda-forge
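
Once the packages above are installed, a quick import check can confirm that the environment is complete. The sketch below is not part of the repository: the helper name check_dependencies is made up for illustration, and the module names are assumed to be the usual import names of the packages listed above.

```python
from importlib.util import find_spec

# Import names of the packages installed above (assumed standard names).
REQUIRED_MODULES = ["torch", "torch_geometric", "biopandas", "selfies", "rdkit"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in modules:
        try:
            # find_spec locates the module without importing it.
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING points to the install command that still needs to be run.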

Dataset

All the related data can be downloaded here. After extraction, there will be two folders:

pocket-data: files that contain information about the pockets. We will use the .mol2 files.
protein-data: files that contain information about the proteins. We will use the .pops and .profile files.
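
To verify that the extraction worked, the relevant files can be counted by extension. This is a minimal sketch: the internal layout of the two folders is not described here, so the code simply searches them recursively, and the helper name collect_files is hypothetical.

```python
from pathlib import Path


def collect_files(root, suffixes):
    """Recursively collect files under root whose extension is in suffixes."""
    root = Path(root)
    if not root.is_dir():
        return []
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # Placeholder paths; point them at the extracted folders.
    pockets = collect_files("pocket-data", [".mol2"])
    proteins = collect_files("protein-data", [".pops", ".profile"])
    print(f"{len(pockets)} pocket files, {len(proteins)} protein files")
```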

Train

Training is configured in train.yaml. Set pocket_dir to the path of pocket-data, and set pop_dir and profile_dir to the path of protein-data. Set out_dir to the folder where you want to save the output. The remaining options are hyperparameters whose names are self-explanatory. The script train.py trains the model on a 90%-10% split of the dataset, and you can specify which fold is used for validation:

python train.py -val_fold 0
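
The role of the validation fold can be pictured as follows. This is a minimal sketch that assumes the 90%-10% split comes from partitioning the pockets into 10 folds and holding one of them out; the actual logic in train.py may differ, and the function name split_by_fold and the seed are illustrative.

```python
import random


def split_by_fold(items, val_fold, num_folds=10, seed=0):
    """Shuffle items deterministically and hold out one fold for validation."""
    if not 0 <= val_fold < num_folds:
        raise ValueError(f"val_fold must be in [0, {num_folds - 1}]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Item i goes to fold i % num_folds, so fold sizes differ by at most one.
    train, val = [], []
    for i, item in enumerate(shuffled):
        (val if i % num_folds == val_fold else train).append(item)
    return train, val


if __name__ == "__main__":
    pockets = [f"pocket_{i}" for i in range(100)]  # placeholder names
    train, val = split_by_fold(pockets, val_fold=0)
    print(len(train), len(val))
```

With a fixed seed, every choice of val_fold yields a different held-out tenth, and the ten validation sets together cover the whole dataset.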

In addition, you can use a pretrained RNN to initialize the decoder; the pretrained model can be found here. The RNN is pretrained on the ChEMBL dataset and can improve the performance of the model. An example of pretraining the RNN is available here.

Sample molecules

After training, the trained model is saved in out_dir, and it can be used to sample molecules for the pockets in the validation fold:

python sample.py -batch_size 1024 -num_batches 2 -pocket_dir path_to_dataset_folder -popsa_dir path_to_pops_folder -profile_dir path_to_profile_folder -result_dir path_to_training_output_folder -fold 0

The model can also be used to sample molecules for unseen, user-defined pockets. Simply omit the -fold option, and the code will run on the specified input directories:

python sample.py -batch_size 1024 -num_batches 2 -pocket_dir path_to_dataset_folder -popsa_dir path_to_pops_folder -profile_dir path_to_profile_folder -result_dir path_to_training_output_folder
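
When sampling for several folds or pocket sets, it can be convenient to drive sample.py from Python. The sketch below only assembles the command line shown above: the wrapper functions build_sample_command and run_sampling are hypothetical, the flags are copied from the commands in this section, and all paths are placeholders.

```python
import subprocess
import sys


def build_sample_command(pocket_dir, popsa_dir, profile_dir, result_dir,
                         batch_size=1024, num_batches=2, fold=None):
    """Assemble the sample.py command; leave fold unset for custom pockets."""
    cmd = [
        sys.executable, "sample.py",
        "-batch_size", str(batch_size),
        "-num_batches", str(num_batches),
        "-pocket_dir", str(pocket_dir),
        "-popsa_dir", str(popsa_dir),
        "-profile_dir", str(profile_dir),
        "-result_dir", str(result_dir),
    ]
    if fold is not None:
        # Only pass -fold when sampling for a validation fold.
        cmd += ["-fold", str(fold)]
    return cmd


def run_sampling(**kwargs):
    """Run sample.py from the repository root and raise if it fails."""
    subprocess.run(build_sample_command(**kwargs), check=True)
```

Calling run_sampling with fold=0 corresponds to the validation-fold command, while leaving fold unset corresponds to the command for user-defined pockets.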
