AbLEF fuses antibody 3D conformational ensembles and language representations for property prediction.
The current models are described in:
@article{rollins2024,
    title = {{AbLEF}: {Antibody} {Language} {Ensemble} {Fusion} for {thermodynamically} {empowered} {property} {predictions}},
    journal = {Bioinformatics},
    author = {Rollins, Zachary A and Widatalla, Talal and Waight, Andrew and Cheng, Alan C and Metwally, Essam},
    url = {https://doi.org/10.1093/bioinformatics/btae268},
    month = apr,
    year = {2024}
}
@article{rollins2023,
    title = {{AbLEF}: {Antibody} {Language} {Ensemble} {Fusion} for {thermodynamically} {empowered} {property} {predictions}},
    journal = {The NeurIPS Workshop on New Frontiers of AI for Drug Discovery and Development (AI4D3 2023)},
    author = {Rollins, Zachary A and Widatalla, Talal and Waight, Andrew and Cheng, Alan C and Metwally, Essam},
    url = {https://ai4d3.github.io/papers/55.pdf},
    month = dec,
    year = {2023}
}
- git lfs is required for the locally stored language models
- ProtBERT requires local installation to 'config/' (see the download sketch after the environment command)
- ProtBERT-BFD requires local installation to 'config/'
- AbLang is installed locally via the conda environment .yaml file:
conda env create --name ablef --file alef.yaml
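Since the language models are stored locally via git lfs, the ProtBERT and ProtBERT-BFD weights can be fetched from the Hugging Face hub; a minimal sketch (the target folder names under 'config/' are assumptions, adjust to match the loader):

```python
from huggingface_hub import snapshot_download

# download ProtBERT and ProtBERT-BFD into config/ (folder names are assumptions)
snapshot_download(repo_id="Rostlab/prot_bert", local_dir="config/prot_bert")
snapshot_download(repo_id="Rostlab/prot_bert_bfd", local_dir="config/prot_bert_bfd")
```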
- a Boltzmann imitator generates the multi-structure ensemble, saved as pdb files (e.g., LowModeMD, MD)
- the AbLEF manuscript uses LowModeMD in MOE, which requires a license that can be acquired from CCG
- input the variable fragment (Fv) sequence fasta file into MOE
- build a homology model by running the MOE Antibody Modeler application (default settings)
- run the MOE Stochastic Titration application (nconf=50, T=300 or 400 K, salt_conc=0.1)
- we also provide an open-source alternative using ImmuneBuilder and OpenMM
- input the heavy (--h) and light (--l) chain sequences to generate the mAb structure with ImmuneBuilder
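A minimal sketch of the ImmuneBuilder step, using the ABodyBuilder2 API from the ImmuneBuilder package (the sequences are truncated placeholders, substitute your own Fv sequences):

```python
from ImmuneBuilder import ABodyBuilder2

heavy = "EVQLVESGGGLVQPGG..."  # placeholder heavy-chain (--h) sequence
light = "DIQMTQSPSSLSASVG..."  # placeholder light-chain (--l) sequence

predictor = ABodyBuilder2()
antibody = predictor.predict({'H': heavy, 'L': light})
antibody.save("mAb.pdb")  # input for the OpenMM ensemble step below
```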
- run the OpenMM simulation engine to generate the ensemble with implicit solvent:
python ./data/ensemble.py --pdb='/pathway/to/input/mAb.pdb' --output='openmm_step_mAb.pdb' --T=300 --conc=0.1 --steps=50000
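For orientation, an implicit-solvent OpenMM run of the kind performed by ./data/ensemble.py can be sketched as follows (an illustrative reconstruction from standard OpenMM APIs, not the script itself; the force field choice and reporting interval are assumptions):

```python
from openmm import LangevinMiddleIntegrator, unit
from openmm.app import PDBFile, ForceField, Simulation, PDBReporter, NoCutoff, HBonds

pdb = PDBFile('mAb.pdb')
forcefield = ForceField('amber14-all.xml', 'implicit/gbn2.xml')  # GBn2 implicit solvent (assumption)
system = forcefield.createSystem(pdb.topology, nonbondedMethod=NoCutoff,
                                 constraints=HBonds,
                                 implicitSolventSaltConc=0.1*unit.molar)   # --conc
integrator = LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond,  # --T
                                      0.002*unit.picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()
simulation.reporters.append(PDBReporter('openmm_step_mAb.pdb', 1000))  # snapshot cadence (assumption)
simulation.step(50000)  # --steps
```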
- pdb files from ensemble generation can be clustered using density-based spatial clustering (DBSCAN) on the backbone-atom distance matrices:
python ./cluster/main.py input='/pathway/to/pdbs/' output='/pathway/to/pdbs/results' cpu_threads=28 noh=true method=dbscan eps=1.9 min_samples=1
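Conceptually, the clustering step reduces to something like this sketch (scikit-learn shown for illustration; the precomputed `dists` matrix is a hypothetical stand-in for the repo's backbone-atom distance computation):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# hypothetical (n_structures, n_structures) matrix of pairwise distances
# between the backbone-atom distance matrices of ensemble members
dists = np.load('pairwise_structure_dists.npy')  # placeholder input
labels = DBSCAN(eps=1.9, min_samples=1, metric='precomputed').fit_predict(dists)
print(labels)  # cluster label per pdb file
```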
- to utilize multi-structure ensemble fusion (LEF), pdb files in the data directories are converted to pairwise distance tensors and saved as numpy arrays
- fasta files are converted to txt files for the heavy and light chains using the IMGT canonical alignment (padded with zeros)
- the gif below depicts an ensemble of pairwise distance tensors used for training AbLEF
python ./data/preprocess.py
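To illustrate the representation (not preprocess.py itself), a single structure's pairwise distance matrix can be built from its C-alpha coordinates; stacking one matrix per ensemble member yields the tensor shown in the gif (Biopython used for parsing; file names are placeholders):

```python
import numpy as np
from Bio.PDB import PDBParser

structure = PDBParser(QUIET=True).get_structure('mAb', 'mAb.pdb')  # placeholder pdb
coords = np.array([atom.coord for atom in structure.get_atoms()
                   if atom.get_name() == 'CA'])                    # C-alpha trace
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
np.save('mAb_dist.npy', dist)  # one slice of the ensemble tensor
```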
- training and tuning execution is specified by the configuration file 'config/setup.json'
- the ensemble length (i.e., L or ens_L) is specified during training and inference: setup['training']['ens_L']
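The keys can be edited in 'config/setup.json' directly or programmatically; a minimal sketch (the value 5 is illustrative):

```python
import json

with open('config/setup.json') as f:
    setup = json.load(f)

setup['training']['ens_L'] = 5         # structures fused per antibody (illustrative value)
setup['training']['ray_tune'] = False  # plain training, no hyperparameter search

with open('config/setup.json', 'w') as f:
    json.dump(setup, f, indent=4)
```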
python ./src/train_tune.py
- set setup["training"]["ray_tune"] = True
- specify the hyperparameter search space in the 'main' of ./src/train_tune.py (see the sketch below)
- the ray cluster must be initialized before hyperparameter tuning execution
- submit a PBS script with the specified num_cpus and num_gpus
- start the ray cluster:
ray start --head --num-cpus=8 --num-gpus=4 --temp-dir="/absolute/path/to/temporary/storage/"
python ./src/train_tune.py
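For reference, a Ray Tune search space of the kind defined in the 'main' of ./src/train_tune.py looks like the following (the parameter names and ranges are hypothetical):

```python
from ray import tune

# hypothetical search space; the real keys live in the 'main' of ./src/train_tune.py
search_space = {
    'lr': tune.loguniform(1e-5, 1e-3),      # learning rate sampled log-uniformly
    'batch_size': tune.choice([8, 16, 32]),
    'dropout': tune.uniform(0.0, 0.3),
}
```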
- test trained/validated models on the holdout set by specifying in config/setup.json:
- AbLEF models trained on HIC retention time (hicrt) and aggregation temperature (tagg) are located in models/weights
- setup['holdout']['model_path']
- setup['holdout']['holdout_data_path']
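These keys can be set the same way as in the training sketch above (paths below are placeholders):

```python
import json

with open('config/setup.json') as f:
    setup = json.load(f)

setup['holdout']['model_path'] = 'models/weights/...'        # placeholder trained model
setup['holdout']['holdout_data_path'] = '/path/to/holdout/'  # placeholder holdout set

with open('config/setup.json', 'w') as f:
    json.dump(setup, f, indent=4)
```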
python ./src/holdout.py
- train_tune.log files are recorded and saved for every time-stamped batch run
- runs are also recorded on TensorBoard
- ***** = unique file identifier (e.g., time stamp or number)
logs/batch_*****/train_tune.log
logs/batch_*****/events.out.tfevents.***** (setup["training"]["ray_tune"] == False)
logs/batch_*****/ray_tune/hp_tune_*****/checkpoint_*****/events.out.tfevents.***** (setup["training"]["ray_tune"] == True)
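The event files can be viewed with the standard TensorBoard CLI, e.g.:
tensorboard --logdir=logs/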
- hyperparameter tuning runs are implemented with Ray Tune and the resulting models are stored:
- models trained without hyperparameter tuning are also stored:
logs/batch_*****/ray_tune/hp_tune_*****/checkpoint_*****/dict_checkpoint.pkl (setup["training"]["ray_tune"] == True)
models/weights/batch_*****/ALEF*****.pth (setup["training"]["ray_tune"] == False)
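Stored checkpoints can be loaded for inspection with standard PyTorch/pickle calls (fill in the ***** identifiers from your run):

```python
import pickle
import torch

# non-ray-tuned weights (setup["training"]["ray_tune"] == False)
checkpoint = torch.load('models/weights/batch_*****/ALEF*****.pth', map_location='cpu')

# ray tune checkpoint (setup["training"]["ray_tune"] == True)
with open('logs/batch_*****/ray_tune/hp_tune_*****/checkpoint_*****/dict_checkpoint.pkl', 'rb') as f:
    ray_checkpoint = pickle.load(f)
```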
@article{widatalla2023,
    title = {{AbPROP}: {Language} and {Graph} {Deep} {Learning} for {Antibody} {Property} {Prediction}},
    journal = {ICML Workshop on Computational Biology},
    author = {Widatalla, Talal and Rollins, Zachary A and Chen, Ming-Tang and Waight, Andrew and Cheng, Alan},
    url = {https://icml-compbio.github.io/2023/papers/WCBICML2023_paper53.pdf},
    month = jul,
    year = {2023}
}
- we also integrated the AbPROP codebase
- AbPROP methods are used as baselines to compare AbLEF against graph neural networks + language fusion
- the graph neural networks are currently single-structure molecular representations only
- to utilize the graph neural networks, pdb files are converted and saved as torch geometric Data objects for GVP & GAT:
python ./data/preprocess_graphs/graph_structs.py
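For orientation, a torch geometric Data object of the kind saved here has the following shape (toy graph; feature dimensions and edges are hypothetical):

```python
import torch
from torch_geometric.data import Data

# hypothetical toy graph: 3 residue nodes with 16-d features, bidirectional edges
x = torch.randn(3, 16)                                       # node features
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)  # COO connectivity
graph = Data(x=x, edge_index=edge_index)
torch.save(graph, 'mAb_graph.pt')
```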
AbLEF fuses antibody language and structural ensemble representations for property prediction.
Copyright © 2023 Merck & Co., Inc., Rahway, NJ, USA and its affiliates. All rights reserved.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.