This is the official open-source repository for discrete Walk-Jump Sampling, developed by ncfrey, djberenberg, kleinhenz, and saeedsaremi from Prescient Design, a Genentech accelerator.
Assuming you have miniconda installed, clone the repository, navigate inside, and run:
./scripts/install.sh
The entrypoint walkjump_train is the main driver for training and accepts parameters using Hydra syntax.
The available configuration parameters can be found by running walkjump_train --help or by looking in the src/walkjump/hydra_config directory.
The entrypoint walkjump_sample is the main driver for sampling and accepts parameters using Hydra syntax.
The available configuration parameters can be found by running walkjump_sample --help or by looking in the src/walkjump/hydra_config directory.
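As a minimal sketch of what "Hydra syntax" means for both entrypoints: Hydra applications take overrides as key=value arguments, with dotted keys addressing nested config entries. The helper below only assembles such a command line. The parameter name some_group.some_param is hypothetical and used purely for illustration; the real names are whatever --help or the src/walkjump/hydra_config directory lists.

```python
import shlex


def hydra_command(entrypoint, overrides):
    """Build an argv list: the entrypoint followed by key=value Hydra overrides."""
    args = [entrypoint]
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


# "some_group.some_param" is a hypothetical parameter name, not one taken
# from this repository's configuration.
train_cmd = hydra_command("walkjump_train", {"some_group.some_param": 0.5})

# Listing the available parameters needs no overrides, only --help.
help_cmd = ["walkjump_sample", "--help"]

# Print the commands as they would be typed in a shell.
print(shlex.join(train_cmd))
print(shlex.join(help_cmd))
```

The resulting lists can be passed to subprocess.run once the package is installed, or the printed strings can be pasted into a terminal.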
Use the LargeMoleculeDescriptors class to compute descriptors for large molecules (proteins, antibodies, etc.). To evaluate sample quality, see the code for computing Wasserstein distances between samples and reference distributions.
See the DCS (distributional conformity score) code and the DCS README to evaluate samples.
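To illustrate the Wasserstein-distance comparison mentioned above, here is a minimal pure-Python sketch for a single scalar descriptor. It is not the repository's implementation, and it assumes two samples of equal size, in which case the empirical 1-Wasserstein distance reduces to the mean absolute difference between the sorted values. The descriptor values are made up for illustration.

```python
def wasserstein_1d(samples, reference):
    """Empirical 1-Wasserstein distance between two equal-size 1D samples."""
    if len(samples) != len(reference):
        raise ValueError("this sketch assumes equal-size samples")
    if not samples:
        raise ValueError("samples must be non-empty")
    # For equal-size samples, the optimal transport plan pairs sorted values.
    xs, ys = sorted(samples), sorted(reference)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Toy descriptor values (made up for illustration, not real data).
generated = [0.0, 1.0, 3.0]
reference = [5.0, 6.0, 8.0]

distance = wasserstein_1d(generated, reference)
print(distance)
```

A smaller distance means the generated samples' descriptor distribution sits closer to the reference. For samples of unequal size, scipy.stats.wasserstein_distance is one existing option.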
We welcome contributions. If you would like to submit pull requests, please base them on the latest version of the main branch.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
If you use the code and/or model, please cite:
@article{frey2023protein,
title={Protein Discovery with Discrete Walk-Jump Sampling},
author={Nathan C. Frey and Daniel Berenberg and Karina Zadorozhny and Joseph Kleinhenz and Julien Lafrance-Vanasse and Isidro Hotzel and Yan Wu and Stephen Ra and Richard Bonneau and Kyunghyun Cho and Andreas Loukas and Vladimir Gligorijevic and Saeed Saremi},
year={2023},
eprint={2306.12360},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}