This is the official implementation of CryoFEM (Cryo-EM Feature Enhancement Model), an image enhancement tool based on 3D convolutional neural networks for post-processing cryo-EM density maps. It enhances the quality of density maps, facilitating more precise interpretation at the atomic level.
Our image enhancement model can be used synergistically with AlphaFold and protein model refinement tools, e.g., PHENIX, to tackle cases where initial AlphaFold predictions are less accurate.
In the manuscript, we used our forked version of OpenFold, which enables us to use custom templates for structure prediction. Alternatively, one can use the official ColabFold to implement the proposed workflow. See here for more detailed instructions on using ColabFold and PHENIX for model refinement.
- Mar 24: We have updated the Colab notebook to make it more robust (it now supports multiple runs and avoids Google Drive issues).
- Nov 23: Our paper has been published in Briefings in Bioinformatics. Check it out here.
- Mar 23: Try our Colab notebook to run CryoFEM on Google Colab. Simply upload your own half maps and run the prediction; you can then view the visualization and download the enhanced map. See the Colab notebook for more details.
- Mar 23: We have tested CryoFEM with PyTorch 2.0. By default, if you run the training script with PyTorch 2.0, it first compiles the model with torch.compile to accelerate training.
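A minimal sketch of version-gated compilation as described in the last news item, assuming the decision depends only on the PyTorch major version. The helper names `should_compile` and `maybe_compile` are hypothetical and are not taken from the CryoFEM training script.

```python
def should_compile(torch_version: str) -> bool:
    """Return True if the given PyTorch version string is 2.x or newer."""
    # Strip local build tags such as "+cu113" before reading the major version.
    major = int(torch_version.split("+")[0].split(".")[0])
    return major >= 2


def maybe_compile(model):
    """Wrap `model` with torch.compile on PyTorch >= 2.0, otherwise return it unchanged."""
    import torch  # imported lazily so should_compile() can be used without PyTorch installed

    if should_compile(torch.__version__) and hasattr(torch, "compile"):
        return torch.compile(model)
    return model
```

Calling `model = maybe_compile(model)` once before the training loop keeps the same script usable on PyTorch 1.12 (no compilation) and PyTorch 2.x (compiled model).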
CryoFEM is developed with Python 3.9 and PyTorch 1.12. Other important packages include:
- torchio 0.18.86
- scikit-image 0.19.3
- mrcfile 1.4.3
- numpy 1.24.1
- tqdm 4.64.1
We recommend setting up a new Conda environment to install these Python packages.
- Third-party dependency: we use ChimeraX to perform map resampling (so that each voxel measures 1 Å) and target generation (simulating target maps from deposited PDB models).
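To confirm that a fresh Conda environment matches the versions listed above, a small check based on the standard library's `importlib.metadata` could look like the following. This is a convenience sketch, not part of the CryoFEM repository. PyTorch and ChimeraX are left out because the README gives only a major/minor version for PyTorch and none for ChimeraX.

```python
from importlib.metadata import PackageNotFoundError, version

# Package versions listed in this README.
PINNED = {
    "torchio": "0.18.86",
    "scikit-image": "0.19.3",
    "mrcfile": "1.4.3",
    "numpy": "1.24.1",
    "tqdm": "4.64.1",
}


def check_versions(pinned=PINNED):
    """Map each package to (expected_version, installed_version or None)."""
    report = {}
    for pkg, wanted in pinned.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[pkg] = (wanted, installed)
    return report


def mismatches(report):
    """Return only the packages whose installed version differs from the pinned one."""
    return {pkg: pair for pkg, pair in report.items() if pair[0] != pair[1]}
```

For example, `print(mismatches(check_versions()))` lists every package that is missing or installed at a different version.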
All the data used to train and validate CryoFEM are publicly available from PDB and EMDB. data_processing/train_emd_id.txt contains the list of EMDB IDs used for model training.
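The ID list can be turned into download links for the half maps. The sketch below makes two assumptions that should be checked before relying on it: that train_emd_id.txt holds one EMDB ID per line (as a bare number or as EMD-XXXX), and that half maps follow the `other/emd_XXXX_half_map_{1,2}.map.gz` layout on the EBI EMDB FTP server. `parse_emd_ids` and `half_map_urls` are hypothetical helper names.

```python
# Assumed EMDB archive layout for half maps; verify against an actual EMDB entry.
EMDB_BASE = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures"


def parse_emd_ids(text: str) -> list[str]:
    """Extract numeric EMDB IDs from text with one ID per line (assumed format)."""
    ids = []
    for line in text.splitlines():
        token = line.strip()
        if not token or token.startswith("#"):
            continue  # skip blank lines and comments
        # Accept "EMD-12345", "emd_12345", or a bare "12345".
        token = token.upper().replace("EMD-", "").replace("EMD_", "")
        if token.isdigit():
            ids.append(token)
    return ids


def half_map_urls(emd_id: str) -> tuple[str, str]:
    """Build the (assumed) URLs of both half maps for a numeric EMDB ID."""
    entry = f"EMD-{emd_id}"
    name = f"emd_{emd_id}"
    return tuple(f"{EMDB_BASE}/{entry}/other/{name}_half_map_{i}.map.gz" for i in (1, 2))
```

Reading the file with `pathlib.Path("data_processing/train_emd_id.txt").read_text()` and passing the text to `parse_emd_ids` yields the IDs to feed into `half_map_urls`.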
- After downloading the half-maps from EMDB, use data_processing/generate_averaged_map_from_half_maps.py to compute and save the averaged raw map.
- Check out data_processing/map_resampling_simulation.py to see how to resample the raw map and simulate the target map using ChimeraX.
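A minimal sketch of the half-map averaging step with numpy and mrcfile, assuming a plain voxel-wise mean and copying only the voxel size from the first half map. The function names are hypothetical, and the repository's generate_averaged_map_from_half_maps.py may differ in details such as header handling.

```python
import numpy as np


def average_half_maps(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """Return the voxel-wise mean of two half maps as float32."""
    if half1.shape != half2.shape:
        raise ValueError(f"Half-map shapes differ: {half1.shape} vs {half2.shape}")
    return (half1.astype(np.float32) + half2.astype(np.float32)) / np.float32(2.0)


def average_half_map_files(path1: str, path2: str, out_path: str) -> None:
    """Read two half maps from MRC files, average them, and write the result."""
    import mrcfile  # imported here so the numpy helper above works without mrcfile

    with mrcfile.open(path1, permissive=True) as m1, mrcfile.open(path2, permissive=True) as m2:
        averaged = average_half_maps(m1.data, m2.data)
        vs = m1.voxel_size
        voxel_size = (float(vs.x), float(vs.y), float(vs.z))

    with mrcfile.new(out_path, overwrite=True) as out:
        out.set_data(averaged)
        out.voxel_size = voxel_size  # keep the original sampling; resampling to 1 Å happens later
```

Keeping the arithmetic in a separate numpy function makes it easy to test without any MRC files; the file wrapper only handles I/O.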
- Download our trained model from Google Drive.
- We provide sample data in example_data and the corresponding inference configuration at configs/inference.json. After adjusting the options (e.g., the trained weights path and GPU ID), run the inference script as follows:
python inference.py --config configs/inference.json
- If {"test_data": {"save_output": 1}} is set in the configuration, the output maps will be saved to ./results/inference/yyyy-mm-dd-current-clock-time. Inference on the sample map takes around 10 s on an NVIDIA V100 GPU.
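The save_output switch can also be set from a script before launching inference, and the newest run directory under ./results/inference can be located afterwards. A minimal standard-library sketch; the helper names are hypothetical, and choosing the newest directory by modification time is an assumption made to avoid depending on the exact timestamp format of the folder names.

```python
import json
from pathlib import Path


def enable_save_output(config_path: str, out_path: str) -> dict:
    """Write a copy of an inference config with {"test_data": {"save_output": 1}} set."""
    config = json.loads(Path(config_path).read_text())
    config.setdefault("test_data", {})["save_output"] = 1
    Path(out_path).write_text(json.dumps(config, indent=2))
    return config


def latest_run_dir(results_root: str = "./results/inference"):
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `enable_save_output("configs/inference.json", "configs/inference_save.json")` followed by `python inference.py --config configs/inference_save.json` leaves the original configuration untouched.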
- In addition to the resampled raw maps, you will need simulated maps as targets to train the model. By default, data_processing/map_resampling_simulation.py saves each simulated map as simulated_map_{xxx}_res_2_vol_1.mrc, where {xxx} denotes the EMDB ID, and res_2 and vol_1 indicate a simulated resolution of 2 Å and a voxel size of 1 Å, respectively.
- We provide a sample training configuration file at configs/train.json.
- After adjusting the options (e.g., GPU ID, batch size, number of epochs), run the training script as follows:
python train_model.py --config configs/train.json
- Depending on the options in train.json, the training configuration, log, and/or trained models will be saved to ./results/train/yyyy-mm-dd-current-clock-time.
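The target naming convention can be captured in a small helper, together with a pre-training sanity check that a raw map and its simulated target line up. This is a sketch under assumptions: it reproduces only the simulated_map_{xxx}_res_2_vol_1.mrc pattern described above, and the requirement that each pair share the same grid shape and 1 Å voxels is inferred from the resampling step rather than taken from the training code. All function names are hypothetical.

```python
import re

import numpy as np

# Matches names such as simulated_map_12345_res_2_vol_1.mrc
TARGET_RE = re.compile(r"^simulated_map_(\d+)_res_(\d+(?:\.\d+)?)_vol_(\d+(?:\.\d+)?)\.mrc$")


def target_filename(emd_id: str, res: int = 2, vol: int = 1) -> str:
    """Build the default simulated-map filename for an EMDB ID."""
    return f"simulated_map_{emd_id}_res_{res}_vol_{vol}.mrc"


def parse_target_filename(name: str):
    """Return (emd_id, resolution_A, voxel_size_A), or None if the name does not match."""
    m = TARGET_RE.match(name)
    if m is None:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))


def check_training_pair(
    raw: np.ndarray,
    target: np.ndarray,
    raw_voxel: float,
    target_voxel: float,
    expected_voxel: float = 1.0,
    tol: float = 1e-3,
) -> list[str]:
    """List problems found in a raw/target pair; an empty list means the pair looks usable."""
    problems = []
    if raw.shape != target.shape:
        problems.append(f"shape mismatch: {raw.shape} vs {target.shape}")
    for label, v in (("raw", raw_voxel), ("target", target_voxel)):
        if abs(v - expected_voxel) > tol:
            problems.append(f"{label} voxel size {v} Å differs from {expected_voxel} Å")
    return problems
```

Running such a check over all pairs before starting train_model.py surfaces resampling mistakes early instead of partway through training.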
If you find our work helpful, please consider citing it as follows:
@article{CryoFEM2023,
author = {Dai, Xin and Wu, Longlong and Yoo, Shinjae and Liu, Qun},
title = "{Integrating AlphaFold and deep learning for atomistic interpretation of cryo-EM maps}",
journal = {Briefings in Bioinformatics},
volume = {24},
number = {6},
pages = {bbad405},
year = {2023},
month = {11},
issn = {1477-4054},
doi = {10.1093/bib/bbad405},
url = {https://doi.org/10.1093/bib/bbad405},
}
This source code is licensed under the OSI-approved 3-clause BSD license found in the LICENSE file in the root directory of this source tree.