Introduction

GLMSite is an accurate predictor of nucleic-acid-binding residues that combines a multi-task learning strategy with a large-scale protein language model. First, informative sequence representations are generated by the pretrained language model, and protein structures are predicted by ESMFold. GLMSite then uses a geometric vector perceptron (GVP) to extract geometric and relational features from the predicted structures, and applies multi-task deep learning to capture the binding characteristics common to different nucleic acids. Finally, two nucleic-acid-specific fully connected layers learn the binding patterns of each particular nucleic acid. In comprehensive tests on DNA and RNA benchmark datasets, GLMSite surpassed the latest sequence-based methods and even state-of-the-art structure-based methods.
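
The shared-then-specific layout described above can be pictured with a toy forward pass. This is only a sketch and not GLMSite's implementation: a single dense layer stands in for the GVP encoder, and the layer sizes, random weights, and function names are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights and zero bias for one fully connected layer (illustrative only)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N_FEATURES, N_HIDDEN = 16, 8  # placeholder sizes, not taken from the paper

# Shared layer: a stand-in for the encoder trained on both DNA and RNA data.
shared = dense(N_FEATURES, N_HIDDEN)
# One task-specific output layer per nucleic acid type.
heads = {"DNA": dense(N_HIDDEN, 1), "RNA": dense(N_HIDDEN, 1)}

def predict(residue_features):
    """Return per-residue binding probabilities for each nucleic acid type."""
    w, b = shared
    hidden = np.maximum(residue_features @ w + b, 0.0)  # shared representation (ReLU)
    return {
        name: sigmoid(hidden @ hw + hb).ravel()
        for name, (hw, hb) in heads.items()
    }

# Five residues with random features, just to exercise the forward pass.
probs = predict(rng.normal(size=(5, N_FEATURES)))
```

Both heads read the same hidden representation, so whatever the shared layer learns from one nucleic acid type is available to the other; only the final layers differ per task.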

System requirements

python 3.8.16
numpy 1.24.2
pyg-lib 0.1.0+pt113cu116
pyparsing 3.0.9
scikit-learn 1.2.2
six 1.16.0
torch 1.13.1+cu116
torch-cluster 1.6.1+pt113cu116
torch-geometric 2.2.0
torch-scatter 2.1.1+pt113cu116
torch-sparse 0.6.17+pt113cu116
torch-spline-conv 1.2.2+pt113cu116
torchaudio 0.13.1+cu116
torchvision 0.14.1+cu116
urllib3 1.26.15
wheel 0.38.4
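
Before running GLMSite, it can help to confirm that the installed packages match the versions listed above. The snippet below is a minimal sketch using the standard library's `importlib.metadata`; the `REQUIRED` dictionary copies only a few of the pins, and the helper name `find_mismatches` is made up for this example.

```python
from importlib import metadata

# A few of the pins listed above; extend as needed.
REQUIRED = {
    "numpy": "1.24.2",
    "scikit-learn": "1.2.2",
    "torch": "1.13.1+cu116",
    "torch-geometric": "2.2.0",
}

def find_mismatches(required):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches

if __name__ == "__main__":
    for name, installed in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {REQUIRED[name]}, found {installed}")
```

The check is an exact string comparison, so a build for a different CUDA version (for example a `+cu117` suffix) is reported as a mismatch even if it might work in practice.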

ProtTrans

You need to prepare the pretrained language model ProtTrans to run GLMSite:
Download the pretrained ProtT5-XL-UniRef50 model (guide).
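
For orientation, per-residue ProtT5 embeddings are commonly extracted with the Hugging Face `transformers` library, roughly as sketched below. This is an assumption-laden example, not GLMSite's own feature script: `transformers` is not in the requirement list above, the model ID `Rostlab/prot_t5_xl_uniref50` is assumed to correspond to the downloaded model, and the file format GLMSite expects under its feature path is not shown here. The function names are illustrative.

```python
import re

# Hugging Face model ID; assumed to match the downloaded ProtT5-XL-UniRef50 model.
MODEL_NAME = "Rostlab/prot_t5_xl_uniref50"

def prepare_sequence(seq):
    """Map rare residues (U, Z, O, B) to X and separate residues with spaces."""
    return " ".join(re.sub(r"[UZOB]", "X", seq.upper()))

def embed(seq, model_dir=MODEL_NAME):
    """Return an array with one 1024-dimensional embedding per residue."""
    # Imported lazily so prepare_sequence can be used without these packages.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_dir).eval()
    batch = tokenizer([prepare_sequence(seq)], return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Drop the trailing end-of-sequence token so there is one row per residue.
    return hidden[0, : len(seq)].numpy()
```

Passing a local directory as `model_dir` loads the model from disk instead of downloading it.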

ESMFold

GLMSite takes protein structures predicted by ESMFold as input:
Download the ESMFold model (guide).

Run GLMSite for prediction

Simply run:

python predict.py --dataset_path ../Example/structure_data/ --feature_path ../Example/prottrans/ --input_path ../Example/demo.pkl

The prediction results will be saved in:

../Example/results
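
When calling the predictor from another script, a thin wrapper that checks the inputs first can save a failed run. The sketch below only reuses the three flags shown in the command above; the helper names are hypothetical, and it assumes `predict.py` is in the current working directory (the relative `../Example/` paths suggest this is the script directory, but that is an inference).

```python
import subprocess
import sys
from pathlib import Path

def build_predict_command(dataset_path, feature_path, input_path):
    """Assemble the predict.py invocation shown above as an argument list."""
    return [
        sys.executable, "predict.py",
        "--dataset_path", str(dataset_path),
        "--feature_path", str(feature_path),
        "--input_path", str(input_path),
    ]

def run_prediction(dataset_path, feature_path, input_path):
    """Fail early on missing inputs, then run predict.py from the current directory."""
    paths = (dataset_path, feature_path, input_path)
    missing = [str(p) for p in paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError("missing input(s): " + ", ".join(missing))
    command = build_predict_command(dataset_path, feature_path, input_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

For the bundled example this would be called as `run_prediction("../Example/structure_data/", "../Example/prottrans/", "../Example/demo.pkl")`.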

Dataset and model

We provide the datasets and the trained GLMSite models here for those interested in reproducing our paper. The datasets used in this study are stored in ../Dataset/. The trained GLMSite models can be found under ../Model/.

Contact

Yidong Song (songyd6@mail2.sysu.edu.cn)
Yuedong Yang (yangyd25@mail.sysu.edu.cn)
