Final Project for CS272.
Introduction to Biomedical Informatics Research
Methodology at Stanford University.
Ayush, Kevin, Shreyas, Tom
Configuring your system for the project.
Preparing the local environment.
- conda create -n tox python=3.9
- conda activate tox
- conda install -c conda-forge biopython
- conda install -c pytorch pytorch
- conda install scikit-learn
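After running the steps above, a quick check that the packages are importable from the `tox` environment can save debugging time later. The sketch below is only an illustration: the helper name `check_environment` is hypothetical, and the mapping assumes the usual import names for these packages (`Bio`, `torch`, `sklearn`).

```python
import importlib.util

# Assumed import names for the packages installed above:
# biopython -> Bio, pytorch -> torch, scikit-learn -> sklearn.
REQUIRED = {"biopython": "Bio", "pytorch": "torch", "scikit-learn": "sklearn"}


def check_environment(required=REQUIRED):
    """Return {package name: True/False} depending on whether its module can be found."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in required.items()
    }


if __name__ == "__main__":
    # Print one status line per package without actually importing it.
    for package, found in check_environment().items():
        print(f"{package}: {'found' if found else 'MISSING'}")
```

Run it with the `tox` environment activated; any package reported as `MISSING` likely needs its `conda install` step repeated.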
Making Conda available in Jupyter.
- conda install -c anaconda ipykernel
- python -m ipykernel install --user --name=tox
Contains data from previous papers, including ToxIBTL, ToxDL, and ToxinPred.
Contains Python files for exploratory data analysis.
This includes reading and wrangling the data into a standard
format (sequences labeled toxic or non-toxic), identifying
duplicate sequences, splitting the data into training and
test sets, and analyzing sequence similarity.
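The wrangling steps described above can be sketched roughly as follows. This is not the project's actual code: the function names are hypothetical, the example uses only the standard library, and dropping sequences that appear with conflicting labels is an assumption about how duplicates might be handled.

```python
import random
from collections import defaultdict


def standardize(records):
    """Normalize raw (sequence, label) pairs: uppercase sequence, integer label (1 = toxic)."""
    return [(seq.strip().upper(), int(label)) for seq, label in records]


def find_duplicates(records):
    """Map every sequence that occurs more than once to the set of labels it carries."""
    labels_by_seq = defaultdict(set)
    counts = defaultdict(int)
    for seq, label in records:
        labels_by_seq[seq].add(label)
        counts[seq] += 1
    return {seq: labels_by_seq[seq] for seq, n in counts.items() if n > 1}


def deduplicate(records):
    """Keep the first copy of each sequence; drop sequences with conflicting labels (assumption)."""
    conflicting = {seq for seq, labels in find_duplicates(records).items() if len(labels) > 1}
    seen, unique = set(), []
    for seq, label in records:
        if seq in conflicting or seq in seen:
            continue
        seen.add(seq)
        unique.append((seq, label))
    return unique


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up toy records, for illustration only.
raw = [("acde", 1), ("ACDE", 1), ("MKV", 0), ("MKV", 1), ("GGG", 0), ("PPP", 1), ("KLM", 0)]
clean = deduplicate(standardize(raw))
train, test = split_train_test(clean, test_fraction=0.25)
```

A purely random split like this one ignores sequence similarity, so near-identical sequences could still land on both sides of the split.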
Contains data related to CD-HIT, which we use to
identify sequences that are at least 40% similar.
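CD-HIT writes its clusters to a `.clstr` text file, and grouping sequence IDs by cluster is one way to see which sequences fall above the similarity threshold. The parser below is a minimal sketch, assuming the commonly seen `.clstr` layout (a `>Cluster N` header followed by member lines such as `0	30aa, >seq_id... *`); the function name and the sample IDs are made up for illustration.

```python
import re

# Assumed member-line layout: "<index>\t<length>aa, ><sequence id>... <* or 'at NN%'>"
MEMBER_RE = re.compile(r">(?P<id>.+?)\.\.\.")


def parse_clstr(text):
    """Return {cluster number: [sequence ids]} from the contents of a .clstr file."""
    clusters = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line, e.g. ">Cluster 12"
            current = int(line.split()[1])
            clusters[current] = []
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group("id"))
    return clusters


# Hypothetical example contents, not real project data.
example = (
    ">Cluster 0\n"
    "0\t30aa, >tox_1... *\n"
    "1\t28aa, >tox_2... at 52.00%\n"
    ">Cluster 1\n"
    "0\t25aa, >nontox_1... *\n"
)
clusters = parse_clstr(example)
multi_member = {cid: ids for cid, ids in clusters.items() if len(ids) > 1}
```

Clusters with more than one member (`multi_member` here) contain sequences that CD-HIT grouped together at the chosen identity threshold.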
Contains Jupyter notebooks used in the
process of developing our ToxIN model.
The /ToxIBTL/ folder contains the original code from ToxIBTL.