Mentors: Daniel Zerbino, Elspeth Bruford, Ruth Seal
Applying machine learning techniques to characterising and naming lncRNA genes
Advances in RNA sequencing technologies have revealed the complexity of our genome. Long non-coding RNAs (lncRNAs) make up the majority of the non-coding transcriptome. Understanding the significance of this RNA world is one of the most important challenges in biology today, and the lncRNAs within it represent a gold mine of potential new biomarkers and drug targets. Exploration of this RNA world is still at a preliminary stage.
To date, very few lncRNAs have been characterized in detail. However, it is clear that lncRNAs are important regulators of gene expression, and they are thought to have a wide range of functions in cellular and developmental processes. Many databases annotate lncRNAs (such as RefSeq, GENCODE, Ensembl, SGD, and TAIR). We will use machine learning techniques to compare two sets of lncRNA calls (Ensembl / GENCODE and RefSeq), highlight where they differ, and determine which calls are incorrect.
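As a rough illustration of what comparing two sets of calls involves before any machine learning is applied, the sketch below splits one set into calls that overlap a call in the other set and calls that do not. It is not the project's method: the `Call` type, the `overlaps` and `split_calls` helpers, and the identifiers and coordinates are all made up for this example.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """A gene call: an identifier plus its genomic interval (1-based, inclusive)."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Call, b: Call) -> bool:
    """True when two calls share at least one base on the same chromosome and strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def split_calls(set_a, set_b):
    """Partition set_a into calls overlapped by some call in set_b and calls unique to set_a."""
    shared, unique = [], []
    for call in set_a:
        if any(overlaps(call, other) for other in set_b):
            shared.append(call)
        else:
            unique.append(call)
    return shared, unique


# Toy, made-up identifiers and coordinates, purely for illustration.
ensembl_calls = [
    Call("ENS_TOY_1", "chr1", 100, 500, "+"),
    Call("ENS_TOY_2", "chr1", 900, 1200, "-"),
]
refseq_calls = [
    Call("REF_TOY_1", "chr1", 450, 800, "+"),
]

shared, unique = split_calls(ensembl_calls, refseq_calls)
print("shared:", [c.gene_id for c in shared])
print("unique:", [c.gene_id for c in unique])
```

Calls found in only one source are the natural candidates to inspect further. The nested loop is quadratic, so a genome-scale comparison would need sorted intervals or an interval index instead.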
The repository contains 5 folders:
- Ensembl-analysis - Analysis scripts and the data collected from Ensembl.
- RefSeq-analysis - Analysis scripts and the data collected from RefSeq.
- feature_selection - Scripts for creating features.
- ML - Scripts for running the ML analysis on the collected data (with their features).
- add_copyright_to_all - Script for adding copyright information to all ipynb files.
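Since an ipynb file is plain JSON, a script like the one in add_copyright_to_all can be approximated with the standard library alone. The following is a minimal sketch under assumptions, not the script shipped in that folder: the `NOTICE` text is a placeholder, and `add_notice` and `add_notice_to_all` are hypothetical names.

```python
import json
from pathlib import Path

# Placeholder text; the notice actually used by the project is not shown here.
NOTICE = "Copyright (c) <year> <holder>"


def add_notice(notebook: dict, notice: str = NOTICE) -> bool:
    """Prepend a markdown cell holding `notice`; return False if it is already there."""
    cells = notebook.setdefault("cells", [])
    # A cell's "source" may be a single string or a list of strings; join handles both.
    if cells and "".join(cells[0].get("source", [])) == notice:
        return False
    cells.insert(0, {"cell_type": "markdown", "metadata": {}, "source": [notice]})
    return True


def add_notice_to_all(root: str, notice: str = NOTICE) -> int:
    """Rewrite every .ipynb under `root` that lacks the notice; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.ipynb"):
        if ".ipynb_checkpoints" in path.parts:
            continue  # skip Jupyter's autosave copies
        notebook = json.loads(path.read_text(encoding="utf-8"))
        if add_notice(notebook, notice):
            path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
            changed += 1
    return changed
```

Calling `add_notice_to_all(".")` from the repository root would update each notebook once, and running it again changes nothing because the first cell is checked before inserting. Notebooks saved as nbformat 4.5 or later expect every cell to carry an `id`, so a real script may need to add one to the new cell.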
Python packages used:
- json
- pandas
- NumPy
- Biopython
- pyfasta
- gffpandas
- scikit-learn (sklearn)
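To check quickly that the packages listed above are available before opening the notebooks, something like the following could be used. It is only a sketch: `missing_modules` is a hypothetical helper, and the import names are assumed from each package's usual usage (Biopython is imported as `Bio`, scikit-learn as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed from common usage; adjust if a package is imported differently.
REQUIRED_MODULES = ["json", "pandas", "numpy", "Bio", "pyfasta", "gffpandas", "sklearn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`json` ships with Python itself, so only the remaining names would ever need installing.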
- A Deep Learning Framework for Robust and Accurate Prediction of ncRNA-Protein
- Accurate prediction of protein lncRNA interactions by diffusion and HeteSim features
- lncRNAnet: Long Non-coding RNA Identification using Deep Learning
- Prediction of LncRNA Subcellular Localization with Deep Learning from Sequence Features
- I would like to thank Daniel Zerbino for taking the time to mentor me and for providing invaluable suggestions. I truly appreciate his constant trust and encouragement!
- Ensembl admins, helpdesk and the whole community
- GSoC organizers, managers and Google