@ML4GLand

Machine Learning for Genomics

A collection of tools for investigating how DNA encodes function with machine learning

Welcome to the land of machine learning for genomics!

ML4GLand is a community that develops and maintains tools (primarily in Python) for sequence-based machine learning in genomics.

Why?

Deep learning has become a popular tool for investigating gene regulation, including DNA- and RNA-protein binding specificity, chromatin state and architecture, and transcriptional activity. However, executing a typical workflow for building and interpreting deep learning models remains a challenge. Training nuances specific to genomics data, along with complex preprocessing and interpretation methods, create an especially steep learning curve, and heterogeneity in the implementations of most code associated with publications hinders reproducibility and extensibility. A tool that exposes existing data, models, and methods to computational scientists, and that can also serve as a platform for development, will greatly improve our ability to use sequence-based machine learning to interrogate gene regulatory mechanisms.

We aim to build a framework for developing sequence-to-function deep learning models

Previous work has shown the utility of such frameworks; DeepChem and scverse are excellent examples. Our mission is to put together a similar ecosystem for sequence-based genomics.

Core packages

  • SeqPro -- a Python package for processing DNA/RNA sequences for machine learning.
  • SeqData -- a Python package for preparing machine learning-ready genomic sequence datasets.
  • SeqExplainer -- a Python package for interpreting sequence-to-function machine learning models.
  • EUGENe -- a Python package for streamlining and customizing end-to-end deep-learning sequence analyses in regulatory genomics.
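
To make "processing DNA/RNA sequences for machine learning" concrete, here is a minimal, dependency-free sketch of one-hot encoding, a common first step before a sequence is passed to a model. It does not use SeqPro or any other ML4GLand API; the function names `one_hot_encode` and `reverse_complement` and the A, C, G, T channel order are assumptions made purely for illustration.

```python
# Illustrative sketch only: this is NOT the SeqPro API.
ALPHABET = "ACGT"  # assumed channel order; real tools may choose another


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside ALPHABET (for example N) become all-zero rows.
    """
    lookup = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0.0] * len(ALPHABET)
        idx = lookup.get(base)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def reverse_complement(encoded):
    """Reverse-complement a one-hot encoded sequence.

    With A, C, G, T channel order, reversing each row swaps A<->T and C<->G,
    and reversing the row order reverses the sequence.
    """
    return [row[::-1] for row in reversed(encoded)]
```

Under these assumptions, `reverse_complement(one_hot_encode("AAC"))` gives the same rows as `one_hot_encode("GTT")`, which is a quick sanity check that the channel flip and the position flip agree with the usual definition of a reverse complement.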

Ecosystem packages

  • SeqDatasets -- a repository for downloading datasets and loading them with SeqData.
  • MotifData -- a Python package for handling motifs.

Usage repositories

  • tutorials -- a repository of tutorials for ML4GLand tools.
  • use_cases -- a repository of use cases that showcase ML4GLand tools and potential ecosystem packages.

Repositories

Showing 10 of 12 repositories
  • EUGENe -- Elucidating the Utility of Genomic Elements with Neural Nets
    Jupyter Notebook, MIT license, 65 stars, 4 forks, 8 open issues, 0 open pull requests, updated Nov 18, 2024
  • SeqData -- Annotated sequence data
    Jupyter Notebook, MIT license, 11 stars, 1 fork, 1 open issue, 0 open pull requests, updated Nov 5, 2024
  • SeqPro -- Genomic sequence preprocessing toolkit
    Python, MIT license, 10 stars, 1 fork, 1 open issue, 0 open pull requests, updated Nov 4, 2024
  • tutorials -- A set of tutorials for how to use all the tools in ML4GLand
    Jupyter Notebook, 2 stars, 0 forks, 1 open issue, 0 open pull requests, updated Oct 18, 2024
  • SeqDatasets -- Datasets for benchmarking, testing and developing in EUGENe
    Python, MIT license, 1 star, 0 forks, 0 open issues, 0 open pull requests, updated Oct 13, 2024
  • SeqExplainer -- Interpreting sequence-to-function machine learning models
    Jupyter Notebook, MIT license, 3 stars, 1 fork, 0 open issues, 0 open pull requests, updated Jan 25, 2024
  • use_cases -- Repository documenting applications of the ML4GLand suite on published datasets
    Jupyter Notebook, 1 star, 0 forks, 0 open issues, 0 open pull requests, updated Jan 3, 2024
  • .github
    0 stars, 0 forks, 0 open issues, 0 open pull requests, updated Nov 9, 2023
  • EUGENe_paper -- Code for generating the figures and results presented in the manuscript "EUGENe: A Python toolkit for sequence activity prediction and analysis"
    Jupyter Notebook, CC0-1.0 license, 4 stars, 0 forks, 0 open issues, 0 open pull requests, updated Sep 18, 2023
  • MotifData -- Motif representation and analysis toolkit in Python
    Jupyter Notebook, MIT license, 0 stars, 1 fork, 0 open issues, 1 open pull request, updated Jul 21, 2023
