
CloudConductor is a workflow management system that generates and executes bioinformatics pipelines


whitebird01/CloudConductor

 
 



CloudConductor: Simplified Bioinformatics

CloudConductor is a cloud-based workflow engine for defining and executing bioinformatics pipelines. So far the framework has been tested extensively on Google Cloud Platform; support for other platforms, including AWS and Azure, is planned.

Feature Highlights

  • User-friendly
    • Define complex workflows by linking together user-defined modules that can be re-used across pipelines
    • ConfigObj-based configuration files for clean, readable workflows
    • 50+ pre-installed modules for existing bioinformatics tools
  • Portable
    • Docker integration ensures reproducible runtime environment for modules
    • Platform-independent design (currently supports GCP; AWS and Azure support planned)
  • Modular/Extensible
    • Plug-N-Play with user-defined task modules
    • Easily reuse and recombine modules across workflows
      • Eliminates serial copy/paste
    • Easily add or customize task modules as needed
  • Pre-Launch Type-Checking
    • Strongly-typed task modules
      • Catch pipeline errors prior to runtime
    • Pre-launch validation catches misconfigured pipelines before they run
  • Scalable
    • Removes the resource limitations imposed by fixed-size HPC clusters
  • Elastic
    • VM usage automatically scales to match input file sizes and computational needs
  • Scatter-Gather Parallelism
    • Built-in logic for dividing large tasks into small chunks and recombining the results
  • Economical
    • Preemptible/Spot instances drastically cut workflow costs
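
To make the pre-launch type-checking idea concrete, here is a minimal, generic sketch in Python. It is not CloudConductor's actual implementation or API: `ModuleSpec`, `validate_pipeline`, and the type names (`fastq`, `bam`, `vcf`, `fasta`) are hypothetical and exist only for illustration. Each module declares the inputs it needs and the outputs it produces, and the pipeline is checked before anything is launched.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleSpec:
    """Hypothetical task module description (not a CloudConductor class)."""
    name: str
    inputs: dict = field(default_factory=dict)   # input key -> expected type name
    outputs: dict = field(default_factory=dict)  # output key -> produced type name


def validate_pipeline(modules, initial_inputs):
    """Walk the modules in order and return a list of error strings.

    An empty list means every declared input is satisfied by either the
    initial inputs or the output of an earlier module, with a matching type.
    """
    available = dict(initial_inputs)  # key -> type name known so far
    errors = []
    for mod in modules:
        for key, expected in mod.inputs.items():
            if key not in available:
                errors.append(f"{mod.name}: missing input '{key}'")
            elif available[key] != expected:
                errors.append(
                    f"{mod.name}: input '{key}' is {available[key]}, expected {expected}"
                )
        # Outputs become available to every downstream module.
        available.update(mod.outputs)
    return errors


if __name__ == "__main__":
    pipeline = [
        ModuleSpec("align", inputs={"reads": "fastq"}, outputs={"alignment": "bam"}),
        ModuleSpec("call_variants", inputs={"alignment": "bam"}, outputs={"variants": "vcf"}),
        ModuleSpec("annotate", inputs={"variants": "vcf", "reference": "fasta"}),
    ]
    for problem in validate_pipeline(pipeline, {"reads": "fastq"}):
        print(problem)
```

In this sketch the `annotate` step is reported as missing its `reference` input before any VM would be started, which is the kind of error that strongly-typed modules are meant to surface early.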

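The scatter-gather pattern mentioned in the feature list can be sketched in a few lines of Python. This is a generic, local illustration using threads, not CloudConductor's own logic (which distributes work across cloud VMs); the function names `scatter`, `scatter_gather`, and `count_gc` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def scatter_gather(items, worker, combine, n_chunks=4):
    """Run worker on each chunk in parallel, then combine the partial results."""
    chunks = scatter(items, n_chunks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)


def count_gc(reads):
    """Toy per-chunk task: count G and C bases in a list of reads."""
    return sum(read.count("G") + read.count("C") for read in reads)


if __name__ == "__main__":
    reads = ["GATTACA", "CCGG", "ATAT", "GGGC", "TTAA"]
    print(scatter_gather(reads, count_gc, sum, n_chunks=2))
```

The gather step here is a plain `sum`; for real outputs such as alignment or variant files it would be a merge step appropriate to the file type.
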
Setting up your system

CloudConductor is currently designed only for Linux systems. You will need to install and configure the following tools to run your pipelines on Google Cloud:

  1. Python v3.6+

    You can check your Python version by running the following command in your terminal:

    $ python3 -V
    Python 3.6.8

    To install the correct version of Python, visit the official Python website.

  2. Python packages: configobj, jsonschema, requests

    You will need pip to install the above packages. After installing pip, run the following commands in your terminal:

    # Upgrade pip
    sudo pip3 install -U pip
    
    # Install Python modules
    sudo pip3 install -U configobj jsonschema requests

  3. Clone the CloudConductor repo

    # clone the repo
    git clone https://github.com/labdave/CloudConductor.git

  4. Google Cloud Platform SDK

    Follow the instructions on the official Google Cloud website.
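
Once the steps above are done, a short script can confirm that the prerequisites are in place. The following is a minimal sketch and not part of CloudConductor: `check_prerequisites` is a hypothetical helper that checks the Python version, the three packages listed above, and whether `git` and the Google Cloud SDK's `gcloud` command are on your PATH.

```python
import importlib.util
import shutil
import sys

REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = ("configobj", "jsonschema", "requests")
REQUIRED_COMMANDS = ("git", "gcloud")  # gcloud is installed by the Google Cloud SDK


def check_prerequisites(python_version=None,
                        packages=REQUIRED_PACKAGES,
                        commands=REQUIRED_COMMANDS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []

    # Compare (major, minor) against the minimum supported version.
    version = tuple(python_version or sys.version_info[:2])
    if version < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )

    # find_spec returns None when a top-level package cannot be imported.
    for pkg in packages:
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"Python package not installed: {pkg}")

    # shutil.which returns None when the executable is not on PATH.
    for cmd in commands:
        if shutil.which(cmd) is None:
            problems.append(f"Command not found on PATH: {cmd}")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        print("\n".join(issues))
    else:
        print("All prerequisites found.")
```

Save it under any name you like and run it with `python3`; each missing item is printed on its own line.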

Documentation

Get started with our full documentation to explore the ways CloudConductor can streamline the development and execution of complex, multi-sample workflows typical in bioinformatics.

Project Status

CloudConductor is actively under development. To get involved or request features, please contact Razvan Panea.

Authors & Contributors


Languages

  • Python 99.8%
  • Dockerfile 0.2%