GitHub - baliga-lab/cmonkey2: Python port of cMonkey, a machine-learning based method for clustering

cMonkey₂ - Python port of the cMonkey biclustering algorithm

Description

This is the Python implementation of the cMonkey algorithm based on the original R implementation by David J. Reiss, Institute for Systems Biology.

Documentation

A complete set of documentation for installing and running cMonkey is on the project's GitHub Pages.

There are also developer and user discussion groups.

Contact

Please report all bugs or other issues using the issue tracker. Please direct any and all questions to either the developer or user discussion groups.

Installation

The recommended way to install cmonkey2 is through pip:

pip install cmonkey2

This will install the tools cmonkey2 and cm2view into your Python environment. Please note that you will have to install MEME manually from http://meme-suite.org/

Running cmonkey2

The simplest way to run the tool (if all data is available in RSAT and STRING):

$ cmonkey2 --organism <organism-code> <tab separated file of gene expressions>

To display available options:

bin/cmonkey2.sh --help

To run the example organism:

bin/cmonkey2.sh --organism hal --rsat_base_url http://networks.systemsbiology.net/rsat example_data/hal/halo_ratios5.tsv

Using directly from the source repository

Below are the instructions to use cmonkey2 directly in the source repository

Using a Docker Image

PreCyte has made a Docker image based on cmonkey2 available on their GitHub account:

https://github.com/PreCyte/cMonkey2-docker/

System requirements

cMonkey₂ has been tested on recent versions of Linux (including Debian-based [Ubuntu, Mint, Debian] and RPM-based [CentOS, Fedora]) and recent versions of Mac OS X. Dependencies include:

Developed and tested with Python 2.7.x and Python 3.x
scipy >= 0.9.0
numpy >= 1.6.0
biopython >= 1.63
BeautifulSoup >= 4
R >= 2.14.1
rpy2 >= 2.2.1
MEME 4.3.0 or >= 4.8.1 (all versions up to 5.3.3 tested)
csh (for running MEME)
pandas
sqlalchemy and sqlalchemy-utils
svgwrite

For the human setup, Weeder 1.4.2 is also needed.

for running the unit tests (optional):

python-xmlrunner

for running the interactive monitoring and visualization web application (optional):

CherryPy 3
Jinja2
python-routes
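
As a rough aid for checking the requirements above, the following sketch compares installed Python packages against the listed minimum versions and looks for the external programs on the PATH. It is not part of cmonkey2. The PyPI distribution names (for example `beautifulsoup4`) and the executable names `R`, `meme`, and `csh` are assumptions, the version parser is deliberately simplistic, and the helper names are hypothetical.

```python
import re
import shutil
from importlib import metadata

# Minimum versions copied from the list above. The distribution names are
# assumptions about how each package is published on PyPI.
MINIMUMS = {
    "scipy": "0.9.0",
    "numpy": "1.6.0",
    "biopython": "1.63",
    "beautifulsoup4": "4",
    "rpy2": "2.2.1",
}

# Assumed executable names for the non-Python dependencies.
EXTERNAL_TOOLS = ("R", "meme", "csh")


def version_tuple(text):
    """Turn '1.6.0' or '1.63rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def _pad(parts, width):
    """Right-pad a version tuple with zeros so '1.6' compares equal to '1.6.0'."""
    return parts + (0,) * (width - len(parts))


def meets_minimum(installed, minimum):
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)


def check_requirements(minimums, lookup=metadata.version):
    """Map each distribution name to 'ok', 'too old (<version>)' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


def find_missing_tools(names, which=shutil.which):
    """Return the executables from `names` that are not found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name, status in check_requirements(MINIMUMS).items():
        print(f"{name}: {status}")
    print("missing tools:", find_missing_tools(EXTERNAL_TOOLS))
```

The `lookup` and `which` parameters exist only so the checks can be exercised without the real packages installed. The sketch does not verify the MEME or R version numbers.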

Running the Unit Tests

bin/run_tests.sh

Running cmonkey2

In general, you should be able to run cmonkey2 on microbial gene expression ratios with:

bin/cmonkey2.sh --organism <organism-code> <tab separated file of gene expressions>

The file can be either in your file system or a web URL.
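
Before starting a long run it can help to sanity-check a local ratios file. The sketch below is not part of cmonkey2. It assumes a layout that the text here does not spell out: a header row naming the conditions, followed by one row per gene with the gene identifier in the first column and numeric ratios (or a missing-value marker such as `NA`) in the rest. The function name and the sample data are hypothetical.

```python
import csv
import io


def summarize_ratios(handle, missing=("", "NA", "NaN")):
    """Return (n_genes, n_conditions, n_missing) for a tab-separated table.

    Assumed layout: header row with condition names, then one row per gene
    with the gene identifier in column 1 and ratios in the other columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_conditions = len(header) - 1
    n_genes = 0
    n_missing = 0
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != n_conditions + 1:
            raise ValueError(
                f"line {line_no}: expected {n_conditions + 1} fields, got {len(row)}"
            )
        for cell in row[1:]:
            if cell.strip() in missing:
                n_missing += 1
            else:
                float(cell)  # raises ValueError if the cell is not numeric
        n_genes += 1
    return n_genes, n_conditions, n_missing


# Made-up sample data, used only to show the call.
sample = "GENE\tcond1\tcond2\ngeneA\t0.12\t-0.30\ngeneB\tNA\t1.05\n"
print(summarize_ratios(io.StringIO(sample)))  # (2, 2, 1)
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.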

Once the program has started, it writes a log file to cmonkey.log. You can see all available options with:

bin/cmonkey2.sh --help

Test Run with Halobacterium salinarum

A startup script runs the current integrated system on the bundled Halobacterium example data:

bin/cmonkey2.sh --organism hal example_data/hal/halo_ratios5.tsv

Start the Python-based monitoring application

bin/cm2view.sh [--out [output directory]]

Another way to run Halobacterium is to specify the RSAT database

bin/cmonkey2.sh --organism hal --rsat_organism Halobacterium_NRC_1_uid57769 --rsat_base_url http://pedagogix-tagc.univ-mrs.fr/rsat --rsat_features gene --nooperons --use_BSCM example_data/hal/halo_ratios5.tsv
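
When the same run has to be repeated from a script, the command line above can be assembled programmatically. This is a minimal sketch, not a cmonkey2 API: the helper `build_cmonkey2_argv` is hypothetical, and it only uses option names that appear in the command above.

```python
import shlex


def build_cmonkey2_argv(ratios, organism, launcher="bin/cmonkey2.sh",
                        flags=(), **options):
    """Assemble an argument list for the cmonkey2 launcher script.

    `options` become '--name value' pairs, `flags` become bare '--name'
    switches, and the ratios file goes last, as in the command above.
    """
    argv = [launcher, "--organism", organism]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    argv += [f"--{flag}" for flag in flags]
    argv.append(ratios)
    return argv


argv = build_cmonkey2_argv(
    "example_data/hal/halo_ratios5.tsv",
    organism="hal",
    rsat_organism="Halobacterium_NRC_1_uid57769",
    rsat_base_url="http://pedagogix-tagc.univ-mrs.fr/rsat",
    rsat_features="gene",
    flags=("nooperons", "use_BSCM"),
)
print(shlex.join(argv))
```

The printed line reproduces the command shown above. To actually launch the run from the repository root, the list can be passed to `subprocess.run(argv, check=True)`.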

Running cMonkey on Human

To run cMonkey on human data, run the following command with your own <ratios.tsv> file:

bin/cmonkey2.sh --organism hsa --string <stringFile> --rsat_organism Homo_sapiens_GRCh37 --rsat_base_url http://rsat.sb-roscoff.fr/ --rsat_features protein_coding --nooperons <ratios.tsv>

More details for running cMonkey on human data

Running cMonkey on human data is somewhat difficult because neither the STRING database nor the RSAT database has human data cleanly entered. Here are the steps for a successful Python cMonkey run on human data:

Make a gene interaction file. The example data file mentioned above was generated from BioGRID around 10/6/14.
Find an RSAT mirror that has .raw chromosome files and feature files. In the above example, we use Homo_sapiens_ensembl_74_GRCh37 from the main RSAT database. To annotate these we use 'protein_coding.tab' and 'protein_coding_names.tab'. In principle, other annotation files such as 'processed_transcript' would work just as well.
Adjust the upstream region searched, and perhaps modify the code to search for known TF and miRNA motifs rather than de novo motifs. NOTE: Modifying the motif search step is non-trivial.

Package maintainers

General

The distribution is built using setuptools and the wheel format.

setup.py contains all information needed to build the distribution
increase the version number in setup.py before making a distribution
record user-relevant changes in CHANGELOG.rst

Build distribution

python3 setup.py sdist bdist_wheel

Uploading to PyPI

twine upload -r pypi dist/cmonkey2-*
