You have reached the updated cMonkey site with a new R package, installation and running instructions, and additional information.

If you are looking for the original cMonkey site, you can find it here. The original cMonkey algorithm was published with the manuscript "Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks", by David J Reiss, Nitin S Baliga and Richard Bonneau.

*NEW* cMonkey package and source code are now on GitHub.

Example biclusters from cMonkey 4.1.5 runs (images for each: expression profile, network, PSSM 1, PSSM 2, and motif positions):
  S. cerevisiae (12445): cluster 136, output/cmonkey_4.1.5_sce/htmls/cluster136_{profile,network,pssm1,pssm2,mot_posns}.png
  E. coli (24329): cluster 104, output/cmonkey_4.1.5_eco/htmls/cluster104_{profile,network,pssm1,pssm2,mot_posns}.png
  H. pylori (1236): cluster 54, output/cmonkey_4.1.5_hpy/htmls/cluster054_{profile,network,pssm1,pssm2,mot_posns}.png
  H. salinarum (21175): cluster 37, output/cmonkey_4.1.5_hal/htmls/cluster037_{profile,network,pssm1,pssm2,mot_posns}.png
  A. thaliana (1966): cluster 494, output/cmonkey_4.1.5_ath/htmls/cluster494_{profile,network,pssm1,pssm2,mot_posns}.png


The latest version of cMonkey is still under active development and has now been applied successfully to many systems, including plants (Arabidopsis thaliana) and mammals (Homo sapiens).

The cMonkey package enables a user to run the integrated biclustering algorithm on their own microarray data, for their own organism of interest. During initialization, it automatically downloads and integrates additional data for that organism.

IMPORTANT NOTE: cMonkey currently does not completely support H. sapiens. We are working on a version for that specific task. In the meantime, see below for instructions on using cMonkey to run only on your Hsa expression data (i.e. without motifs or networks).


To run cMonkey on your own expression data, you will need a UNIX-like operating system (e.g., Mac OS X or Linux). cMonkey is not currently supported on Windows. For Windows users, Cygwin, VirtualBox, or Amazon AWS are inexpensive ways to obtain access to a UNIX system.

You will only need to do the following steps once:

  1. Install the latest version of R (I am currently using version 2.11.0). It should work for versions 2.9.x and up.
  2. Install the following R packages and their dependencies (all are helpful, but none are absolutely required): RCurl, doMC, igraph0, RSVGTipsDevice, and hwriter by typing in R:
    install.packages(c('RCurl', 'doMC', 'igraph0', 'RSVGTipsDevice', 'hwriter'))
  3. Install the cMonkey package from GitHub using devtools. In R, type:
    install.packages('devtools', dep=T)
    library(devtools)
    install_github('cmonkey', 'dreiss-isb', subdir='cMonkey')
    * NOTE: while the package will install correctly, it generates warnings during installation. Also NOTE that the inline documentation is currently empty. This will be remedied. In the meantime, use the instructions below.
    * NOTE: old versions may be accessed here.
    * NOTE: there is a supplementary data package containing sample data sets to use in the examples. For more information on its use, see below. To install it, in R, type:
      install_github('cmonkey', 'dreiss-isb', subdir='')
    * NOTE: there is an additional supplementary BIG data package containing additional big sample data sets. For more information on its use, see below. To install it, again in R, type:
      install_github('cmonkey', 'dreiss-isb', subdir='cMonkey.bigdata')
  4. For motif detection, cMonkey uses the MEME suite. Currently, only versions 3.0.14 and 4.3.0 are supported (although others should work).
    a. NEW: On UNIX-like OSes, meme, mast, and dust should be installed automatically upon the first run of cMonkey. If this fails, they may be installed from within R via:
      This is still somewhat experimental. If both options fail, see item (b.), below.
    b. If (a.) fails, meme, mast, and dust may be installed in the local [CWD]/progs directory, where [CWD] is the working directory from which you will be running R/cMonkey. You may do this via soft links (e.g. mkdir progs; ln -s /usr/local/bin/meme progs/meme). To find out what [CWD] is, type getwd() in R. You can change this directory in R by typing setwd("/new/dir/").
Email me if you have any problems with any of these instructions.
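
The soft-link approach for meme, mast, and dust can also be carried out from within R. The following is only a sketch using base R: the helper name link_meme_tools is hypothetical (it is not part of cMonkey), and it assumes the three executables are already installed somewhere on your PATH.

```r
# Hypothetical helper (NOT part of cMonkey): soft-link meme, mast and dust
# from wherever they live on the PATH into a 'progs' directory.
link_meme_tools <- function(progs.dir = file.path(getwd(), "progs"),
                            tools = c("meme", "mast", "dust")) {
  dir.create(progs.dir, showWarnings = FALSE, recursive = TRUE)
  found <- Sys.which(tools)          # "" for any tool that is not on the PATH
  linked <- logical(length(tools))
  names(linked) <- tools
  for (tool in tools) {
    target <- file.path(progs.dir, tool)
    if (file.exists(target)) {
      linked[tool] <- TRUE           # already present; leave it alone
    } else if (nzchar(found[[tool]])) {
      linked[tool] <- file.symlink(found[[tool]], target)
    }                                # otherwise stays FALSE: tool not found
  }
  linked
}
```

Calling link_meme_tools() from the directory in which you will run cMonkey returns a named logical vector; any FALSE entry marks a program that still has to be installed by hand.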


  1. First, various cMonkey parameters and instructions on how to set them are described below.
  2. Run cMonkey on the Halobacterium EGRIN expression data set (NOTE you will have to load the package; see above) (sample results):
    library( cMonkey ); library( ); data( halo )
    e <- cmonkey( halo )
    ... this will take ~5 hours to run. NOTE that the first time cMonkey is run on your system, it will take some time to download the additional data, but these files will be cached locally for future runs.
  3. It may be useful to pre-initialize a cMonkey run, which loads all the data and prepares the environment for performing the optimization. The pre-initialized environment may be saved, and then optimized later:
    library( cMonkey ); library( ); data( halo )
    e <- cmonkey.init( halo )
    cmonkey( e )
  4. The cmonkey(...) function returns a new environment object containing all data and additional functions resulting from the data analysis performed. This object will subsequently be used for exploration of the results (see below).
  5. cMonkey can just as easily be run on the sample H. pylori and M. pneumoniae expression data sets; just replace halo in the example above, with hpy (sample results) or mpn (sample results), respectively.
  6. The separate "cMonkey.bigdata" supplemental R package contains additional large sample data sets for S. cerevisiae (yeast; sample results), E. coli (ecoli; sample results), and A. thaliana (ath; sample results).
  7. Finally, run cMonkey on your own expression data matrix, for your organism of interest (e.g. B. subtilis; all organism codes may be found here):
    library( cMonkey )
    ratios <- read.delim( file='my_ratios.tsv', sep='\t', header=T )
    e <- cmonkey( organism='bsu' )
  8. And to run cMonkey on your H. sapiens expression data, without motifs or networks (we are working on a version which will have that capability), use the following commands:
    library( cMonkey )
    ratios <- read.delim( file='my_ratios.tsv', sep='\t', header=T )
    e <- cmonkey( organism='hsa', post.adjust=FALSE, mot.weights=numeric(), net.weights=numeric() )
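
Before running cMonkey on your own data, it may help to sanity-check the expression matrix. The sketch below is not from the cMonkey package: read_ratios is a hypothetical helper, the tiny example file is made up, and it assumes (this is not stated above) that the file has gene identifiers in its first column and one numeric column per condition. Dropping genes with no measurements is an optional choice, not a cMonkey requirement.

```r
# Hypothetical helper (NOT part of cMonkey): read a tab-delimited expression
# file into a numeric matrix with genes as row names.
read_ratios <- function(file) {
  tab <- read.delim(file, sep = "\t", header = TRUE, row.names = 1,
                    check.names = FALSE)
  ratios <- as.matrix(tab)
  if (!is.numeric(ratios)) stop("non-numeric values found in ", file)
  # Optional: drop genes that have no measurements at all
  ratios[rowSums(!is.na(ratios)) > 0, , drop = FALSE]
}

# Made-up example file, for illustration only
tmp <- tempfile(fileext = ".tsv")
writeLines(c("gene\tcond1\tcond2\tcond3",
             "geneA\t0.5\t-1.2\t0.1",
             "geneB\tNA\tNA\tNA",
             "geneC\t1.1\t0.3\t-0.7"), tmp)
ratios <- read_ratios(tmp)
print(dim(ratios))
```

With a matrix like this in the global environment, the cmonkey( organism=... ) calls shown above can then be run as written.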


Once a cMonkey run is complete, you may use the following to explore the results:
  1. Write out the clustering results to a set of interactive web-browseable files (similar to these) for visualization and exploration of biclusters in a web browser and/or via Gaggle and Firefox using Firegoose:
  2. Write out the clustering results to a set of Cytoscape files for visualization and exploration of biclusters, their genes and motifs in a network format:
  3. Remind yourself of the number of biclusters that cMonkey was told to find (other user-defined or default parameters may be accessed in a similar manner):
  4. Print out a table with a summary of the better clusters (best clusters listed first):
  5. Plot some summary statistics and trends of (mean) scores during optimization, similar to these:
  6. Plot a bicluster (e.g. cluster number 37 -- example):
    e$plotClust( 37 )
  7. Get the gene or condition members of cluster number 37:
    e$get.rows( 37 )
    e$get.cols( 37 )
  8. Get the number of genes or conditions from a given list that are in each cluster:
    e$clusters.w.genes( geneList )
    e$clusters.w.conds( condList )
  9. Get the number of genes with a given function annotation substring (e.g. "ribosom" to query "ribosome" or "ribosomal") that are in each cluster:
    e$clusters.w.func( func )
More functions and parameters will be documented on an as-asked-about basis.
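
The membership accessors can be combined to tabulate cluster sizes. This is a minimal sketch: cluster_sizes is a hypothetical helper (not part of cMonkey), it uses only e$get.rows() and e$get.cols() as shown above, and it assumes both return vectors of gene and condition names. The mock object below merely stands in for a real cMonkey result so that the example runs without the package.

```r
# Hypothetical helper (NOT part of cMonkey): count genes and conditions
# in each of the requested clusters.
cluster_sizes <- function(e, clusters) {
  data.frame(
    cluster = clusters,
    n.genes = vapply(clusters, function(k) length(e$get.rows(k)), integer(1)),
    n.conds = vapply(clusters, function(k) length(e$get.cols(k)), integer(1))
  )
}

# Mock stand-in for a cMonkey result environment (illustration only)
mock <- new.env()
mock$rows <- list(c("g1", "g2", "g3"), c("g2", "g4"))
mock$cols <- list(c("c1", "c2"), c("c1", "c2", "c3"))
mock$get.rows <- function(k) mock$rows[[k]]
mock$get.cols <- function(k) mock$cols[[k]]

print(cluster_sizes(mock, 1:2))
```

With a real run, the same call would look something like cluster_sizes( e, 1:200 ) if cMonkey was told to find 200 biclusters.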


cMonkey has a multitude of internal parameters which affect various aspects of its performance and data integration.
For example, additional data may be included by simply tweaking the parameters. Most of these parameters are currently undocumented, but please contact me if you are interested in such possibilities.
Input parameters and data may be pre-set in one of several different ways, including:
  1. Pre-setting them in the global environment prior to starting cmonkey(), as in:
    parallel.cores <- 2
    k.clust <- 200
    e <- cmonkey()
  2. Or setting them in the cmonkey() call itself, as in:
    e <- cmonkey( parallel.cores=2, k.clust=200 )
  3. Or adding them to a list or environment object which is passed to the cmonkey() call, e.g.:
    mylist <- list( parallel.cores=2, k.clust=200 )
    e <- cmonkey( mylist )
... or any combination thereof.
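
Rather than hard-coding parallel.cores, one option is to derive it from the machine. The sketch below assumes that parallel.cores is the number of processor cores cMonkey should use, and it relies on the base parallel package, which ships with R 2.14 and later (newer than the R versions mentioned above). The k.clust value is simply the one from the examples above.

```r
library(parallel)

n <- detectCores()                # may return NA on some platforms
if (is.na(n)) n <- 1
parallel.cores <- max(1, n - 1)   # leave one core free for other work

# Collect the settings in a list, as in option 3 above
params <- list(parallel.cores = parallel.cores, k.clust = 200)
str(params)
```

The resulting list can then be passed along as in the third option, e.g. e <- cmonkey( params ).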

Some parameters which may be of general interest:
More functions and parameters will be documented on an as-asked-about basis.