New cMonkey R Package and Code

The cMonkey R package

You have reached the updated cMonkey site with a new R package download, installation and running instructions, and additional information.

If you are looking for the original cMonkey site, you can find it here. The original cMonkey algorithm was published in the manuscript "Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks", by David J Reiss, Nitin S Baliga and Richard Bonneau.

organism | n.genes n.arrays | score | residual | profile | network | motif1 | motif2 | motif.posns
S. cerevisiae | 12445 | -2.637 | 0.324 | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_sce/htmls/cluster136_profile.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_sce/htmls/cluster136_network.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_sce/htmls/cluster136_pssm1.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_sce/htmls/cluster136_pssm2.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_sce/htmls/cluster136_mot_posns.png
E. coli | 24329 | -4.130 | 0.438 | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_eco/htmls/cluster104_profile.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_eco/htmls/cluster104_network.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_eco/htmls/cluster104_pssm1.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_eco/htmls/cluster104_pssm2.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_eco/htmls/cluster104_mot_posns.png
H. pylori | 1236 | -2.194 | 0.221 | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hpy/htmls/cluster054_profile.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hpy/htmls/cluster054_network.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hpy/htmls/cluster054_pssm1.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hpy/htmls/cluster054_pssm2.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hpy/htmls/cluster054_mot_posns.png
H. salinarum | 21175 | -1.981 | 0.312 | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hal/htmls/cluster037_profile.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hal/htmls/cluster037_network.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hal/htmls/cluster037_pssm1.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hal/htmls/cluster037_pssm2.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_hal/htmls/cluster037_mot_posns.png
A. thaliana | 1966 | -4.589 | 0.302 | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_ath/htmls/cluster494_profile.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_ath/htmls/cluster494_network.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_ath/htmls/cluster494_pssm1.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_ath/htmls/cluster494_pssm2.png | http://baliga.systemsbiology.net/cmonkey/output/cmonkey_4.1.5_ath/htmls/cluster494_mot_posns.png

To run cMonkey on your own expression data, you will currently need a UNIX-y system (Mac OS X or Linux). You will only need to do the following steps once:

  1. Install the latest version of R (I am currently using version 2.10.1). cMonkey should work with R versions 2.9.x and 2.10.x.
  2. Install the following (helpful, but not absolutely required) R packages: RCurl, multicore, ff, igraph, RSVGTipsDevice, and hwriter by typing in R:
    install.packages( c( "RCurl", "multicore", "ff", "igraph", "RSVGTipsDevice", "hwriter" ) )
    The last three packages (igraph, RSVGTipsDevice, and hwriter) are required only for plotting and graphical output, not for the algorithm itself. NOTE that the multicore package does not seem to load when using the R GUI in Mac OS X. If you want cMonkey to utilize all processor cores on your machine, run R from a terminal instead.
  3. Download and install the most recent (currently version 4.4.5) cMonkey R source package on your system. In R, type:
    install.packages( "http://baliga.systemsbiology.net/cmonkey/cMonkey_4.4.5.tar.gz", repos=NULL, type="source" )
    NOTE that the package does install correctly, but may produce warnings during installation. Also NOTE that the inline documentation is currently empty. This will be remedied. In the meantime, look at the instructions below.
  4. If you plan to do motif detection (and of course you do!), download and compile/install the following:
    1. MEME version 3.0.14. Specific NOTES:
      • Command line programs meme and mast must be located in the [CWD]/progs directory, where [CWD] is the working directory where you will be running cMonkey from. You may do this via soft links (ln -s).
      • To find out what [CWD] is, type getwd() in R. You can change this directory in R by typing setwd("where/I/want/to/be/").
      • Other versions of MEME are probably fine; however version 3.0.14 is the only version that I will support.
    2. dust for masking low-information content regions in the DNA sequences. As with MEME, place the dust executable (or a link to it) in the [CWD]/progs directory. NOTE that dust is not required, but is highly recommended.
... now you're all set!
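
The one-time setup above can be checked from within R. The following is a minimal sketch, not part of the cMonkey package: check.setup is a hypothetical helper that uses only base R to report which of the optional packages are not installed and which of meme, mast, and dust are absent from [CWD]/progs.

```r
## Hypothetical helper (NOT part of cMonkey): report which optional pieces
## of the setup described above are missing.
check.setup <- function( cwd=getwd(),
                         pkgs=c( "RCurl", "multicore", "ff", "igraph",
                                 "RSVGTipsDevice", "hwriter" ),
                         progs=c( "meme", "mast", "dust" ) ) {
  ## Packages that are not installed in any library on .libPaths()
  missing.pkgs <- pkgs[ ! pkgs %in% rownames( installed.packages() ) ]
  ## Executables (or soft links to them) expected in [CWD]/progs
  prog.paths <- file.path( cwd, "progs", progs )
  missing.progs <- progs[ ! file.exists( prog.paths ) ]
  list( missing.pkgs=missing.pkgs, missing.progs=missing.progs )
}

## Check the current working directory
status <- check.setup()
print( status )
```

Both elements of the returned list are empty character vectors when nothing is missing. Note that the check only tests that the files exist; it does not verify the MEME version or that the programs actually run.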

Email me if you have any problems with any of these instructions.



Now for the examples (including loading the cMonkey package):
  1. Run cMonkey on the Halo EGRIN expression data set (sample results are here):
    library( cMonkey )
    data( halo )
    attach( halo )
    e <- cmonkey()
    
    ... this will take a while to run. Have some coffee. NOTE that the first time cMonkey is run on your system, it will take a while to download the data files (especially the EMBL STRING interactions file), but these files will be cached for later runs.
  2. The cmonkey() call above returns a new environment object containing all the data from the run, plus functions for exploring the results. This object will be used later for further exploration of the results (see below).
  3. cMonkey can just as easily be run on the sample H. pylori expression data set; just replace halo with hpy (sample results here).
  4. I have created a second package containing sample data sets for S. cerevisiae (yeast; sample results), E. coli (ecoli; sample results), and A. thaliana (ath; sample results). Since they are large, I made a "cMonkey.bigdata" supplemental R package, which you can install alongside the cMonkey package.
  5. Run cMonkey on your own Halo expression data matrix, e.g. from a tab-delimited file or broadcast to R via the R/Gaggle interface:
    library( cMonkey )
    data( halo )
    halo$ratios <- NULL ## Remove the pre-loaded Halo ratios matrix
    attach( halo )
    ## Load your own ratios file, for example:
    ratios <- read.delim( file="my_ratios.tsv", sep="\t", as.is=TRUE, header=TRUE )
    ## Alternatively, get the ratios matrix via the Gaggle/R interface instead.
    ## This assumes that you have already broadcast the matrix from another application:
    ## library( gaggle ); ratios <- getMatrix()
    e <- cmonkey()
    
    ... this will take a while to run. Have some coffee.
  6. Once a cMonkey run is complete, you may use the following to explore the results:
  7. Write out the clustering results to a set of files (similar to these) for visualization and exploration of biclusters in a web browser and/or via Gaggle and Firefox using Firegoose:
    e$write.project()
  8. Remind yourself how many biclusters cMonkey was told to find:
    e$k.clust
  9. Print out a table with a summary of the better clusters (best clusters listed first). Note that the column labels are offset - the first column printed is the cluster number:
    e$cluster.summary()
  10. Plot some summary statistics and trends of (mean) scores during optimization, similar to these:
    e$plot.stats()
  11. Plot a cluster (e.g. cluster number 37 -- example):
    e$plotClust( 37 )
  12. Get the gene members of cluster number 37:
    e$get.rows( 37 )
  13. Get the condition members of cluster number 37:
    e$get.cols( 37 )
  14. Get the number of genes from a given list that are in each cluster:
    e$clusters.w.genes( geneList )
  15. Get the number of genes with a given function that are in each cluster:
    e$clusters.w.func( func )
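
The accessors above can be combined to summarize a whole run. The following is a minimal sketch: cluster.sizes is a hypothetical helper, not a cMonkey function, and it assumes that e$get.rows( k ) and e$get.cols( k ) each return a vector with one element per member of cluster k.

```r
## Hypothetical helper (NOT part of cMonkey): tabulate bicluster sizes using
## only e$k.clust, e$get.rows() and e$get.cols() from the list above.
## ASSUMPTION: get.rows( k ) / get.cols( k ) return one element per member.
cluster.sizes <- function( e ) {
  ks <- seq_len( e$k.clust )
  data.frame( cluster=ks,
              n.genes=sapply( ks, function( k ) length( e$get.rows( k ) ) ),
              n.conds=sapply( ks, function( k ) length( e$get.cols( k ) ) ) )
}

## Usage, once a run has finished:
## sizes <- cluster.sizes( e )
## head( sizes[ order( sizes$n.genes, decreasing=TRUE ), ] )
```

The result has one row per bicluster, so it can be sorted or filtered like any other data frame.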
... and more to come!
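
When loading your own ratios from a tab-delimited file (step 5 above), it can help to check the file before starting a long run. The following is a minimal sketch under stated assumptions: read.ratios is a hypothetical helper, not a cMonkey function, and it assumes that the first column of the file holds gene identifiers and that every remaining column is one condition. Whether cmonkey() requires a numeric matrix rather than a data frame is not stated here, so treat the conversion as optional.

```r
## Hypothetical helper (NOT part of cMonkey): read a tab-delimited expression
## file into a numeric matrix with gene identifiers as row names.
## ASSUMPTION: column 1 holds gene identifiers; all other columns are conditions.
read.ratios <- function( file ) {
  tab <- read.delim( file, sep="\t", as.is=TRUE, header=TRUE,
                     row.names=1, check.names=FALSE )
  m <- as.matrix( tab )
  ## Stop early if any condition column could not be read as numbers
  if ( ! is.numeric( m ) ) stop( "non-numeric values found in ", file )
  m
}

## Usage:
## ratios <- read.ratios( "my_ratios.tsv" )
## dim( ratios ) ## genes x conditions
```

Failing on non-numeric values up front is a deliberate choice: a stray text entry would otherwise turn the whole matrix into character strings.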