CSpotRun: A cloud infrastructure for automating machine-learning algorithm runs in an inexpensive, unpredictable environment
What is CSpotRun?
The advent of cloud computing has allowed smaller organizations to leverage huge amounts of computing power, but this power comes at a cost. On both institutional computing clusters and commercially available clouds such as Amazon Elastic Compute Cloud (EC2) or Microsoft Azure, costs can become prohibitive for large-scale projects that require massive amounts of computation. EC2 Spot Instance pricing reduces the cost substantially, with the caveat that your computing nodes may terminate at any time. In a spot market, users bid on excess compute cycles, and a user's jobs run as long as the bid exceeds the current spot price. When the current spot price rises above the user's bid, the jobs are terminated. To ensure that a job finishes, it must be able to save its state externally so that it can be restarted later, either when the spot price falls back below the user's bid or under a higher bid.
We present a web application and cloud infrastructure called CSpotRun, which automates the running of time- and CPU-intensive computing jobs on multiple virtual computing nodes in an environment where the nodes may be shut down at any time.
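
The save-and-restart requirement described above can be illustrated with a minimal Python sketch. This is not CSpotRun's actual code: the function names (`run_with_checkpoints`, `save_checkpoint`), the JSON state format, and the `stop_after` parameter (which simulates a spot termination) are all hypothetical, and a local file stands in for the external storage a real deployment would need.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write state to a temp file, then rename it over the checkpoint.

    The rename is atomic, so a termination in the middle of a write
    cannot leave a half-written checkpoint behind.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_with_checkpoints(total_steps, checkpoint_path, step_fn, stop_after=None):
    """Run step_fn until total_steps iterations are done, resuming if possible.

    stop_after (hypothetical) simulates the node being terminated after
    that many iterations in this invocation. Returns (state, finished).
    """
    # Resume from the last saved state if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0}

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state, False  # simulated spot termination
        state["value"] = step_fn(state["value"], state["step"])
        state["step"] += 1
        steps_this_run += 1
        save_checkpoint(state, checkpoint_path)  # persist after every iteration
    return state, True
```

Calling `run_with_checkpoints` a second time with the same checkpoint path picks up at the saved iteration rather than starting over, which is the property a job needs in order to survive termination on a spot instance.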
CSpotRun could be adapted to run any computing job, but we focus on cMonkey, a machine-learning algorithm for biological regulatory network inference.
While frameworks similar to CSpotRun, such as Hadoop, already exist, certain aspects of cMonkey and the need to run within the Spot Instance context make their use infeasible. CSpotRun addresses these concerns.
CSpotRun is not provided as a public web site at this time, because running it incurs costs and we do not have the infrastructure to bill other users. However, any organization may host its own instance of CSpotRun using its own Amazon Web Services account. See the following link for more information about installing CSpotRun.

