Why EGRIN 2.0?
A foremost challenge in systems biology is to understand how just a few transcription factors (TFs) in a microbial genome generate a wide array of nuanced responses to varied environmental challenges. EGRIN 2.0 is a new model for the complete gene regulatory network (GRN) of a prokaryote. This model is reverse engineered directly from gene expression data and genomic sequence, and hence the methodology to generate EGRIN 2.0 is applicable to any prokaryotic organism.
- ‘Big data’ cancer research has revealed a new spectrum of genetic mutations across tumors that need understanding.
- Existing methods for analyzing DNA defects in cancer are blind to how those mutations actually behave.
- ISB scientists developed a new approach using physics- and structure-based modeling to systematically assess the spectrum of mutations that arise in several gene regulatory proteins in cancer.
A significant challenge in cancer research is having the right tools and methods to analyze the myriad data generated by large-scale projects such as The Cancer Genome Atlas. Existing methods are largely statistical and are blind to the way in which mutations actually affect protein function and biophysical mechanisms.
Researchers at the Institute for Systems Biology have developed a new method that systematically investigates how small mutations in the DNA sequences that encode proteins result in structural and energetic changes that impact protein function. This study, a collaboration of the Baliga and Shmulevich labs, simulated the physics and energetics of protein structure and function in order to assess the impacts of a broad spectrum of protein mutations that have been observed in human cancers. Specifically, the researchers considered mutations in transcription factors, the proteins that regulate gene expression, in order to predict and identify the putative effects of these mutations on gene expression and cell regulatory pathways.
Title: Structure-based predictions broadly link transcription factor mutations to gene expression changes in cancers
Journal: Nucleic Acids Research
Authors: Justin Ashworth, Brady Bernard, Sheila Reynolds, Christopher L. Plaisier, Ilya Shmulevich, Nitin S. Baliga
Critical losses of protein function in gene regulatory proteins and tumor suppressors can lead to cellular dysregulation and the hallmarks of cancer. Using a physics-based approach to assess the impacts of mutations resulted in higher mechanistic accuracy in determining links between mutations in transcription factors and changes in gene expression. The method also illustrates a quantitative relationship between the relevant thermodynamic impacts of each unique protein mutation and its prevalence in cancerous tissues.
Integrating molecular biophysics and structure-based modeling into systems-based analyses of disease mutations will improve understanding of the molecular genetics of cancers and help researchers interpret the complex mutation data that are now available.
- Nearly a decade ago, ISB’s Baliga Lab published a landmark paper describing cMonkey, an innovative method to accurately map gene networks within any organism from microbes to humans.
- Two new papers describe the benchmark results of cMonkey and also the release of cMonkey2, which performs with higher accuracy.
- Using this approach, genetic and molecular data generated from any organism, be it a bacterium or a birch tree, can be explored and analyzed from a network perspective.
For most of the history of biology, data were a limiting quantity, painstakingly gathered and meticulously curated. However, the meteoric rise of sequencing technology, accompanied by the parallel emergence of computing, has fueled a recent data explosion in biology. Such novel technologies allow scientists to ask new questions and revisit old ones with a fresh perspective. However, they also bring new problems when interpreting the deluge of information.
Nearly a decade ago, members of the Baliga group at the Institute for Systems Biology published a landmark paper describing cMonkey, an innovative method to accurately map gene networks within any organism from microbes to humans. The method takes advantage of expanding biological datasets and computational power, and has since been applied to many different organisms across a huge range of publicly and privately generated experimental data.
Prior to cMonkey, it was common practice to group sets of genes that had similar expression levels across the experimental conditions using a process termed clustering. However, this practice inherently assumes that gene clusters are static. While this assumption may apply in limited circumstances, such as a binary treatment vs. control experimental setup, it may not be true across a wide variety of conditions. Thus, ISB researchers set out to create a novel method to cluster both genes and conditions simultaneously, an approach known as biclustering. Biclustering is a computationally hard problem: searching every combination of genes and conditions is infeasible for realistic datasets, so practical algorithms rely on heuristics and no solution is guaranteed to be optimal. Nevertheless, biclustering algorithms including cMonkey have demonstrated practical value in dealing with real biological data.
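To make the difference between clustering and biclustering concrete, here is a minimal sketch in Python. It does not reproduce cMonkey's scoring. Instead it uses the mean squared residue, a generic coherence measure from the biclustering literature, and the expression values and the function name `mean_squared_residue` are made up for illustration. The point is only that a group of genes can look incoherent across all conditions and still be tightly co-expressed in a subset of them.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix picked out by rows and cols.

    Lower values mean the selected genes rise and fall together across
    the selected conditions. A value of 0 is a perfectly additive pattern.
    """
    sub = [[matrix[r][c] for c in cols] for r in rows]
    row_means = [mean(row) for row in sub]
    col_means = [mean(col) for col in zip(*sub)]
    overall = mean(row_means)
    return mean(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(len(rows))
        for j in range(len(cols))
    )


# Toy expression matrix (illustrative values only).
# Rows are genes, columns are experimental conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0, 0.0],  # gene 0
    [2.0, 3.0, 4.0, 1.0, 7.0],  # gene 1
    [0.0, 1.0, 2.0, 5.0, 5.0],  # gene 2
    [8.0, 1.0, 6.0, 2.0, 3.0],  # gene 3
]

genes = [0, 1, 2]

# Clustering view: judge the gene group across every condition.
msr_all = mean_squared_residue(expression, genes, [0, 1, 2, 3, 4])

# Biclustering view: judge the same genes on a subset of conditions.
msr_bicluster = mean_squared_residue(expression, genes, [0, 1, 2])

print(f"all conditions:  {msr_all:.3f}")
print(f"conditions 0-2: {msr_bicluster:.3f}")
```

Genes 0 to 2 follow the same trend in the first three conditions and diverge in the last two, so the score over all conditions is much worse than the score for the bicluster. A real biclustering algorithm has to search for such gene and condition subsets rather than being handed them.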
In an update published on April 15 in the journal Nucleic Acids Research, ISB researchers present the evolution of cMonkey. The paper describes updates to the algorithm, released as cMonkey2, and assesses its performance against alternative platforms using three distinct datasets from two different bacterial organisms and human cancer cells. Detailed performance benchmarks demonstrate the efficacy of cMonkey2 in accurate network reconstruction as well as its broad applicability across cell types. Performance aside, a major change to the software is that it has been converted from being solely a biclustering algorithm to a biclustering and data integration platform. The new platform enables easy integration of many additional data types, allowing users to expand the data classes to include categories we have not thought of yet.
Indeed, one of the signature features of cMonkey is its ability to integrate additional relevant sources of information. For example, many metabolic pathways have been experimentally defined in baker's yeast and various bacteria. Such pathway information can be assimilated as an association network that guides the algorithm in finding parsimonious biclusters. Various association networks based on interactions among proteins, DNA, and other molecules can also be used to aid this process. In the end, the clusters that arise are the ones that can best account for all the disparate types of data that are fed into the algorithm. This integrative approach gave cMonkey an advantage over other biclustering algorithms when it was released, and it continues to be a distinguishing characteristic in its updated form.
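One way to picture this kind of integration is as a weighted combination of per-data-type scores. The sketch below is an assumption-laden illustration and not cMonkey's actual scoring code: the gene names, the scores, the weights, and the helper functions `network_support` and `combined_score` are all hypothetical. It shows how an association network can change which gene is the best candidate to join a bicluster.

```python
def network_support(gene, members, edges):
    """Fraction of current bicluster members that the gene is linked to."""
    if not members:
        return 0.0
    linked = sum(1 for m in members if frozenset((gene, m)) in edges)
    return linked / len(members)


def combined_score(scores, weights):
    """Weighted average of per-data-type scores.

    Each score is assumed to lie in [0, 1], with higher meaning a better fit.
    """
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Hypothetical bicluster and association network (undirected edges).
members = ["geneA", "geneB"]
edges = {
    frozenset(("geneC", "geneA")),
    frozenset(("geneC", "geneB")),
    frozenset(("geneD", "geneA")),
}

# Hypothetical co-expression fit of two candidate genes with the bicluster.
expression_fit = {"geneC": 0.6, "geneD": 0.7}

# Hypothetical weights: expression counts twice as much as the network.
weights = {"expression": 1.0, "network": 0.5}

combined = {}
for gene, fit in expression_fit.items():
    scores = {
        "expression": fit,
        "network": network_support(gene, members, edges),
    }
    combined[gene] = combined_score(scores, weights)

best = max(combined, key=combined.get)
print(combined, "->", best)
```

On expression alone geneD is the better candidate, but geneC is linked to every current member in the network, so it wins once both data types are combined. Changing the weights shifts how much each data type influences that choice.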
There is a major opportunity to build bicluster networks from plentiful publicly available consortium datasets generated by multiple independent laboratories. This opportunity also presents a challenge because variation between the sources can introduce noise that reduces bicluster quality. In a paper published on April 15 in the journal BMC Systems Biology, ISB researchers have added a new metric to the cMonkey scoring algorithm to improve the quality of biclusters when dealing with highly variable source datasets. This metric improved the accuracy of condition-specific gene clustering, including a demonstrable enhancement in predicting a physiological response to nutrient shifts in yeast cells.
Beyond including the new bicluster quality metric, the revised cMonkey2 is now modularized to make it easy to incorporate additional data types and to adjust the weights those data types receive in the biclustering calculations. The resulting biclusters are easily interrogated using an intuitive web-based framework, and the data can be further analyzed and visualized using additional software that has been developed by the Baliga laboratory, different groups at the ISB, and the rest of the scientific community. A final note worth mentioning is the programming language: originally built in the statistical modeling environment R, the updated version has been rewritten in Python, one of the most widely used programming languages today.
The benchmark results in the paper speak for themselves, but suffice it to say this is a uniquely comprehensive and powerful tool for modern systems biology research. To encourage widespread adoption of cMonkey2, the documentation for usage and development has been updated and expanded. Using this approach, genetic and molecular data generated from any organism, be it a bacterium or a birch tree, can be explored and analyzed from a network perspective.
Image above: Scanning EM of bacteria being eaten by white blood cell. Photo Credit: Adrian Ozinsky
From our inception, we at ISB have been committed to knowledge transfer. This profound sense of responsibility to share what we learn serves as the foundation for our signature professional course on systems biology. This year’s course, which takes place July 27-31 in ISB’s conference facility, will offer a few new features, including lightning talks about systems biology technologies and a mini symposium consisting of research vignettes from nine ISB researchers. The course is geared toward graduate students, post-doctoral fellows and principal investigators. If you or someone you know would be interested in participating, visit course.systemsbiology.net for registration info. (This event is co-sponsored by ISB’s Center for Systems Biology.)
Dr. Chris Plaisier, a senior research scientist in the Baliga Lab and one of the organizers of the summer course, shares some thoughts:
Q: Why is it important for ISB to offer a systems biology course?
CP: ISB does lots of amazing science, and we do try to break the insights down so that the public, who pays for our research, can see them. But that isn’t enough. What we really need is a way to get the new systems biology ideas and methods we and others have developed into the hands of our colleagues. Science is a group effort, and by teaching others to use our systems approaches we will greatly enhance their ability to conduct research and solve complex biological problems.
Q: What distinguishes the systems biology course at ISB from any other?
CP: Last year we revamped and updated our course to be focused on Systems Biology of Disease because there were no courses like it out there. The goal of this course is to give researchers the tools needed to divide patients into meaningful groups, discover molecular markers that distinguish these groups, and use networks to discover drug targets. Together these tools allow researchers to do personalized medicine, where treatments are tailored to a specific patient’s disease.
Q: What do students gain from experiencing ISB’s course?
CP: Participants in the ISB summer course will gain experience using systems biology approaches that allow them to carry out personalized medicine. They will also experience presentations of excellent examples of systems biology from world experts and the exceptional researchers at ISB.
Q: How does learning systems biology help researchers?
CP: In this age of big data, it is vitally important to think more holistically, with an understanding that interactions can be just as important as single effects. The traditional approach of focusing on single isolated components just won’t work in the complex systems that are human beings. Instead, we advocate embracing the complexity, and we have developed approaches that allow us to uncover the underlying causes of human diseases no matter how complex.
Q: Why should non-scientists care about systems biology?
CP: Systems biology is the future of research and will provide the tools needed to understand complex diseases like cancer, heart disease, obesity, etc., that are plaguing our society. With systems biology, we will learn how all of the different aspects of human health (diet, genetics and environment) affect these diseases and how to treat each individual patient with personalized medicine.