The Inferelator is an algorithm for inferring predictive regulatory networks from gene expression data.
It does so by selecting the regulators (transcription factors or environmental factors) whose levels are most predictive of the expression of each gene or bicluster (see cMonkey for more information on biclusters). The method fits a multivariate kinetic model of gene expression that includes a sigmoidal activation model via the logistic function and a mean decay rate parameter (τ). To strictly enforce parsimony and avoid overfitting, it combines linear regression with L1 shrinkage (the LASSO) and model selection by 10-fold cross-validation. The model allows simultaneous fitting of time-course (τ/Δt > 0) and steady-state (τ/Δt ≈ 0) data, and was chosen from the class of generalized linear models to allow fast parameter estimation and cross-validation. In addition, we developed a simple way of incorporating a generalized-linear extension of pairwise logical interactions (AND, OR, XOR) between predictors using the functions min and max, an approach that mimics physical-chemistry derivations of logical interactions.
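For concreteness, a model of this form can be written as below. The notation is assumed here for illustration and is not taken from the text above: y is the expression level of a gene or bicluster, z_j are the regulator levels (optionally including min/max interaction terms), β_j are the regression weights, g is the logistic function, and m indexes consecutive measurements.

```latex
% Kinetic model with sigmoidal (logistic) activation and decay constant \tau
\tau \frac{dy}{dt} = -y + g\Big(\sum_j \beta_j z_j\Big),
\qquad g(x) = \frac{1}{1 + e^{-x}}

% Finite-difference form for time-course data sampled at interval \Delta t_m
\tau \, \frac{y_{m+1} - y_m}{\Delta t_m} + y_m = g\Big(\sum_j \beta_j z_{m,j}\Big)

% Steady-state limit: as \tau / \Delta t \to 0 the derivative term vanishes
y_m = g\Big(\sum_j \beta_j z_{m,j}\Big)
```

Written this way, time-course and steady-state measurements share the same right-hand side, which is what lets both kinds of data be fitted together.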
Thus, our generalized-linear dynamical network model cleanly incorporates some details of kinetic models, while maintaining the simplicity, flexibility, and robustness of linear and Boolean models.
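The fitting idea can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the Inferelator's actual implementation: the function names (`soft_threshold`, `lasso_cd`, `kinetic_response`, `design_row`, `cv_error`), the synthetic data, and the λ grid are all hypothetical, the logistic activation is omitted so that only the linear part is fitted, and the penalty is chosen by minimum cross-validation error.

```python
import random

def soft_threshold(x, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def kinetic_response(y, dt, tau):
    """Regression target tau*(y[m+1]-y[m])/dt + y[m]; equals y[m] when tau = 0 (steady state)."""
    return [tau * (y[m + 1] - y[m]) / dt + y[m] for m in range(len(y) - 1)]

def design_row(a, b):
    """Predictors: each regulator alone plus min(a, b) as an AND-like interaction term."""
    return [a, b, min(a, b)]

def cv_error(X, y, lam, k=10):
    """Mean squared prediction error from k-fold cross-validation (interleaved folds)."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n

# Synthetic time course (hypothetical data): expression is driven by min(tf_a, tf_b).
rng = random.Random(0)
n_points, dt, tau = 41, 1.0, 2.0
tf_a = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
tf_b = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
expr = [0.0]
for m in range(n_points - 1):
    drive = 0.8 * min(tf_a[m], tf_b[m])
    expr.append(expr[m] + dt / tau * (drive - expr[m]))

X = [design_row(tf_a[m], tf_b[m]) for m in range(n_points - 1)]
target = kinetic_response(expr, dt, tau)

# Pick the L1 penalty with the lowest 10-fold CV error, then refit on all data.
lambdas = [0.5, 0.1, 0.02, 0.005]
best_lam = min(lambdas, key=lambda lam: cv_error(X, target, lam))
best_beta = lasso_cd(X, target, best_lam)
print("chosen lambda:", best_lam)
print("coefficients [tf_a, tf_b, min(tf_a, tf_b)]:", best_beta)
```

Setting `tau = 0` in `kinetic_response` turns the same regression into a steady-state fit, so both data types can pass through one code path. An OR-like term could be added analogously with `max(a, b)`.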
When integrated with cMonkey, the Inferelator can also pair potential regulators with their putative cis-elements (DNA binding sites). We used the Inferelator to learn the global regulatory network of Halobacterium salinarum NRC-1.
Personnel: Vesteinn Thorsson, Richard Bonneau, David J Reiss, Leroy Hood, Nitin S Baliga
Reference: Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006;7(5):R36