Inferring dynamic genetic networks with low order independencies
In this paper, we propose a novel inference method for dynamic genetic networks that makes it possible to cope with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of low order conditional dependence graph, which we extend here to the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G̃ which describes exactly the full order conditional dependencies given the past of the process. Then, to cope with the large p, small n estimation case, we propose to approximate DAG G̃ by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G̃ but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data. The inference procedure is implemented in the R package ‘G1DBN’, freely available from the CRAN archive.
Research Summary
The paper addresses the challenging “small‑n, large‑p” problem that arises when inferring gene‑regulatory networks from short time‑course microarray experiments. Traditional static models such as Gaussian graphical models (GGMs) or correlation networks cannot capture directionality or the cyclic motifs that are common in biological systems, while full‑order Dynamic Bayesian Networks (DBNs) cannot be estimated when the number of genes (p) far exceeds the number of time points (n): regressing each gene on the whole past requires p coefficients but only n − 1 transitions are observed.
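A quick way to see why the full‑order model breaks down is to check the rank of the design matrix. The sketch below uses hypothetical dimensions (n = 10 time points, p = 50 genes) and random data; it only illustrates the linear‑algebra obstacle, not the paper's procedure.

```python
import numpy as np

# Hypothetical dimensions: p genes observed at n time points, with p >> n.
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.standard_normal((n, p))  # row t holds the expression of all p genes at time t

# Full-order regression of X_i(t) on the whole past X(t-1) uses p predictors
# but only n - 1 transitions, so the design matrix is rank deficient.
past = X[:-1, :]                  # shape (n - 1, p)
print(past.shape, np.linalg.matrix_rank(past))

# A first-order (q = 1) regression uses only X_j(t-1) and one conditioning
# gene X_k(t-1), so its two-column design matrix has full column rank.
low_order = past[:, [0, 1]]       # shape (n - 1, 2)
print(low_order.shape, np.linalg.matrix_rank(low_order))
```

With 9 transitions and 50 candidate regulators, at most 9 directions of the past are identifiable, whereas every low‑order regression stays well posed.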
The authors first formalize a stochastic process X = {X_i(t)}, i = 1, …, p, t = 1, …, n, for p genes observed at n discrete time points. Using the theory of graphical models (parents, moral graph, ancestral sets, and the directed global Markov property), they prove that there exists a minimal DAG G̃ that exactly encodes all full‑order conditional dependencies of X given its entire past. This DAG is the “true” DBN structure, but it cannot be estimated directly in the small‑n regime, because each full‑order test conditions on all remaining past variables.
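To make G̃ concrete, consider the linear Gaussian special case X(t) = A X(t−1) + ε(t) with independent noise. This is an assumption made for illustration; the paper's framework is stated more generally. Under this assumption, and with a nonsingular covariance for the past, X_i(t) is conditionally independent of X_j(t−1) given the rest of X(t−1) exactly when A[i, j] = 0, so the edges of G̃ are the nonzero entries of A. The sketch below simulates such a process with a hypothetical 3‑gene matrix and shows that full‑order least squares recovers that support when n ≫ p.

```python
import numpy as np

def simulate_var1(A, n, noise_sd=1.0, seed=0):
    """Simulate n time points of X(t) = A X(t-1) + eps(t) (hypothetical linear Gaussian DBN)."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    X = np.zeros((n, p))
    X[0] = rng.standard_normal(p)
    for t in range(1, n):
        X[t] = A @ X[t - 1] + noise_sd * rng.standard_normal(p)
    return X

# Sparse, stable transition matrix: A[i, j] != 0 means an edge X_j(t-1) -> X_i(t) in G~.
A = np.array([[0.5, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.7, 0.3]])
true_edges = A != 0

# With many more time points than genes, full-order least squares of X(t) on X(t-1)
# is well posed and its coefficients approximate A.
X = simulate_var1(A, n=2000)
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)  # solves X[:-1] @ B ~= X[1:]
A_hat = B.T                                          # A_hat[i, j] estimates A[i, j]
est_edges = np.abs(A_hat) > 0.15                     # hypothetical cut-off for illustration
print(np.array_equal(est_edges, true_edges))
```

The same regression with p in the thousands and n around ten is exactly the case the low‑order graphs are designed to avoid.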
To overcome this, the paper introduces a family of low‑order conditional dependence graphs G(q), where q denotes the maximal number of conditioning variables used to test independence between any pair of variables. When q = 0 the graph is built from simple pairwise (lagged) correlations; q = 1 uses first‑order partial correlations, and higher values of q incorporate more conditioning information while still keeping the dimensionality low. The authors derive probabilistic properties of G(q), establishing inclusion relationships with G̃ under a faithfulness assumption (i.e., the probability distribution faithfully reflects the graph structure). In particular, for sparse biological networks, low‑order graphs often contain most of the true edges, and when the true concentration graph is a forest, G(0‑1) coincides exactly with it.
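The difference between q = 0 and q = 1 can be seen on a toy example. In the sketch below (hypothetical data, not from the paper), X_0(t−1) regulates X_1(t), and X_2(t−1) is merely correlated with X_0(t−1). The lagged correlation between X_1(t) and X_2(t−1) is large, but the first‑order partial correlation given X_0(t−1) is close to zero. The partial correlation is computed from regression residuals and checked against the standard closed form ρ(y,x|z) = (ρ_yx − ρ_yz ρ_xz) / √((1 − ρ_yz²)(1 − ρ_xz²)).

```python
import numpy as np

def residualize(v, z):
    """Residuals of v after least-squares regression on z plus an intercept."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def partial_corr_q1(y, x, z):
    """First-order partial correlation rho(y, x | z) computed from regression residuals."""
    ry, rx = residualize(y, z), residualize(x, z)
    return float(np.corrcoef(ry, rx)[0, 1])

def partial_corr_formula(y, x, z):
    """The same quantity from the closed-form expression in pairwise correlations."""
    r = np.corrcoef(np.vstack([y, x, z]))
    r_yx, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]
    return float((r_yx - r_yz * r_xz) / np.sqrt((1 - r_yz**2) * (1 - r_xz**2)))

rng = np.random.default_rng(1)
m = 500                                              # hypothetical number of transitions
x0_past = rng.standard_normal(m)                     # X_0(t-1), the true regulator
x2_past = x0_past + 0.5 * rng.standard_normal(m)     # X_2(t-1), correlated but not a regulator
y = 0.9 * x0_past + 0.3 * rng.standard_normal(m)     # X_1(t) depends only on X_0(t-1)

r0 = float(np.corrcoef(y, x2_past)[0, 1])            # q = 0: marginal lagged correlation
r1 = partial_corr_q1(y, x2_past, x0_past)            # q = 1: conditioning on X_0(t-1)
print(round(r0, 2), round(r1, 2))
```

Here G(0) would contain the spurious edge X_2(t−1) → X_1(t), while G(1) removes it by conditioning on a single well‑chosen gene.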
Based on these theoretical results, the authors propose a non‑Bayesian inference procedure. For each ordered pair (i, j) they compute the q‑order partial correlation between X_j(t−1) and X_i(t), test its significance (using asymptotic normal approximations or bootstrap), and retain the edge if the null hypothesis of conditional independence is rejected. Multiple‑testing correction (e.g., Benjamini–Hochberg) controls the false discovery rate across the p(p‑1) possible directed edges. The resulting directed graph is an estimate of the DBN structure; edge signs are inferred from the sign of the estimated regression coefficient, providing information about activation versus repression.
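A minimal sketch of this q = 1 procedure follows, assuming a linear model and a normal approximation for the coefficient test. Two choices here are assumptions of the sketch rather than details confirmed above: the p‑values over all conditioning genes k are aggregated by keeping the largest one (so an edge survives only if it is significant whatever single gene is conditioned on), and the edge sign is taken from the coefficient in that least favourable regression. The helper names are hypothetical; this is not the G1DBN implementation.

```python
import math
import numpy as np

def coef_pvalue(y, x, z):
    """OLS of y on [1, x, z]; return (coefficient of x, two-sided p-value).

    Uses a normal approximation to the t statistic (an assumption; a Student t
    reference distribution would be more accurate for small samples)."""
    D = np.column_stack([np.ones_like(x), x, z])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (len(y) - D.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
    return coef[1], math.erfc(abs(coef[1] / se) / math.sqrt(2))

def q1_scores(X):
    """Score every ordered pair X_j(t-1) -> X_i(t), i != j, with q = 1 conditioning.

    X has shape (n, p); row t holds X(t). The score is the largest p-value over
    all conditioning genes k != j (hypothetical aggregation rule)."""
    past, future = X[:-1], X[1:]
    p = X.shape[1]
    pvals, signs = np.full((p, p), np.nan), np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            worst_coef, worst_p = 0.0, -1.0
            for k in range(p):
                if k == j:
                    continue
                coef, pv = coef_pvalue(future[:, i], past[:, j], past[:, k])
                if pv > worst_p:
                    worst_coef, worst_p = coef, pv
            pvals[i, j], signs[i, j] = worst_p, np.sign(worst_coef)
    return pvals, signs

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of the (non-NaN) hypotheses rejected by the BH step-up rule."""
    valid = ~np.isnan(pvals)
    flat = pvals[valid]
    m = flat.size
    order = np.argsort(flat)
    below = flat[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        keep[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    mask = np.zeros(pvals.shape, dtype=bool)
    mask[valid] = keep
    return mask

# Hypothetical sparse network: gene 0 activates gene 1, gene 1 represses gene 2.
rng = np.random.default_rng(2)
A = np.zeros((5, 5))
A[1, 0], A[2, 1] = 0.8, -0.8
X = np.zeros((60, 5))
for t in range(1, 60):
    X[t] = A @ X[t - 1] + rng.standard_normal(5)

pvals, signs = q1_scores(X)
edges = benjamini_hochberg(pvals, alpha=0.05)
print(np.argwhere(edges), signs[edges])
```

Each regression has only three columns, so the cost grows as p³ small fits rather than one impossible p‑dimensional fit; for thousands of genes the triple loop would need to be vectorized.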
The method is implemented in the R package G1DBN, which automates data preprocessing, selection of q, significance testing, and graph construction. The authors evaluate the approach on both simulated data and two real biological datasets. Simulations that vary network sparsity, sample size, and q show that q = 1 or 2 yields the best trade‑off between sensitivity and specificity, especially when the true network is sparse.
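In such simulation studies, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), computed by comparing the inferred edge set with the true one. The helper below is a hypothetical sketch of that comparison. It ignores the diagonal by default, matching the p(p‑1) count of candidate edges used above (an assumption about how self‑loops are treated).

```python
import numpy as np

def sensitivity_specificity(est, truth, ignore_diagonal=True):
    """Compare an estimated boolean adjacency matrix with the true one.

    est[i, j] and truth[i, j] are True when the edge X_j(t-1) -> X_i(t) is present."""
    est, truth = np.asarray(est, dtype=bool), np.asarray(truth, dtype=bool)
    if ignore_diagonal:
        mask = ~np.eye(truth.shape[0], dtype=bool)
    else:
        mask = np.ones(truth.shape, dtype=bool)
    tp = np.sum(est & truth & mask)
    fn = np.sum(~est & truth & mask)
    tn = np.sum(~est & ~truth & mask)
    fp = np.sum(est & ~truth & mask)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return float(sens), float(spec)

truth = np.zeros((4, 4), dtype=bool)
truth[1, 0] = truth[2, 1] = True
est = np.zeros((4, 4), dtype=bool)
est[1, 0] = est[3, 2] = True          # one true positive, one false positive
print(sensitivity_specificity(est, truth))  # (0.5, 0.9)
```

Sweeping the significance threshold (or q) and recording these two numbers traces out the sensitivity/specificity trade‑off described above.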
For real data, the first case study uses the classic yeast (Saccharomyces cerevisiae) cell‑cycle expression dataset. The inferred network recovers known transcriptional regulators (e.g., MBF, SBF) and correctly identifies feed‑forward loops while producing fewer spurious edges than standard GGM or full‑order DBN methods. The second case study analyzes a diurnal expression series from Arabidopsis thaliana. The method uncovers a coherent cyclic structure linking photosynthesis‑related genes with circadian regulators, again matching prior biological knowledge and suggesting novel regulatory hypotheses.
In summary, the paper contributes:
- A rigorous definition of a minimal DBN DAG G̃ that captures full‑order temporal dependencies.
- The concept of q‑order conditional dependence DAGs G(q) as a principled low‑dimensional approximation.
- Theoretical inclusion results linking G(q) and G̃ under faithfulness.
- A practical, computationally efficient, non‑Bayesian edge‑selection algorithm that scales to thousands of genes with only a handful of time points.
- An open‑source R implementation and empirical validation on both synthetic and real gene‑expression time‑course data.
The work demonstrates that exploiting low‑order conditional independencies can dramatically reduce the effective dimensionality of dynamic network inference, making it feasible to reconstruct biologically meaningful gene‑regulatory networks from the limited time‑course data that are typical in modern genomics. Future extensions may include non‑linear dynamics, time‑varying network structures, and integration of external covariates such as environmental stimuli.