Sensitivity of inferences in forensic genetics to assumptions about founding genes

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Many forensic genetics problems can be handled using structured systems of discrete variables, for which Bayesian networks offer an appealing practical modeling framework and allow inferences to be computed by probability propagation methods. However, when standard assumptions are violated (for example, when allele frequencies are unknown, there is identity by descent, or the population is heterogeneous), dependence is generated among founding genes, which makes exact calculation of conditional probabilities by propagation methods less straightforward. Here we illustrate different methodologies for assessing sensitivity to assumptions about founders in forensic genetics problems. These include constrained steepest descent, linear fractional programming and representing dependence by structure. We illustrate these methods on several forensic genetics examples involving criminal identification, simple and complex disputed paternity and DNA mixtures.

💡 Research Summary

The paper addresses a fundamental problem in forensic genetics: how sensitive are the inferences drawn from Bayesian network (BN) models to the assumptions made about the “founding genes” – the genes of individuals that are not directly observed but are inherited by the subjects of interest. Under the standard forensic assumptions (a homogeneous population, Hardy–Weinberg equilibrium, known allele frequencies, and no identity by descent), the founding genes are independent and the likelihood ratio (LR) for a hypothesis can be computed efficiently by standard probability‑propagation algorithms on a BN. However, real cases often violate these assumptions: allele frequencies may be uncertain, individuals may be related (identity by descent), or the population may be heterogeneous (sub‑populations). These violations introduce statistical dependence among the founding genes, which complicates exact BN inference.

The authors propose three complementary methodologies for assessing the sensitivity of the log‑LR, denoted $h(f)=\log\frac{P(T=1\mid E)}{P(T=0\mid E)}$, to changes in the joint distribution $f$ of the founding genes:

Constrained Steepest Descent (CSD).
By differentiating $h(f)$ with respect to $f$, the gradient $\nabla h(f)$ is obtained. The direction of maximal change under linear equality constraints (probability simplex, possible symmetry constraints) is the projection of this gradient onto the subspace of perturbations that preserve those constraints. CSD provides a first‑order approximate bound $|h(f)-h(f_0)| \lesssim \|\nabla h(f_0)\|\,\|f-f_0\|$ and is useful for exploring small departures from the baseline model $f_0$.
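The projected-gradient step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the ratio-of-linear-forms expression for $h$ used in the LFP formulation, imposes only the sum-to-one constraint, and uses hypothetical vectors `p1`, `p0` and baseline `f0` chosen so that the baseline lies in the interior of the simplex.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_ratio(p1, p0, f):
    """h(f) = log(p1.f / p0.f), the assumed form of the log-LR."""
    return math.log(dot(p1, f) / dot(p0, f))

def gradient(p1, p0, f):
    """Gradient of h: p1 / (p1.f) - p0 / (p0.f), componentwise."""
    num, den = dot(p1, f), dot(p0, f)
    return [a / num - b / den for a, b in zip(p1, p0)]

def project_sum_zero(g):
    """Project g onto {d : sum(d) = 0}, so that f + d still sums to one."""
    mean = sum(g) / len(g)
    return [x - mean for x in g]

def norm(v):
    return math.sqrt(dot(v, v))

# Hypothetical inputs (not taken from the paper).
p1 = [0.9, 0.5, 0.1]   # P(E | T=1, founders in configuration i)
p0 = [0.2, 0.4, 0.6]   # P(E | T=0, founders in configuration i)
f0 = [0.5, 0.3, 0.2]   # baseline joint distribution of founding genes

g = project_sum_zero(gradient(p1, p0, f0))
direction = [x / norm(g) for x in g]      # unit direction of steepest increase
step = 0.01                               # small enough to stay inside the simplex
f_new = [x + step * d for x, d in zip(f0, direction)]

first_order = step * norm(g)              # predicted change in h
actual = log_ratio(p1, p0, f_new) - log_ratio(p1, p0, f0)
print(first_order, actual)
```

Moving along `-direction` instead gives the steepest decrease. Additional linear equality constraints (for example symmetry constraints) would require projecting onto a smaller subspace, and larger steps would need a check that all components of `f_new` remain nonnegative.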
Linear Fractional Programming (LFP).
Since $h(f)=\log\frac{p_1^\top f}{p_0^\top f}$ is the logarithm of a ratio of linear forms in $f$, and the logarithm is monotone, finding the worst‑case and best‑case LR under linear constraints on $f$ can be cast as a linear fractional program. A standard transformation (e.g., Charnes–Cooper) turns it into a linear program that can be solved with existing solvers. This approach yields exact extremal LR values over a whole class of admissible $f$ (e.g., those arising from a Dirichlet prior on allele frequencies).
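For reference, the Charnes–Cooper transformation mentioned above can be written out as follows. The constraint matrix $A$ and vector $b$ are generic placeholders for whatever linear restrictions define the admissible class of $f$; they are notation assumed here, not taken from the paper. The derivation requires $p_0^\top f > 0$ on the feasible set.

```latex
% Linear fractional program (maximization; minimization is analogous)
\begin{align*}
\max_{f}\quad & \frac{p_1^\top f}{p_0^\top f}
\qquad \text{s.t.}\quad A f \le b,\quad \mathbf{1}^\top f = 1,\quad f \ge 0 .
\end{align*}

% Substitute t = 1 / (p_0^\top f) > 0 and y = t f to obtain a linear program
\begin{align*}
\max_{y,\,t}\quad & p_1^\top y \\
\text{s.t.}\quad & A y - b\,t \le 0, \\
                 & \mathbf{1}^\top y - t = 0, \\
                 & p_0^\top y = 1, \\
                 & y \ge 0,\quad t \ge 0 .
\end{align*}

% Recover the optimizer and the extremal log-LR
\begin{align*}
f^\ast = y^\ast / t^\ast, \qquad h(f^\ast) = \log\bigl(p_1^\top y^\ast\bigr).
\end{align*}
```

Solving the program once as a maximization and once as a minimization gives the upper and lower extremes of the LR over the admissible class.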
Structural Representation of Dependence.
The authors show how to embed dependence directly into the BN by adding extra parent–child arcs among founding gene nodes or by introducing latent variables (e.g., a sub‑population indicator $R$ or identity‑by‑descent pattern variables $\pi$). This yields an enlarged DAG that still permits exact inference via junction‑tree algorithms, albeit at higher computational cost. To mitigate the explosion in state space when many markers are used, they propose a “within‑marker” strategy: run separate BNs for each marker conditioned on the latent variable, then combine the results by averaging over the latent distribution.
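The "within‑marker" combination step can be sketched in a few lines. This is a minimal illustration under stated assumptions: markers are taken to be conditionally independent given the latent variable, the latent variable is taken to be independent of the hypothesis, and the per-marker likelihood tables (which in practice would come from separate BN runs) are replaced by hypothetical numbers. The function names are illustrative only.

```python
import math

def combined_lr(prior, lik_h1, lik_h0):
    """LR across markers, averaging over a latent variable R.

    prior[r]      : P(R = r)
    lik_h1[m][r]  : P(E_m | T=1, R=r) for marker m
    lik_h0[m][r]  : P(E_m | T=0, R=r) for marker m
    """
    states = range(len(prior))
    num = sum(prior[r] * math.prod(m[r] for m in lik_h1) for r in states)
    den = sum(prior[r] * math.prod(m[r] for m in lik_h0) for r in states)
    return num / den

def naive_lr(prior, lik_h1, lik_h0):
    """Product of per-marker LRs, each averaged over R separately.

    This ignores the dependence across markers induced by the shared R.
    """
    states = range(len(prior))
    lr = 1.0
    for m1, m0 in zip(lik_h1, lik_h0):
        lr *= sum(prior[r] * m1[r] for r in states) / sum(prior[r] * m0[r] for r in states)
    return lr

# Hypothetical example: two latent sub-populations, two markers.
prior = [0.7, 0.3]
lik_h1 = [[0.10, 0.02], [0.05, 0.20]]
lik_h0 = [[0.01, 0.004], [0.01, 0.08]]

print(combined_lr(prior, lik_h1, lik_h0))
print(naive_lr(prior, lik_h1, lik_h0))
```

The two functions agree when the latent variable has a single state; any gap between them in other cases reflects the cross-marker dependence that the shared latent variable introduces.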

The paper illustrates these methods on several forensic scenarios:

Criminal identification – evaluating the LR that a suspect left DNA at a crime scene. Sensitivity to uncertain allele frequencies (modeled with a Beta prior) is modest (≈0.1–0.3 log units), whereas incorporating a first‑degree relative relationship can increase the LR by a factor of 5 or more.
Simple and complex disputed paternity – with one or multiple markers and possibly multiple alleged fathers. Sub‑population structure (e.g., two ethnic groups with different allele spectra) can shift the LR dramatically; ignoring it leads to inflated false‑positive rates, especially in the complex case where multiple children share markers.
DNA mixtures – where two or more contributors are present. The authors model each contributor’s genotype as a set of founding genes and use latent mixture proportion variables. The structural approach captures the dependence induced by a common contributor across markers, while LFP provides bounds on the LR under different allele‑frequency priors.
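To make the effect of uncertain allele frequencies in the criminal identification scenario concrete, the sketch below compares a plug-in LR with one computed under a Dirichlet model (the Beta prior is the two-allele case). It is an illustration under assumptions, not the paper's calculation: the Dirichlet parameters are taken to equal hypothetical database counts, the suspect is heterozygous with alleles a and b, and the match probability is obtained from the standard Pólya urn predictive rule after the suspect's two genes have been observed.

```python
from fractions import Fraction

def plug_in_lr(count_a, count_b, total):
    """LR = 1 / (2 p_a p_b) with frequencies treated as known (Hardy-Weinberg)."""
    p_a = Fraction(count_a, total)
    p_b = Fraction(count_b, total)
    return 1 / (2 * p_a * p_b)

def urn_lr(alpha_a, alpha_b, alpha_total):
    """LR when allele frequencies have a Dirichlet distribution.

    The suspect's genes a and b are already observed, so the predictive
    probability that the culprit's two genes are a and b (in either order) is
    2 * (alpha_a + 1) / (alpha_total + 2) * (alpha_b + 1) / (alpha_total + 3).
    """
    match_prob = (2
                  * Fraction(alpha_a + 1, alpha_total + 2)
                  * Fraction(alpha_b + 1, alpha_total + 3))
    return 1 / match_prob

# Hypothetical database: 200 genes, 10 copies of allele a, 20 of allele b.
print(float(plug_in_lr(10, 20, 200)))
print(float(urn_lr(10, 20, 200)))
```

Because observing the suspect's alleles makes them more probable for the culprit as well, the LR under the urn model is smaller than the plug-in value; the gap shrinks as the database grows.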

Empirical results show that the most influential violations are identity by descent and population heterogeneity; uncertainty in allele frequencies alone has a relatively minor effect. The three methods complement each other: CSD offers rapid, local sensitivity diagnostics; LFP delivers rigorous worst‑case bounds; structural BNs give exact results when computational resources permit.

In conclusion, the paper delivers a systematic framework for quantifying how forensic genetic inferences depend on the assumptions about unobserved founding genes. By combining analytic (CSD), optimization (LFP), and graphical (structural BN) techniques, practitioners can select the most appropriate tool for their case complexity, thereby providing courts with a clearer picture of the robustness of DNA evidence. Future work is suggested on extending the framework to single‑nucleotide‑polymorphism (SNP) panels and deeper pedigree structures.
