Branch lengths for geodesics in the directed landscape and mutation patterns in growing spatially structured populations
Consider a population that is expanding in two-dimensional space. Suppose we collect data from a sample of individuals taken at random either from the entire population, or from near the outer boundary of the population. A quantity of interest in population genetics is the site frequency spectrum, which is the number of mutations that appear on $k$ of the $n$ sampled individuals, for $k = 1, \dots, n-1$. As long as the mutation rate is constant, this number will be roughly proportional to the total length of all branches in the genealogical tree that are on the ancestral line of $k$ sampled individuals. While the rigorous literature has primarily focused on models without any spatial structure, in many natural settings, such as tumors or bacterial colonies, growth is dictated by spatial constraints. Many such two-dimensional growth models are expected to fall in the KPZ universality class. In this article we adopt the perspective that for population models in the KPZ universality class, the genealogical tree can be approximated by the tree formed by the infinite upward geodesics in the directed landscape, a universal scaling limit constructed in \cite{dov22}, starting from $n$ randomly chosen points. Relying on geodesic coalescence, we prove new asymptotic results for the lengths of the portions of these geodesics that are ancestral to $k$ of the $n$ sampled points and consequently obtain exponents driving the site frequency spectrum as predicted in \cite{fgkah16}. An important ingredient in the proof is a new tight estimate of the probability that three infinite upward geodesics stay disjoint up to time $t$, i.e., a sharp quantitative version of the well-studied N3G problem, which is of independent interest.
💡 Research Summary
The paper investigates the genealogical structure of two‑dimensional spatially expanding populations that belong to the Kardar‑Parisi‑Zhang (KPZ) universality class. The authors adopt a novel viewpoint: the genealogy of a sample of n individuals can be approximated by the tree formed by the infinite upward geodesics in the directed landscape (DL), a universal scaling limit recently constructed by Dauvergne, Ortmann and Virág. In this framework, each sampled individual corresponds to a starting point in the DL, and its ancestral line is the unique infinite geodesic that emanates from that point and proceeds forward in time. The coalescence of these geodesics encodes the merging of ancestral lineages.
The central object of interest is the site‑frequency spectrum (SFS), i.e. the vector $(M_{1,n},\dots,M_{n-1,n})$, where $M_{k,n}$ counts mutations that appear in exactly $k$ of the $n$ sampled genomes. Assuming a constant mutation rate $\theta$, the number of mutations on a branch is Poisson with mean $\theta$ times the branch length. Consequently, understanding the SFS reduces to analyzing the total length $L_{k,n}$ of all branches that support exactly $k$ leaves. The authors consider two sampling regimes: (i) points drawn uniformly from the entire occupied region, and (ii) points drawn from the outer frontier (the “edge”) of the region.
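The reduction from the SFS to branch lengths can be illustrated on a toy tree. The following is a minimal sketch, not the authors' method: the tree encoding (`parent` and `length` dictionaries), the function names, and the three-leaf example are all hypothetical, and the tree is an arbitrary hand-built genealogy rather than one derived from geodesics in the directed landscape. It sums the lengths of branches ancestral to exactly $k$ sampled leaves and then draws Poisson mutation counts with mean $\theta$ times that length.

```python
import random
from collections import defaultdict


def branch_lengths_by_leaf_count(parent, length, leaves):
    """Total branch length ancestral to exactly k sampled leaves, for each k.

    parent: dict mapping each node to its parent (the root maps to None).
    length: dict mapping each non-root node to the length of the branch above it.
    leaves: list of sampled leaf nodes.
    """
    n = len(leaves)
    # Count how many sampled leaves lie below (or at) each node.
    count = defaultdict(int)
    for leaf in leaves:
        node = leaf
        while node is not None:
            count[node] += 1
            node = parent[node]
    # Add each branch's length to the bucket for its leaf count k, 1 <= k <= n-1.
    totals = defaultdict(float)
    for node, k in count.items():
        if parent[node] is not None and 1 <= k <= n - 1:
            totals[k] += length[node]
    return dict(totals)


def sample_sfs(totals, n, theta, rng):
    """Draw one SFS: entry k-1 is Poisson with mean theta * (length for k)."""
    sfs = []
    for k in range(1, n):
        mean = theta * totals.get(k, 0.0)
        # Poisson(mean) = number of rate-1 exponential arrivals before time `mean`.
        mutations, t = 0, rng.expovariate(1.0)
        while t < mean:
            mutations += 1
            t += rng.expovariate(1.0)
        sfs.append(mutations)
    return sfs


# Hypothetical genealogy: a and b merge at u after time 1; u and c merge at
# the root r at time 3.
parent = {"a": "u", "b": "u", "u": "r", "c": "r", "r": None}
length = {"a": 1.0, "b": 1.0, "u": 2.0, "c": 3.0}
leaves = ["a", "b", "c"]

totals = branch_lengths_by_leaf_count(parent, length, leaves)
print(totals)  # length carried by singleton branches (k=1) and by the branch above u (k=2)
print(sample_sfs(totals, len(leaves), theta=0.5, rng=random.Random(0)))
```

In this toy tree the three leaf branches contribute to $k=1$ and the single internal branch above `u` contributes to $k=2$, so the expected SFS is $\theta$ times $(5, 2)$.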
A major technical contribution is a sharp quantitative estimate for the three‑arm (N3G) problem in the DL: the probability that three infinite upward geodesics remain disjoint up to time $t$ decays as $c\,t^{-2/3}+O(t^{-1})$. The proof combines properties of the Airy line ensemble, the metric composition of the DL, and a detailed analysis of “geodesic watermelons” (triples of geodesics that stay apart). This estimate yields precise control over the coalescence depth of the geodesic tree, showing that the typical time at which $k$ lineages first share a common ancestor scales as $t_k\sim k^{2/5}$ for bulk sampling and as $t_k\sim k^{1/2}$ for edge sampling.
Using the N3G bound, the authors prove two main theorems. For bulk sampling they show