Geometric Model Selection for Latent Space Network Models: Hypothesis Testing via Multidimensional Scaling and Resampling Techniques
Latent space models assume that network ties are more likely between nodes that are closer together in an underlying latent space. Euclidean space is a popular choice for the underlying geometry, but hyperbolic geometry can mimic more realistic patterns of ties in complex networks. To identify the underlying geometry, past research has applied non-Euclidean extensions of multidimensional scaling (MDS) to the observed geodesic distances: the shortest path lengths between nodes. The difference in stress, a standard goodness-of-fit metric for MDS, across the geometries is then used to select a latent geometry with superior model fit (lower stress). The effectiveness of this method is assessed through simulations of latent space networks in Euclidean and hyperbolic geometries. To better account for uncertainty, we extend permutation-based hypothesis tests for MDS to the latent network setting. However, these tests do not incorporate any network structure. We propose a parametric bootstrap distribution of networks, conditioned on observed geodesic distances and the Gaussian Latent Position Model (GLPM). Our method extends the Davidson-MacKinnon J-test to latent space network models with differing latent geometries. We pay particular attention to large and sparse networks, and both the permutation test and the bootstrapping methods show an improvement in detecting the underlying geometry.
💡 Research Summary
The paper tackles a fundamental problem in latent space network modeling: determining whether the unobserved geometry that governs edge formation is Euclidean or hyperbolic. The authors build on the observation that many latent‑space models assume the probability of an edge between two nodes declines with their latent distance, and that hyperbolic space, because of its exponential volume growth, can reproduce hallmark features of complex networks such as heavy‑tailed degree distributions, high clustering, and short average path lengths.
A common practice, following Papamichalis et al. (2022), is to compute the shortest‑path distance matrix of an observed undirected graph, feed it into classical multidimensional scaling (MDS) under two candidate manifolds (Euclidean and hyperbolic), and compare the resulting stress values. The geometry with the lower stress is declared the better fit. However, stress is a point estimate and provides no measure of sampling variability; a single observed stress difference may be due to random fluctuation, especially in large, sparse graphs.
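As a rough sketch of this naive selection rule, the snippet below computes Kruskal's stress‑1 (one common MDS goodness‑of‑fit measure; the paper's exact stress variant may differ) for a given embedding against a target distance matrix, and picks the geometry with the lower value. The function names are illustrative, and the embedding step itself is omitted:

```python
import numpy as np

def kruskal_stress(D, X):
    """Kruskal stress-1 between target distances D (n x n) and the
    pairwise Euclidean distances of an embedding X (n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt((diff ** 2).sum(-1))       # embedded pairwise distances
    iu = np.triu_indices(len(D), k=1)      # each unordered pair once
    num = ((D[iu] - E[iu]) ** 2).sum()
    den = (D[iu] ** 2).sum()
    return np.sqrt(num / den)

def select_geometry(stress_euc, stress_hyp):
    """Naive point-estimate rule: the lower-stress geometry wins."""
    return "euclidean" if stress_euc < stress_hyp else "hyperbolic"
```

A perfect embedding (distances reproduced exactly) yields stress 0; the paper's point is that comparing two such numbers without an uncertainty estimate can mislead.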
To address this, the authors propose two complementary inferential procedures.
- **Permutation Test Adapted to Networks** – Traditional MDS permutation tests randomize the dissimilarity matrix, which would destroy metric properties such as the triangle inequality. Instead, the authors permute the adjacency matrix while preserving edge density and connectivity (only the upper‑triangle entries are shuffled). For each permuted graph they recompute the shortest‑path distances, run Euclidean and hyperbolic MDS, and record the stress difference. Repeating this many times yields an empirical null distribution of stress differences under the hypothesis that the observed distances are exchangeable. The observed stress difference is then compared to this distribution to obtain a p‑value. This test is conservative because it ignores any latent‑space structure, but it respects the combinatorial constraints of a graph.
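The permutation scheme can be sketched as follows. The function names are hypothetical, `stress_diff_fn` stands in for the full shortest‑path + dual‑MDS pipeline, and the connectivity check mentioned above is omitted for brevity:

```python
import numpy as np

def permute_adjacency(A, rng):
    """Shuffle the upper-triangle entries of a symmetric 0/1 adjacency
    matrix, preserving the total edge count (density)."""
    n = len(A)
    iu = np.triu_indices(n, k=1)
    vals = A[iu].copy()
    rng.shuffle(vals)
    P = np.zeros_like(A)
    P[iu] = vals
    return P + P.T

def permutation_pvalue(A, stress_diff_fn, n_perm=200, seed=0):
    """One-sided p-value: the fraction of permuted graphs whose
    (Euclidean - hyperbolic) stress difference is at least as extreme
    as the observed one. `stress_diff_fn` is a placeholder for the
    MDS pipeline described in the text."""
    rng = np.random.default_rng(seed)
    observed = stress_diff_fn(A)
    null = [stress_diff_fn(permute_adjacency(A, rng)) for _ in range(n_perm)]
    return (1 + sum(d <= observed for d in null)) / (n_perm + 1)
```

Because the shuffle preserves the number of edges, any statistic that depends only on density is constant across permutations; the test gains power only from statistics sensitive to structure, which is consistent with its conservative behavior.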
- **Parametric Bootstrap Using the Gaussian Latent Position Model (GLPM)** – The authors assume the data were generated by a GLPM: each node \(i\) has a latent coordinate \(z_i \sim \mathcal{N}(0, \gamma I_d)\) (with \(d = 2\) in the experiments), and edges are drawn independently with probability \(\tau \exp\{-\lVert z_i - z_j\rVert^2 / (2\phi)\}\). Under this model, the conditional probability that the observed shortest‑path distance \(\delta_{ij}\) equals \(k\) given a latent distance \(d_{ij}\) can be expressed analytically using results from Fronczak et al. (2004) and Rastelli et al. (2016). The authors derive \(\ell_k(z_i, z_j) = P(\delta_{ij} = k \mid d_{ij})\) and, via Bayes' rule, obtain the posterior distribution \(P(d_{ij} \mid \delta_{ij} = k) \propto \ell_k(d_{ij})\, P(d_{ij})\). Because the marginal distribution of latent distances follows a chi distribution (for two‑dimensional Gaussian coordinates), this posterior can be sampled directly.
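A minimal sketch of this posterior sampling step, under the stated two‑dimensional Gaussian assumption: the difference of two \(\mathcal{N}(0, \gamma I_2)\) coordinates is \(\mathcal{N}(0, 2\gamma I_2)\), so the prior on latent distances is Rayleigh with scale \(\sqrt{2\gamma}\) (a scaled chi distribution with 2 degrees of freedom). Here `likelihood_k` is a placeholder for the paper's analytic shortest‑path likelihood \(\ell_k\), and the grid discretization is an illustrative shortcut:

```python
import numpy as np

def rayleigh_pdf(d, gamma):
    """Prior density of the latent distance between two nodes with
    z ~ N(0, gamma * I_2): the difference is N(0, 2*gamma*I_2), so the
    distance is Rayleigh with scale sqrt(2*gamma)."""
    s2 = 2.0 * gamma
    return (d / s2) * np.exp(-d ** 2 / (2 * s2))

def sample_posterior_distance(likelihood_k, gamma, rng, grid_max=10.0, m=2000):
    """Draw one latent distance from P(d | delta = k) proportional to
    likelihood_k(d) * prior(d), via a discretized grid."""
    grid = np.linspace(1e-6, grid_max, m)
    w = likelihood_k(grid) * rayleigh_pdf(grid, gamma)
    w = w / w.sum()                      # normalize posterior weights
    return rng.choice(grid, p=w)
```

With a flat likelihood the sampler simply reproduces the Rayleigh prior, which gives a quick sanity check.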
The bootstrap procedure therefore proceeds as follows: (a) for each observed pair (i,j) with shortest‑path length k, draw a latent distance from the posterior; (b) construct a synthetic distance matrix; (c) convert distances back to edge probabilities using the GLPM link function; (d) generate a binary adjacency matrix by Bernoulli trials; (e) apply Euclidean and hyperbolic MDS to the synthetic graph and compute the stress difference. Repeating steps (a)–(e) many times yields a bootstrap distribution that respects the latent‑space geometry implied by the data.
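Steps (a)–(e) above can be outlined as follows. This is a hypothetical sketch: the posterior draws of step (a) are taken as given, and `stress_diff_fn` abstracts the two MDS fits of step (e):

```python
import numpy as np

def glpm_edge_prob(d, tau, phi):
    """GLPM link function: P(edge) = tau * exp(-d^2 / (2*phi))."""
    return tau * np.exp(-d ** 2 / (2.0 * phi))

def bootstrap_stress_diffs(D_samples, tau, phi, stress_diff_fn, seed=0):
    """Steps (b)-(e): each D in D_samples is a synthetic latent-distance
    matrix (step (a) draws it from the posterior). Convert distances to
    edge probabilities, draw a Bernoulli adjacency matrix, and record
    the stress difference for each replicate."""
    rng = np.random.default_rng(seed)
    diffs = []
    for D in D_samples:
        P = glpm_edge_prob(D, tau, phi)
        U = rng.random(P.shape)
        A = np.triu((U < P).astype(int), k=1)  # Bernoulli draws, upper triangle
        A = A + A.T                            # symmetric, no self-loops
        diffs.append(stress_diff_fn(A))
    return np.array(diffs)
```

The returned array is the bootstrap distribution against which the observed stress difference would be compared.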
The authors evaluate both methods through extensive simulations. Networks are generated under both Euclidean and hyperbolic GLPMs with varying sparsity (average degree 0.1–5) and sizes ranging from 1,000 to 20,000 nodes. They find that the permutation test is indeed conservative: it rarely rejects the null, especially when the graph is very sparse. In contrast, the GLPM‑based bootstrap attains high power (often >80%) for correctly identifying the true geometry when the average degree is moderate (2–3) and the network is large. Moreover, the bootstrap's estimated parameters \(\tau\) and \(\phi\) closely match the true simulation values, indicating that the procedure can simultaneously perform geometry selection and latent‑parameter inference.
Real‑world applications include several publicly available networks (e.g., a Facebook friendship network, a protein‑protein interaction network, and a citation network). In these cases, the naive stress‑difference rule sometimes selects Euclidean geometry, whereas both the permutation and bootstrap tests favor hyperbolic geometry, consistent with the known hierarchical and tree‑like organization of these systems.
Overall, the paper makes three methodological contributions:
- It formalizes a hypothesis‑testing framework for MDS‑based geometry selection, moving beyond ad‑hoc stress comparison.
- It designs a network‑aware permutation scheme that preserves edge density and connectivity while randomizing structure.
- It introduces a parametric bootstrap grounded in the Gaussian latent position model, leveraging an analytically derived conditional distribution of latent distances given observed shortest‑path lengths.
These advances provide a statistically principled way to assess uncertainty in latent‑space geometry selection, especially for large, sparse graphs where traditional methods falter. The work opens avenues for more robust model comparison in network science, and the bootstrap machinery could be extended to other latent‑space families (e.g., spherical or mixed‑curvature spaces) or to incorporate covariates and directed edges.