Covariate Selection for Joint Latent Space Modeling of Sparse Network Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Network data are increasingly common in the social sciences and infectious disease epidemiology. Analyses often link network structure to node-level covariates, but existing methods falter with sparse networks and high-dimensional node features. We propose a joint latent space modeling framework for sparse networks with high-dimensional binary node covariates that performs covariate selection while accounting for uncertainty in estimated latent positions. Building on joint latent space models that couple edges and node variables through shared latent positions, we introduce a group lasso screening step and incorporate a measurement-error-aware stabilization term to mitigate bias from using estimated latent positions as predictors. We establish prediction error rates for the covariate component both when latent positions are treated as observed and when they are estimated with bounded error; under uniform control across $q$ covariates and $n$ nodes, the rate is of order $O(\log q / n)$ up to an additional term due to latent position estimation error. Our method addresses three challenges: (1) incorporating information from isolated nodes, which are common in sparse networks but often ignored; (2) selecting relevant covariates from high-dimensional spaces; and (3) accounting for uncertainty in estimated latent positions. Simulations show predictive performance remains stable as covariate sparsity grows, while naive approaches degrade. We illustrate how the method can support efficient study design using household social networks from 75 Indian villages, where an emulated pilot study screens a large covariate battery and substantially reduces required subsequent data collection without sacrificing network predictive accuracy.
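The group lasso screening step described above can be sketched as follows. This is a minimal illustration under assumptions the summary does not state. Each binary covariate $k$ is modeled as a logistic regression on the latent positions, with a hypothetical intercept $b_k$ and loading vector $w_k$. The penalty $\lambda \lVert w_k \rVert_2$ acts on each loading vector as a block, so a covariate is screened out when its whole block is shrunk exactly to zero. The fit uses plain proximal gradient descent. The positions are treated as known, and the paper's measurement-error-aware stabilization term is omitted because its form is not given here. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def group_soft_threshold(w, thresh):
    # Proximal operator of thresh * ||w||_2: shrinks the whole block toward
    # zero and sets it exactly to zero when its norm is at most thresh.
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= thresh:
        return [0.0] * len(w)
    scale = 1.0 - thresh / norm
    return [scale * x for x in w]


def fit_covariate(Z, y, lam, step=0.5, iters=300):
    # Hypothetical per-covariate fit by proximal gradient descent:
    #   minimize (1/n) * sum_i logloss(y_i, b + z_i . w) + lam * ||w||_2,
    # with the intercept b left unpenalized.
    n, d = len(Z), len(Z[0])
    b, w = 0.0, [0.0] * d
    for _ in range(iters):
        gb, gw = 0.0, [0.0] * d
        for zi, yi in zip(Z, y):
            r = sigmoid(b + sum(a * c for a, c in zip(zi, w))) - yi  # residual
            gb += r
            for j in range(d):
                gw[j] += r * zi[j]
        b -= step * gb / n
        w = group_soft_threshold([w[j] - step * gw[j] / n for j in range(d)], step * lam)
    return b, w


def screen_covariates(Z, Y, lam):
    # Keep covariate k only if its loading block w_k is not shrunk to zero.
    kept = []
    for k in range(len(Y[0])):
        _, w = fit_covariate(Z, [row[k] for row in Y], lam)
        if any(x != 0.0 for x in w):
            kept.append(k)
    return kept


# Toy usage: covariates 0 and 2 depend on the positions, covariate 1 is noise.
rng = random.Random(0)
Z = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
true_w = [[3.0, -3.0], [0.0, 0.0], [0.0, 2.5]]
Y = [[1 if rng.random() < sigmoid(sum(a * c for a, c in zip(z, w))) else 0
      for w in true_w] for z in Z]
kept = screen_covariates(Z, Y, lam=0.15)
print(kept)
```

Because the penalty separates across covariates, each covariate can be fit independently. The block soft-threshold is what produces exact zeros and therefore a selection, rather than mere shrinkage.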


💡 Research Summary

This paper tackles a common yet challenging scenario in modern social‑science and infectious‑disease research: a highly sparse network together with a high‑dimensional set of binary node‑level covariates. Traditional latent‑space models (LSM) represent each node by a low‑dimensional latent position $Z_i$ and model edge probabilities as a function of the inner product (or distance) between two positions. While powerful for capturing homophily, transitivity, and community structure, standard LSMs are ill‑suited for (i) selecting a small subset of truly relevant covariates from a large pool, and (ii) dealing with isolated nodes that contribute little edge information in a sparse graph.
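The edge model of a standard latent space model can be illustrated with a short sketch. It assumes a logistic link and a single global intercept `alpha`, which is a hypothetical parameter because the summary does not specify the intercept structure. Both the inner-product and distance variants are shown. Lowering `alpha` makes the network sparser, and the sampled graph then contains many isolated nodes whose rows of $A$ contain no edges at all.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def edge_prob(zi, zj, alpha, kind="inner"):
    # P(A_ij = 1) = sigmoid(alpha + s(z_i, z_j)), where s is the inner product
    # or the negative Euclidean distance between the two latent positions.
    if kind == "inner":
        s = sum(a * b for a, b in zip(zi, zj))
    elif kind == "distance":
        s = -math.dist(zi, zj)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return sigmoid(alpha + s)


def sample_network(Z, alpha, kind="inner", seed=0):
    # Draw a symmetric adjacency matrix with no self-loops.
    rng = random.Random(seed)
    n = len(Z)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob(Z[i], Z[j], alpha, kind):
                A[i][j] = A[j][i] = 1
    return A


def count_isolated(A):
    # Number of nodes with no edges at all.
    return sum(1 for row in A if not any(row))


rng = random.Random(1)
Z = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(300)]
dense = sample_network(Z, alpha=0.0)
sparse = sample_network(Z, alpha=-6.0)
print(count_isolated(dense), count_isolated(sparse))
```

In the sparse draw a large share of nodes is isolated, so a model that uses only $A$ has essentially no information about where those nodes sit in the latent space.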

The authors extend the joint latent‑space framework of Zhang et al. (2022) by coupling both the adjacency matrix $A$ and the binary covariate matrix $Y$ to the same latent positions. Specifically, edges follow
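The coupling of $A$ and $Y$ through shared latent positions, and the reason it lets isolated nodes contribute, can be sketched by writing out the log-likelihood terms that involve a single node. The parameterization here is an assumption for illustration, not the paper's exact model. It uses edge logits $\alpha + z_i^\top z_j$ and covariate logits $b_k + z_i^\top w_k$, and all names are hypothetical. In the toy configuration below, node 3 has no edges. Its edge term is identical at two mirror-image candidate positions, while its covariate term clearly separates them.

```python
import math


def softplus(t):
    # Stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def bernoulli_loglik(y, t):
    # log P(y | logit t) for y in {0, 1}.
    return -softplus(-t) if y == 1 else -softplus(t)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def node_loglik(i, zi, A, Y, Z, alpha, b, W):
    # Terms of the (assumed) joint log-likelihood that depend on node i's
    # position zi, split into the edge part (row i of A) and the covariate
    # part (row i of Y).
    edge = 0.0
    for j in range(len(Z)):
        if j != i:
            edge += bernoulli_loglik(A[i][j], alpha + dot(zi, Z[j]))
    cov = 0.0
    for k in range(len(W)):
        cov += bernoulli_loglik(Y[i][k], b[k] + dot(zi, W[k]))
    return edge, cov


# Toy data: node 3 is isolated (no edges) but has observed covariates.
Z = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
A = [[0, 0, 1, 0],
     [0, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
Y = [[1, 0], [0, 1], [1, 1], [1, 0]]
W = [[2.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
alpha = -4.0

edge_a, cov_a = node_loglik(3, [1.0, -1.0], A, Y, Z, alpha, b, W)
edge_b, cov_b = node_loglik(3, [-1.0, 1.0], A, Y, Z, alpha, b, W)
print(edge_a - edge_b, cov_a - cov_b)
```

The edge rows of an isolated node cannot distinguish the two positions, but its covariates favor the position aligned with the loadings of the covariates it exhibits. This is the mechanism by which a joint model can place isolated nodes rather than discarding them.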

