CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization
Scene graphs provide structured abstractions for scene understanding, yet they often overfit to spurious correlations, severely hindering out-of-distribution generalization. To address this limitation, we propose CURVE, a causality-inspired framework that integrates variational uncertainty modeling with uncertainty-guided structural regularization to suppress high-variance, environment-specific relations. Specifically, we apply prototype-conditioned debiasing to disentangle invariant interaction dynamics from environment-dependent variations, promoting a sparse and domain-stable topology. Empirically, we evaluate CURVE in zero-shot transfer and low-data sim-to-real adaptation, verifying its ability to learn domain-stable sparse topologies and provide reliable uncertainty estimates to support risk prediction under distribution shifts.
💡 Research Summary
The paper tackles a fundamental weakness of scene‑graph based perception for autonomous systems: over‑reliance on spurious, environment‑specific correlations that cause catastrophic failures under distribution shift. To remedy this, the authors introduce CURVE (Causality‑aware Uncertainty‑guided Representation for Vehicle Environments), a novel framework that unifies variational uncertainty modeling with a causality‑inspired structural regularizer.
First, the method treats each entity and pairwise relation in a scene graph as a Gaussian random variable N(μ, σ²). The learned σ captures data‑dependent aleatoric uncertainty, which the authors argue is a proxy for environment‑sensitive (and thus potentially spurious) relations. A KL‑regularized variational objective prevents σ from trivially inflating while encouraging it to concentrate on hard‑to‑predict edges.
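The mechanics of this first component can be sketched as follows. This is a minimal illustration of a diagonal-Gaussian relation embedding with the reparameterization trick and a KL penalty to a standard-normal prior; the function names, embedding dimension, and choice of prior are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_relation(mu, log_var):
    """Reparameterized draw z = mu + sigma * eps for one relation embedding."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dims.
    This is the kind of regularizer that keeps sigma from inflating trivially."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Toy 4-dimensional relation embedding.
mu = np.array([0.2, -0.1, 0.0, 0.3])
log_var = np.array([-1.0, -2.0, 0.5, -0.5])

z = sample_relation(mu, log_var)          # stochastic edge representation
kl = kl_to_standard_normal(mu, log_var)   # penalty term added to the loss
```

In a full model, the per-edge σ produced here is exactly the quantity later reused as a spuriousness signal for pruning.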
Second, to approximate the intractable continuous environment latent space, CURVE learns a finite set of prototypes C = {c₁,…,c_K}. Each relational embedding z_ij queries this prototype dictionary, yielding a soft assignment P(c_k | z_ij). This discretization enables a tractable, soft back‑door adjustment: the expectation over the environment variable is replaced by a weighted sum over prototypes, effectively “subtracting” the influence of environment‑dependent confounders (z_s) while preserving the invariant causal factors (z_c).
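A compact sketch of this soft back-door-style adjustment, under the simplifying assumption that "subtracting the environment's influence" is realized by removing the soft-assignment-weighted prototype from the relational embedding; the similarity measure (dot product), temperature, and K are illustrative choices, not specified by the paper.

```python
import numpy as np

def softmax(x):
    x = x - x.max()                  # numerical stability
    e = np.exp(x)
    return e / e.sum()

def prototype_debias(z, prototypes, temperature=1.0):
    """Soft back-door-style adjustment (sketch):
    1) soft-assign z to K environment prototypes  ->  P(c_k | z),
    2) form the expected environment component z_s as a weighted sum,
    3) keep the residual z_c = z - z_s as the (approximately) invariant part."""
    logits = prototypes @ z / temperature   # (K,) similarity scores
    assign = softmax(logits)                # soft assignment P(c_k | z)
    z_s = assign @ prototypes               # environment-dependent component
    return z - z_s, assign

rng = np.random.default_rng(1)
prototypes = rng.standard_normal((5, 8))    # K = 5 prototypes, embedding dim 8
z = rng.standard_normal(8)                  # one relational embedding z_ij
z_c, assign = prototype_debias(z, prototypes)
```

The key property the discretization buys is that the expectation over the continuous environment variable collapses to a K-term weighted sum, which stays differentiable end to end.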
Third, the framework employs an uncertainty‑guided pruning module. After the soft causal intervention, edges with high σ (i.e., high uncertainty) receive low weights and are differentiably pruned, yielding a sparse graph that reflects deterministic physical dynamics rather than noisy background cues. The resulting topology is both domain‑stable and interpretable.
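The pruning step can be illustrated with a smooth sigmoid gate that maps high-σ edges to near-zero weights while leaving low-σ edges almost untouched; the threshold and steepness values are illustrative assumptions, and the actual gating function used by the authors may differ.

```python
import numpy as np

def soft_prune(edge_weights, sigma, tau=0.5, steepness=10.0):
    """Differentiable pruning gate: edges whose predicted uncertainty sigma
    exceeds the threshold tau are smoothly driven toward zero weight,
    while confident edges pass through almost unchanged."""
    gate = 1.0 / (1.0 + np.exp(-steepness * (tau - sigma)))
    return edge_weights * gate

# Four candidate edges with attention weights and per-edge uncertainties.
weights = np.array([0.9, 0.8, 0.7, 0.6])
sigma   = np.array([0.1, 0.9, 0.4, 1.2])   # sigma from the variational module

pruned = soft_prune(weights, sigma)
# Low-sigma edges (0.1, 0.4) survive; high-sigma edges (0.9, 1.2) are suppressed.
```

Because the gate is sigmoidal rather than a hard cut, gradients still flow through suppressed edges during training, which is what makes the sparsification end-to-end trainable.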
Empirically, CURVE is evaluated on three challenging settings: (1) zero‑shot transfer from a high‑fidelity simulator (CARLA) to real‑world datasets (e.g., nuScenes), (2) low‑data sim‑to‑real adaptation with only 5‑10 % labeled real samples, and (3) out‑of‑distribution (OOD) risk prediction under varying weather and illumination. Compared against strong baselines such as PC‑SGG, FloCoDe, HiKER‑SGG, SRD‑SGG, and RcSGG, CURVE consistently improves relation‑level average precision by 7–12 %, reduces Expected Calibration Error (ECE) by over 30 %, and achieves a 9 % absolute gain in risk‑prediction accuracy under severe OOD conditions. Notably, the average node degree drops by more than 40 % after pruning, yet performance does not degrade, demonstrating that many dense edges in prior works are indeed spurious.
The paper’s strengths lie in (i) repurposing uncertainty from a mere smoothing term to an active signal for spuriousness detection, (ii) introducing a prototype‑based, differentiable approximation of the back‑door adjustment that remains end‑to‑end trainable, and (iii) delivering a unified loss that simultaneously learns representations, performs causal debiasing, and induces graph sparsity. Limitations include sensitivity to the number of prototypes K and pruning thresholds, and the increased memory/computational cost when scaling to highly dynamic multi‑agent scenarios where richer prototype dictionaries may be required.
Future work suggested by the authors includes meta‑learning of prototype sets, temporal prototypes for tracking evolving environmental contexts, and extending the approach to non‑visual modalities (e.g., LiDAR or radar) to further enhance robustness.
In summary, CURVE provides a principled, variational‑causal pipeline that produces sparse, domain‑invariant scene graphs with calibrated uncertainty estimates, markedly improving OOD generalization and safety‑critical risk assessment for autonomous perception systems.