SplatCo: Structure-View Collaborative Gaussian Splatting for Detail-Preserving Rendering of Large-Scale Unbounded Scenes
We present SplatCo, a structure-view collaborative Gaussian splatting framework for high-fidelity rendering of complex outdoor scenes. SplatCo builds upon three novel components: 1) a cross-structure collaboration module that combines global tri-plane representations, which capture coarse scene layouts, with local context grid features representing fine details. This fusion is achieved through a hierarchical compensation mechanism, ensuring both global spatial awareness and local detail preservation; 2) a cross-view pruning mechanism that removes overfitted or inaccurate Gaussians based on structural consistency, thereby improving storage efficiency and preventing rendering artifacts; 3) a structure-view co-learning module that aggregates structural gradients with view gradients, thereby steering the optimization of Gaussian geometric and appearance attributes more robustly. By combining these key components, SplatCo effectively achieves high-fidelity rendering for large-scale scenes. Code and project page are available at https://splatco-tech.github.io.
💡 Research Summary
SplatCo introduces a novel framework for high‑fidelity, real‑time rendering of large‑scale, unbounded outdoor scenes using 3D Gaussian splatting. The authors identify three core shortcomings of existing approaches: (1) insufficient global structural coherence, (2) loss of fine‑grained local detail, and (3) redundancy and over‑fitting of Gaussians due to lack of cross‑view constraints. To address these, SplatCo proposes three tightly coupled modules.
The Cross‑Structure Collaboration Module (CSCM) fuses a global tri‑plane representation—three axis‑aligned 2‑D feature maps that capture coarse scene layout—with a local context‑grid that stores high‑resolution 3‑D features for fine detail. A hierarchical compensation mechanism operates in three stages (coarse, intermediate, fine), where tri‑plane features guide and correct the context‑grid, while the grid feeds back missing high‑frequency information to the tri‑plane. This bidirectional compensation yields dense embedded features that preserve both global spatial awareness and local geometry.
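The fusion described above can be sketched as follows. This is a minimal, hypothetical illustration with assumed shapes and a single compensation stage (the paper uses three); nearest-neighbour lookups stand in for the interpolation a real pipeline would use, and all names (`triplane_features`, `grid_features`, `fused_features`, `alpha`) are illustrative, not from the paper.

```python
import numpy as np

R, F = 32, 8   # tri-plane resolution and feature channels (assumed)
G = 16         # local context-grid resolution (assumed)

rng = np.random.default_rng(0)
planes = {ax: rng.standard_normal((R, R, F)) for ax in ("xy", "xz", "yz")}
context_grid = rng.standard_normal((G, G, G, F))

def triplane_features(p):
    """Project a point p in [0,1)^3 onto the three axis-aligned planes and sum."""
    x, y, z = np.clip((p * R).astype(int), 0, R - 1)
    return planes["xy"][x, y] + planes["xz"][x, z] + planes["yz"][y, z]

def grid_features(p):
    """Nearest-neighbour lookup in the high-resolution local context grid."""
    i, j, k = np.clip((p * G).astype(int), 0, G - 1)
    return context_grid[i, j, k]

def fused_features(p, alpha=0.5):
    """Coarse tri-plane feature compensated by the local grid residual."""
    coarse = triplane_features(p)
    fine = grid_features(p)
    # one compensation stage: the grid feeds high-frequency detail back
    return coarse + alpha * (fine - coarse)

f = fused_features(np.array([0.3, 0.7, 0.2]))
print(f.shape)  # (8,)
```

In the actual framework this bidirectional exchange is repeated at coarse, intermediate, and fine decoder levels rather than in one blend.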
The Cross‑View Pruning Mechanism (CVPM) introduces a cross‑view consistency loss (L_CVC) that measures how consistently each Gaussian projects across multiple camera views. Gaussians whose projections deviate beyond a learned threshold are deemed over‑fitted or geometrically inaccurate and are either attenuated or removed. This adaptive pruning reduces the total number of Gaussians by roughly 30 % on average, cutting memory usage and eliminating rendering artifacts such as ghosting and texture flickering.
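The pruning logic can be illustrated with a toy consistency score. This sketch assumes synthetic per-view residuals and a simple mean-plus-spread statistic with a quantile cutoff; the paper's actual L_CVC and its learned threshold are more involved.

```python
import numpy as np

rng = np.random.default_rng(1)
N, V = 1000, 6                    # Gaussians, camera views (toy sizes)
# Hypothetical per-view residuals: how far each Gaussian's projection
# deviates across views. Simulate 50 inconsistent (over-fitted) Gaussians.
residuals = rng.random((N, V))
residuals[:50] += 2.0

# A Gaussian is consistent if its residuals are small and stable across views.
cvc_score = residuals.mean(axis=1) + residuals.std(axis=1)
threshold = np.quantile(cvc_score, 0.95)  # keep the most consistent 95%

keep = cvc_score <= threshold
print(keep.sum(), "Gaussians kept,", (~keep).sum(), "pruned")
```

With this construction the 50 inflated Gaussians separate cleanly from the rest and are exactly the ones pruned; in practice the cutoff would be learned rather than fixed at a quantile.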
The Structure‑View Co‑learning (SVC) module combines structural gradients derived from the tri‑plane and context‑grid with view‑dependent gradients from photometric loss. By aggregating these two gradient streams, the optimization of Gaussian positions, scales, rotations, and appearance parameters becomes more robust, especially during early training when geometry is unstable.
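The gradient aggregation can be sketched as a weighted sum of the two streams. The sigmoid-weighted blend below is an assumption standing in for the paper's learnable weighting factor; `g_struct`, `g_view`, and `w_logit` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(2)
g_struct = rng.standard_normal(3)  # gradient from tri-plane/context-grid loss
g_view = rng.standard_normal(3)    # gradient from per-view photometric loss

def aggregate(g_s, g_v, w_logit=0.0):
    """Blend structural and view gradients with a sigmoid-weighted sum."""
    w = 1.0 / (1.0 + np.exp(-w_logit))  # w_logit is learnable in a real pipeline
    return w * g_s + (1.0 - w) * g_v

g = aggregate(g_struct, g_view)
# With w_logit = 0 the two streams are weighted equally.
assert np.allclose(g, 0.5 * (g_struct + g_view))
```

Letting the optimizer adjust `w_logit` lets structural guidance dominate early, when geometry is unstable, and photometric error dominate later.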
Extensive experiments on 13 diverse large‑scale datasets—including public benchmarks Mill‑19, MatrixCity, Tanks & Temples, WHU, and several custom aerial captures—show that SplatCo consistently outperforms state‑of‑the‑art 3DGS variants (CityGS‑X, MVGS, Scaffold‑GS, etc.). Quantitatively, SplatCo achieves an average PSNR of 23.0 dB, SSIM of 0.759, and LPIPS of 0.270, improving PSNR by 0.8–2.5 dB and SSIM by 0.02–0.05 over the best baselines. Qualitative results demonstrate sharper textures on building facades, vegetation, and road surfaces, with far fewer blurry regions or Gaussian clustering near camera trajectories. Real‑time rendering (>30 FPS) is maintained on a single consumer‑grade GPU (e.g., RTX 3080).
Implementation details reveal that the hierarchical compensation is realized through three decoder levels that progressively upsample tri‑plane features and concatenate them with interpolated context‑grid vectors. The pruning step uses a per‑Gaussian confidence score derived from L_CVC and geometric consistency, applying a soft‑mask before hard deletion. SVC integrates the two gradient streams via a learnable weighting factor, allowing the network to balance structural guidance against view‑specific photometric error.
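The soft-mask-before-hard-deletion step can be sketched as a two-pass filter. The confidence values and both thresholds below are assumed for illustration; the paper derives confidence from L_CVC and geometric consistency rather than from a fixed formula.

```python
import numpy as np

# Per-Gaussian confidence (hypothetical values) and opacities.
confidence = np.array([0.95, 0.60, 0.15, 0.02])
opacity = np.array([0.9, 0.8, 0.7, 0.6])

# Soft pass: attenuate low-confidence Gaussians instead of deleting them.
soft_mask = np.clip(confidence / 0.5, 0.0, 1.0)
opacity_soft = opacity * soft_mask

# Hard pass: remove only clear outliers whose confidence stays near zero.
hard_keep = confidence > 0.05
opacity_final = opacity_soft[hard_keep]
print(hard_keep)  # [ True  True  True False]
```

The soft pass gives borderline Gaussians a chance to recover during later optimization before the hard deletion commits.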
Limitations include residual artifacts at tri‑plane boundaries in extremely dense urban scenes and the current single‑GPU training pipeline, which may not scale to scenes requiring billions of Gaussians without distributed training. Future work could explore multi‑GPU parallelism, incorporation of external geometric priors (LiDAR, GIS), and adaptive tri‑plane orientations to further reduce boundary artifacts.
In summary, SplatCo presents a cohesive, memory‑efficient, and high‑quality solution for large‑scale Gaussian splatting, advancing the state of the art in both visual fidelity and computational practicality.