SDFoam: Signed-Distance Foam for explicit surface reconstruction

Reading time: 5 minutes

📝 Original Info

  • Title: SDFoam: Signed-Distance Foam for explicit surface reconstruction
  • ArXiv ID: 2512.16706
  • Date: 2025-12-18
  • Authors: Antonella Rech, Nicola Conci, Nicola Garau

📝 Abstract

Neural radiance fields (NeRF) have driven impressive progress in view synthesis by using ray-traced volumetric rendering. Splatting-based methods such as 3D Gaussian Splatting (3DGS) provide faster rendering by rasterizing 3D primitives. RadiantFoam (RF) brought ray tracing back, achieving throughput comparable to Gaussian Splatting by organizing radiance with an explicit Voronoi Diagram (VD). Yet, all the mentioned methods still struggle with precise mesh reconstruction. We address this gap by jointly learning an explicit VD with an implicit Signed Distance Field (SDF). The scene is optimized via ray tracing and regularized by an Eikonal objective. The SDF introduces metric-consistent isosurfaces, which, in turn, bias near-surface Voronoi cell faces to align with the zero level set. The resulting model produces crisper, view-consistent surfaces with fewer floaters and improved topology, while preserving photometric quality and maintaining training speed on par with RadiantFoam. Across diverse scenes, our hybrid implicit-explicit formulation, which we name SDFoam, substantially improves mesh reconstruction accuracy (Chamfer distance) with comparable appearance (PSNR, SSIM), without sacrificing efficiency.

💡 Deep Analysis

📄 Full Content

Reconstructing 3D geometry and appearance from multiview images is a long-standing problem in computer vision and graphics. Classical approaches based on multi-view stereo or surface meshing struggle to recover accurate geometry for scenes with complex materials, thin structures, or indirect illumination. Neural implicit representations have recently emerged as a powerful alternative, achieving remarkable reconstruction quality by learning the scene composition directly from images.

In particular, radiance-based methods [1, 16, 20, 22] represent a scene as a volumetric density field optimized for appearance, whereas SDF-based methods [28, 29, 30] employ signed distance functions to recover implicit surfaces with higher geometric accuracy. However, learning scenes that are both visually and geometrically consistent remains challenging.

We propose SDFoam, a framework that unifies geometry reconstruction and appearance modeling within a single representation (Fig. 1). As in Radiant Foam [7], we represent the scene as a 3D Voronoi diagram whose dual Delaunay triangulation induces a sparse volumetric mesh. In SDFoam, however, each Voronoi cell is parameterized not only by its centroid and color, but also by a locally defined signed distance value, turning the Voronoi-Delaunay structure into a jointly implicit-explicit representation that is both differentiable and spatially coherent. The signed distance field is learned together with the Voronoi structure through differentiable rendering, allowing the same representation to drive both surface reconstruction and view synthesis.
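To make the per-cell parameterization concrete, the following PyTorch-style sketch shows what such a hybrid representation could look like: each Voronoi site carries a position, a color, and a signed-distance value, and an interpolated field over those values is regularized with an Eikonal term so that its gradient norm stays close to one. This is an illustrative sketch under our own assumptions, not the authors' implementation; the names `SDFoamCells`, `interpolate_sdf`, and `eikonal_loss`, as well as the inverse-distance interpolation, are hypothetical.

```python
# Illustrative sketch (not the paper's code) of a hybrid Voronoi + SDF parameterization.
import torch
import torch.nn as nn


class SDFoamCells(nn.Module):
    """Per-cell parameters: Voronoi site position, RGB color, signed distance."""

    def __init__(self, num_cells: int):
        super().__init__()
        self.positions = nn.Parameter(0.5 * torch.randn(num_cells, 3))
        self.colors = nn.Parameter(torch.rand(num_cells, 3))
        self.sdf = nn.Parameter(torch.zeros(num_cells))

    def interpolate_sdf(self, query: torch.Tensor, k: int = 8) -> torch.Tensor:
        """Inverse-distance-weighted SDF from the k nearest sites
        (a stand-in for whatever interpolation the real method uses)."""
        d = torch.cdist(query, self.positions)          # (Q, N) pairwise distances
        dist, idx = d.topk(k, dim=-1, largest=False)    # (Q, k) nearest sites
        w = 1.0 / (dist + 1e-6)
        w = w / w.sum(dim=-1, keepdim=True)
        return (w * self.sdf[idx]).sum(dim=-1)          # (Q,) interpolated SDF


def eikonal_loss(cells: SDFoamCells, query: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the SDF gradient norm from 1 (Eikonal constraint)."""
    query = query.requires_grad_(True)
    f = cells.interpolate_sdf(query)
    grad = torch.autograd.grad(f.sum(), query, create_graph=True)[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()


# Example usage: sample random points in the scene bounds and regularize the field.
# cells = SDFoamCells(num_cells=100_000)
# loss = eikonal_loss(cells, torch.rand(1024, 3))
```

In the actual method, these per-cell parameters are optimized through differentiable ray tracing of the Voronoi structure, with the Eikonal term keeping the learned field metrically consistent near the surface.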

A key contribution of SDFoam is its novel, fast mesh extraction strategy. Instead of relying on post-hoc surface reconstruction algorithms, our method extracts the surface directly from the trained Voronoi structure by leveraging the SDF. This allows a direct transition between implicit and explicit representations without altering the learned surface topology.
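As a rough illustration of this idea (not the paper's exact procedure), a zero level set stored on Voronoi sites can be located by testing Delaunay-adjacent site pairs whose signed distances have opposite signs: the dual Voronoi face between each such pair lies near the surface. The sketch below uses SciPy's Delaunay triangulation; the function name `zero_crossing_pairs` is our own.

```python
# Illustrative sketch: find Voronoi faces that straddle the SDF zero level set.
import numpy as np
from scipy.spatial import Delaunay


def zero_crossing_pairs(sites: np.ndarray, sdf: np.ndarray) -> np.ndarray:
    """Return index pairs (i, j) of Delaunay-adjacent sites whose SDF values
    have opposite signs; the dual Voronoi face of each pair is near the surface."""
    tri = Delaunay(sites)                           # sites: (N, 3), sdf: (N,)
    indptr, indices = tri.vertex_neighbor_vertices  # CSR-style adjacency
    pairs = []
    for i in range(len(sites)):
        for j in indices[indptr[i]:indptr[i + 1]]:
            if i < j and sdf[i] * sdf[j] < 0.0:
                pairs.append((i, j))
    return np.asarray(pairs)
```

The sketch only shows the sign-crossing test; in practice the selected dual faces would still need to be assembled and triangulated into a consistent mesh, which is what makes a direct extraction from the learned structure attractive compared to post-hoc reconstruction over a dense density grid.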

In summary, SDFoam:
• Offers a good trade-off between visual appearance and mesh reconstruction, closing the gap between radiance fields and SDF-based methods;
• Accelerates SDF-based reconstruction and rendering by exploiting a dual Voronoi-Delaunay structure, yielding faster training and inference than prior SDF approaches;
• Is, to our knowledge, the first framework to couple a 3D Voronoi tessellation with a learned SDF, providing a jointly implicit-explicit scene representation;
• Enables fast mesh extraction, up to 5× faster than naive density-based thresholding on RadiantFoam [7].

View synthesis aims to generate novel views of a scene given a set of observed images. Early approaches relied on classical reconstruction methods such as Structure-from-Motion (SfM) [25], to obtain coarse reconstructions via sparse point clouds, and Multi-View Stereo (MVS), which can guide image re-projection for novel view synthesis through a denser 3D reconstruction [6]. These methods suffer from inherent limitations: regions with missing or poorly observed data result in incomplete reconstructions, and over-reconstruction can occur in areas with uncertain geometry. Neural rendering approaches address these limitations by learning continuous and differentiable scene representations that enable inference of missing regions, recovery of fine geometric details, and synthesis of novel views directly from images.

Neural Radiance Fields (NeRF) [21] marked a significant breakthrough in neural rendering and novel view synthesis. NeRF models a scene as a continuous volumetric field of density and radiance, parameterized by a neural network. Although NeRF achieves high-quality results, it is computationally intensive, and many subsequent methods have focused on accelerating both training and rendering. For instance, Instant-NGP [22] leverages a hash table-based data structure to improve efficiency, while Plenoxels [5] represents the scene as a sparse voxel grid of spherical harmonics, eliminating the need for a neural network. Other notable approaches, such as TensoRF [3], DVGO [26], FastNeRF [20], Mip-NeRF [1], and SparseNeRF [9], also employ different strategies to accelerate computation while maintaining visual quality.
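For context, NeRF-style methods optimize a volumetric rendering model that composites density and radiance along each camera ray. In the standard discretized form (generic NeRF notation, not specific to this paper):

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big),$$

where $\sigma_i$ and $\mathbf{c}_i$ are the density and radiance of the $i$-th sample along ray $\mathbf{r}$ and $\delta_i$ is the spacing between consecutive samples. Evaluating many samples per ray is exactly the cost that the acceleration methods above, and the point-based methods discussed next, aim to reduce.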

Despite achieving high visual quality, these methods still largely suffer from dense sampling along rays. This motivated the development of point-based methods, which represent a scene as sparse collections of points or primitives, where each primitive encodes local geometry and appearance, reducing unnecessary sampling.

Building on the idea of point-based methods, 3D Gaussian Splatting (3DGS) [15] models each scene element as an anisotropic 3D Gaussian with color and view-dependent properties. Beyond 3DGS, several alternative 3D and 2D primitive representations have been explored. 2DGS [12] uses 2D Gaussians, Triangle Splatting [11] represents scenes with explicit surface triangles, and methods such as DMTet [24], Deformable Beta Splatting [17], and Tetrahedron Splatting [8] leverage tetrahedral representations.

Reference

This content is AI-processed based on open access ArXiv data.
