$R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement
Mesh reconstruction from Neural Radiance Fields (NeRF) is widely used in 3D reconstruction and has been applied across numerous domains. However, existing methods typically rely solely on the given training-set images, which restricts supervision to limited observations and makes it difficult to fully constrain geometry and appearance. Moreover, the contribution of each viewpoint to training is not uniform and changes dynamically during optimization, which can result in suboptimal guidance for both geometric refinement and rendering quality. To address these limitations, we propose $R^2$-Mesh, a reinforcement learning framework that combines NeRF-rendered pseudo-supervision with online viewpoint selection. Our key insight is to exploit NeRF's rendering ability to synthesize additional high-quality images, enriching training with diverse viewpoint information. To ensure that supervision focuses on the most beneficial perspectives, we introduce a UCB-based strategy with a geometry-aware reward, which dynamically balances exploration and exploitation to identify informative viewpoints throughout training. Within this framework, we jointly optimize SDF geometry and view-dependent appearance under differentiable rendering, while periodically refining meshes to capture fine geometric details. Experiments demonstrate that our method achieves competitive results in both geometric accuracy and rendering quality.
💡 Research Summary
R²‑Mesh introduces a reinforcement‑learning‑driven framework for high‑fidelity mesh reconstruction that leverages Neural Radiance Fields (NeRF) as both a source of pseudo‑supervision and a generator of informative viewpoints. Traditional NeRF‑to‑mesh pipelines rely exclusively on a fixed set of captured images, which limits geometric constraints and fails to adapt to the changing importance of each view during optimization. R²‑Mesh addresses these shortcomings with two key innovations. First, after an initial NeRF training phase (Stage 1) using Instant‑NGP, the learned density grid is converted into a coarse Signed Distance Function (SDF) grid, providing an initial surface representation. Second, during Stage 2 the method augments the training set with NeRF‑rendered images from a pool of candidate viewpoints uniformly sampled on a virtual sphere around the scene.
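The summary does not spell out how the Instant-NGP density grid is converted into the coarse SDF grid. A minimal sketch of one plausible mapping, assuming a density threshold `tau` and a sharpness scale `beta` (both hypothetical parameters, not from the paper): densities above the threshold are treated as inside the surface (negative signed value) and low densities as outside (positive), yielding an occupancy-like signed field that later FlexiCubes-style refinement can deform into a proper SDF.

```python
import numpy as np

def density_to_coarse_sdf(density, tau=10.0, beta=1.0):
    """Map a NeRF density grid to a crude signed field.

    Not the paper's exact procedure: `tau` (density threshold) and
    `beta` (sharpness) are assumed knobs. Values are positive in
    empty space and negative inside dense regions; tanh keeps the
    field bounded in [-1, 1] for stable downstream optimization.
    """
    return np.tanh((tau - density) / beta)
```

A true distance transform of the thresholded occupancy would give metrically meaningful values, but a bounded signed field like this is often sufficient as an initialization that gradient-based refinement then corrects.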
To select the most beneficial viewpoints, the authors adopt an Upper Confidence Bound (UCB) algorithm, a lightweight bandit-style reinforcement learning strategy that balances exploration and exploitation without requiring additional neural networks. For each candidate view $a$, a reward $r_a$ is defined as a weighted sum of a color component (MSE + LPIPS between the mesh rendering and the NeRF rendering) and a geometry component (MSE between binary foreground masks derived from depth). The UCB value $\hat{r}_a(t) + c\sqrt{\ln t / N_a(t)}$, where $\hat{r}_a(t)$ is the empirical mean reward of view $a$ and $N_a(t)$ is the number of times it has been selected up to iteration $t$, is computed at every training iteration; the top-$k$ views with the highest UCB scores are treated as pseudo-ground-truth for that iteration, while $m$ real images are sampled from the original dataset.
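The bandit loop above can be sketched in a few dozen lines. This is a generic UCB1-style selector, not the authors' implementation: the class name, the sign convention (here a larger rendering discrepancy is assumed to mean a more informative view and thus a higher reward), and the weight `alpha` are all assumptions for illustration.

```python
import math

def view_reward(color_err, geom_err, alpha=0.5):
    # Hypothetical convention: weighted sum of color and geometry
    # discrepancies; higher discrepancy = more informative view.
    return alpha * color_err + (1.0 - alpha) * geom_err

class UCBViewSelector:
    """UCB1-style selector over a fixed pool of candidate viewpoints."""

    def __init__(self, n_views, c=1.0):
        self.c = c                       # exploration constant
        self.counts = [0] * n_views      # N_a(t): pulls per view
        self.means = [0.0] * n_views     # r_hat_a(t): running mean reward
        self.t = 0                       # global iteration counter

    def select(self, k=1):
        """Return the indices of the top-k views by UCB score."""
        self.t += 1
        scores = []
        for a in range(len(self.counts)):
            if self.counts[a] == 0:
                scores.append(float("inf"))  # force initial exploration
            else:
                bonus = self.c * math.sqrt(math.log(self.t) / self.counts[a])
                scores.append(self.means[a] + bonus)
        return sorted(range(len(scores)), key=lambda a: -scores[a])[:k]

    def update(self, a, reward):
        """Incrementally update the running mean reward of view a."""
        self.counts[a] += 1
        self.means[a] += (reward - self.means[a]) / self.counts[a]
```

Each training iteration would call `select(k)` to pick pseudo-supervision views, render them, and feed the resulting rewards back through `update`.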
Mesh refinement proceeds by jointly optimizing the SDF and view‑dependent appearance fields. The SDF is represented with FlexiCubes, which attach learnable deformation vectors and weight parameters to each grid vertex, enabling continuous topology‑aware updates. Differentiable rendering (via nv‑diff) supplies gradients for both geometry and appearance. The total loss combines a Charbonnier color loss, total variation regularization on the SDF, and a FlexiCubes deformation regularizer. After convergence, UV unwrapping is performed with xatlas and the final mesh is exported.
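The three loss terms can be sketched in numpy. The weights `lam_tv` and `lam_def` are hypothetical, and the FlexiCubes deformation regularizer is stood in for by a simple L2 penalty on the deformation vectors; the paper's exact forms and weights may differ.

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    # Smooth, robust alternative to L1: sqrt((x - y)^2 + eps^2)
    return np.mean(np.sqrt((pred - target) ** 2 + eps ** 2))

def tv_reg(sdf):
    # Total variation over a 3D SDF grid: squared finite differences
    # along each axis discourage high-frequency noise in the surface.
    dx = np.diff(sdf, axis=0)
    dy = np.diff(sdf, axis=1)
    dz = np.diff(sdf, axis=2)
    return (dx ** 2).mean() + (dy ** 2).mean() + (dz ** 2).mean()

def total_loss(pred, target, sdf, deform, lam_tv=0.1, lam_def=0.01):
    # Assumed composition: color term + SDF smoothness + a stand-in
    # L2 penalty for the FlexiCubes deformation regularizer.
    return (charbonnier(pred, target)
            + lam_tv * tv_reg(sdf)
            + lam_def * np.mean(deform ** 2))
```

In the actual pipeline these would be differentiable tensor ops (the gradients come from the differentiable rasterizer), but the structure of the objective is the same.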
Extensive experiments on the NeRF‑synthetic benchmark and the real‑world DTU dataset demonstrate that R²‑Mesh outperforms prior SDF‑based methods such as NV‑diff, NeRF2Mesh, and others in Chamfer‑L1, F‑score, PSNR, SSIM, and LPIPS. Qualitative results show finer surface details, smoother geometry, and more faithful novel‑view renderings, especially in scenes with complex occlusions and thin structures where the adaptive viewpoint selection proves most beneficial.
The paper acknowledges limitations: the candidate viewpoints are uniformly sampled on a sphere, which may miss optimal poses for highly anisotropic scenes; the UCB hyper‑parameters (exploration constant c and reward weighting α) require dataset‑specific tuning; and the initial NeRF training adds computational overhead. Future work could incorporate scene‑aware viewpoint generation, meta‑learning to auto‑tune bandit parameters, and lightweight NeRF variants to improve efficiency. Overall, R²‑Mesh presents a compelling integration of reinforcement learning and neural rendering to push the state of the art in mesh reconstruction.