Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture

Reading time: 5 minutes

📝 Original Info

  • Title: Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture
  • ArXiv ID: 2512.16397
  • Date: 2025-12-18
  • Authors: Haodi He, Jihun Yu, Ronald Fedkiw

📝 Abstract

We leverage increasingly popular three-dimensional neural representations in order to construct a unified and consistent explanation of a collection of uncalibrated images of the human face. Our approach utilizes Gaussian Splatting, since it is more explicit and thus more amenable to constraints than NeRFs. We leverage segmentation annotations to align the semantic regions of the face, facilitating the reconstruction of a neutral pose from only 11 images (as opposed to requiring a long video). We soft constrain the Gaussians to an underlying triangulated surface in order to provide a more structured Gaussian Splat reconstruction, which in turn informs subsequent perturbations to increase the accuracy of the underlying triangulated surface. The resulting triangulated surface can then be used in a standard graphics pipeline. In addition, and perhaps most impactful, we show how accurate geometry enables the Gaussian Splats to be transformed into texture space where they can be treated as a view-dependent neural texture. This allows one to use high visual fidelity Gaussian Splatting on any asset in a scene without the need to modify any other asset or any other aspect (geometry, lighting, renderer, etc.) of the graphics pipeline. We utilize a relightable Gaussian model to disentangle texture from lighting in order to obtain a de-lit high-resolution albedo texture that is also readily usable in a standard graphics pipeline. The flexibility of our system allows for training with disparate images, even with incompatible lighting, facilitating robust regularization. Finally, we demonstrate the efficacy of our approach by illustrating its use in a text-driven asset creation pipeline.
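
To make the texture-space idea concrete, here is a minimal sketch (illustrative only, not the paper's implementation; the closest-triangle query is assumed to have been done already) of how a Gaussian center could be carried into the mesh's UV domain via barycentric coordinates:

```python
import torch

def gaussian_to_uv(center, tri_verts, tri_uvs):
    """Map one Gaussian center into texture space (illustrative sketch).

    center:    (3,) Gaussian mean
    tri_verts: (3, 3) vertices of its closest mesh triangle
    tri_uvs:   (3, 2) UV coordinates of those vertices
    """
    # Barycentric coordinates of the center relative to the triangle.
    v0 = tri_verts[1] - tri_verts[0]
    v1 = tri_verts[2] - tri_verts[0]
    v2 = center - tri_verts[0]
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    # The same weights applied to the triangle's UVs give the Gaussian's
    # position in texture space, where it can be splatted and queried as
    # a view-dependent neural texture.
    return u * tri_uvs[0] + v * tri_uvs[1] + w * tri_uvs[2]
```

Once the Gaussians live in UV space, they behave like any other texture on the asset, which is what allows the rest of the scene and the pipeline (geometry, lighting, renderer) to remain untouched.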

📄 Full Content

Facial avatars are essential for a wide range of applications including virtual reality, video conferencing, gaming, feature films, etc. As virtual interactions become more prevalent, the demand for compelling digital representations of human faces will continue to grow. A person's face avatar should accurately reflect their identity, while also being controllable, relightable, and efficient enough to use in real-time applications. These requirements have driven significant research endeavors in computer vision, computer graphics, and machine learning. However, it is still challenging to create such avatars in a scalable and democratized way, i.e. with commodity hardware and limited input data and without using multiple calibrated cameras or a light-stage.

Neural Radiance Fields (NeRFs) [74] have become increasingly prevalent in both computer vision and computer graphics due to their impressive ability to reconstruct and render 3D scenes from 2D image collections. In particular, various authors have achieved impressive photorealistic results on 3D human faces. The implicit nature of the NeRF representation facilitates high-quality editing (see e.g. [21]), since it keeps edits both non-local and smooth (similar to splines, but dissimilar to triangulated surfaces). In addition, the regularization provided by the low dimensional latent space keeps edits of faces looking like faces. It is far more difficult for non-expert users to edit triangulated surface geometry and textures directly.

Although only recently proposed, Gaussian Splatting [53] has quickly become remarkably popular. Its explicit nature makes it significantly more amenable to various constraints than other (typically implicit) neural models. Importantly, NeRFs and Gaussian Splatting complement each other, as one can edit a NeRF representation before converting it into a Gaussian Splatting model for various downstream tasks that would benefit from having a more structured and constrained representation.

The standard graphics pipeline has been refined over decades via both software and hardware optimizations and has become quite mature, especially for real-time applications. In fact, in spite of the maturity and impact of ray tracing, it has only recently been somewhat incorporated into gaming consoles and other real-time applications. Silicon chip development is often a zero-sum game where adding one capability necessarily removes another. However, ray tracing has been embraced in non-real-time applications and is often used to create content for real-time applications. It is not a stretch to assume that neural rendering methods will be treated similarly. Thus, converting neural models, such as Gaussian Splatting, into triangulated surfaces with textures disentangled from lighting necessarily increases their immediate impact on real-time computer graphics applications. In this work, we address this by introducing a pipeline that transforms a Gaussian Splatting model trained with self-captured uncalibrated multi-view images into a triangulated surface with de-lit textures.

Figure 1. (Segmentation Maps) Using a small number of self-captured uncalibrated multi-view images, we use segmentation annotations along with size and shape constraints to force the Gaussians to move instead of deform. In addition, soft constraints are used to keep the Gaussians tightly coupled and close to the triangulated surface. After training, in a post process, the triangulated surface is deformed to better approximate the Gaussian reconstruction.

Our method offers significant advantages over traditional mesh-based geometry reconstruction, since it does not rely on the ability of advanced shading models to overcome the domain gap between synthetic and real images. Instead, neural rendering is used to close the domain gap, while constraints are used to tightly connect the neural rendering degrees of freedom to an explicit triangulated surface. The Gaussian Splatting model is modified to more tightly couple it to the triangulated surface in two ways: segmentation annotations along with size and shape constraints are used to force the Gaussians to move (instead of deform) in order to explain the data, and soft constraints are used to keep the Gaussians tightly coupled and close to the triangulated surface. After training, in a post process, the triangulated surface is deformed to better approximate the Gaussian reconstruction. See Fig. 1 and Fig. 2.
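
The introduction does not spell out the constraint energies, but a minimal PyTorch sketch of what such soft constraints could look like (all function names and weights below are illustrative assumptions, not the authors' code) would pull Gaussian centers toward the surface while penalizing size and anisotropy, so that the Gaussians move rather than deform to explain the data:

```python
import torch

def soft_constraint_losses(centers, scales, surface_points,
                           w_dist=1.0, w_scale=0.1):
    """Illustrative soft constraints tying Gaussians to a mesh.

    centers:        (N, 3) Gaussian means
    scales:         (N, 3) per-axis Gaussian scales
    surface_points: (M, 3) points sampled on the triangulated surface
    """
    # Distance from each Gaussian center to its nearest surface sample.
    # (In practice a KD-tree or precomputed closest-point query would
    # replace this brute-force pairwise distance.)
    d = torch.cdist(centers, surface_points)   # (N, M)
    nearest = d.min(dim=1).values              # (N,)
    loss_surface = (nearest ** 2).mean()

    # Penalize large or highly anisotropic scales so that Gaussians
    # "move instead of deform" in order to explain the images.
    loss_size = (scales ** 2).mean()
    loss_aniso = (scales.max(dim=1).values /
                  scales.min(dim=1).values.clamp_min(1e-6) - 1.0).mean()

    return w_dist * loss_surface + w_scale * (loss_size + loss_aniso)
```

These terms would simply be added to the usual photometric Gaussian Splatting loss during training; the post-process mesh deformation then uses the resulting structured Gaussians as a target.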

In order to obtain a de-lit texture with albedo disentangled from normal and lighting, without the need for a light-stage, we regularize this underconstrained problem by utilizing a PCA representation of a mesh-based facial texture derived from the Metahuman dataset [33]. We optimize the PCA coefficients to reconstruct albedo as faithfully as possible, while minimizing reliance on a Relightable Gaussian Splatting model that is used to capture residual differences between a target image and a rendering of the textured mesh.
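
A hedged sketch of this PCA-regularized fit, assuming a differentiable render_fn and using a free texture residual as a simplified stand-in for the paper's Relightable Gaussian residual model (every name and weight here is hypothetical):

```python
import torch

def fit_albedo_pca(mean_tex, basis, render_fn, targets,
                   n_iters=500, lr=1e-2, w_residual=1e-3):
    """Fit PCA texture coefficients to target images (illustrative).

    mean_tex:  (H, W, 3) mean albedo texture of the PCA model
    basis:     (K, H, W, 3) PCA basis textures
    render_fn: texture -> (B, H', W', 3) differentiable renders of the
               textured mesh under the captured views (a stand-in here)
    targets:   (B, H', W', 3) target images
    """
    coeffs = torch.zeros(basis.shape[0], requires_grad=True)
    residual = torch.zeros_like(mean_tex, requires_grad=True)
    opt = torch.optim.Adam([coeffs, residual], lr=lr)

    for _ in range(n_iters):
        opt.zero_grad()
        # Albedo = PCA reconstruction + small free residual; penalizing
        # the residual keeps the solution close to the facial prior.
        albedo = mean_tex + torch.einsum("k,khwc->hwc", coeffs, basis)
        loss = ((render_fn(albedo + residual) - targets) ** 2).mean()
        loss = loss + w_residual * residual.abs().mean()
        loss.backward()
        opt.step()
    return coeffs.detach(), residual.detach()
```

Keeping the residual term cheap is what encodes the stated goal of "minimizing reliance" on the residual model: the PCA prior explains as much of the albedo as it can, and only what it cannot explain is absorbed elsewhere.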

Reference

This content is AI-processed based on open access ArXiv data.
