UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
Reading time: 5 minutes
...
📝 Original Info
Title: UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
ArXiv ID: 2512.21185
Date: 2025-12-24
Authors: Tanghui Jia, Dongyu Yan, Dehao Hao, Yang Li, Kaiyi Zhang, Xianyi He, Lanjiong Li, Yuhan Wang, Jinnan Chen, Lutao Jiang, Qishen Yin, Long Quan, Ying-Cong Chen, Li Yuan
📝 Abstract
In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeline that includes a novel watertight processing method and high-quality data filtering. This pipeline improves the geometric quality of publicly available 3D datasets by removing low-quality samples, filling holes, and thickening thin structures, while preserving fine-grained geometric details. To enable fine-grained geometry refinement, we decouple spatial localization from geometric detail synthesis in the diffusion process. We achieve this by performing voxel-based refinement at fixed spatial locations, where voxel queries derived from coarse geometry provide explicit positional anchors encoded via RoPE, allowing the diffusion model to focus on synthesizing local geometric details within a reduced, structured solution space. Our model is trained exclusively on publicly available 3D datasets, achieving strong geometric quality despite limited training resources. Extensive evaluations demonstrate that UltraShape 1.0 performs competitively with existing open-source methods in both data processing quality and geometry generation. All code and trained models will be released to support future research.
💡 Deep Analysis
📄 Full Content
Technical Report
UltraShape 1.0: High-Fidelity 3D Shape Generation via
Scalable Geometric Refinement
Tanghui Jia*1, Dongyu Yan*2, Dehao Hao*3, Yang Li2, Kaiyi Zhang3, Xianyi He1, Lanjiong Li2,
Yuhan Wang5, Jinnan Chen4, Lutao Jiang2, Qishen Yin1, Long Quan3, Ying-Cong Chen2, Li Yuan1
1Shenzhen Graduate School, Peking University
2The Hong Kong University of Science and Technology (Guangzhou)
3The Hong Kong University of Science and Technology
4National University of Singapore
5S-Lab, Nanyang Technological University
∗Equal contribution
Figure 1 High-quality 3D assets generated by UltraShape 1.0. Best viewed with zoom-in.
Abstract
In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D
geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse
global structure is first synthesized and then refined to produce detailed, high-quality geometry. To
support reliable 3D generation, we develop a comprehensive data processing pipeline that includes
a novel watertight processing method and high-quality data filtering. This pipeline improves
the geometric quality of publicly available 3D datasets by removing low-quality samples, filling
holes, and thickening thin structures, while preserving fine-grained geometric details. To enable
fine-grained geometry refinement, we decouple spatial localization from geometric detail synthesis
in the diffusion process. We achieve this by performing voxel-based refinement at fixed spatial
locations, where voxel queries derived from coarse geometry provide explicit positional anchors
encoded via RoPE, allowing the diffusion model to focus on synthesizing local geometric details
within a reduced, structured solution space. Our model is trained exclusively on publicly available
3D datasets, achieving strong geometric quality despite limited training resources. Extensive
evaluations demonstrate that UltraShape 1.0 performs competitively with existing open-source
methods in both data processing quality and geometry generation. All code and trained models
will be released to support future research.
Date: December 29, 2025
Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
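The abstract's key mechanism, voxel queries at fixed spatial locations whose positions are encoded via RoPE, can be illustrated with a minimal sketch. This is an assumption-laden reconstruction rather than the authors' released code: the per-axis channel split, the frequency base, and the function names are our own choices.

```python
# Hedged sketch (not the authors' implementation): rotary position
# embedding (RoPE) applied per spatial axis to voxel-query features,
# so each query carries an explicit positional anchor at a fixed voxel.
import torch

def rope_1d(x, pos, base=10000.0):
    """Rotate feature pairs of x (..., d) by angles pos * freq (standard RoPE)."""
    d = x.shape[-1]
    freqs = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)  # (d/2,)
    angles = pos[..., None] * freqs                              # (..., d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(tokens, coords):
    """tokens: (N, d) voxel-query features; coords: (N, 3) voxel indices.

    Splits channels into three groups and applies 1D RoPE per axis, so
    each query is tied to a fixed voxel location from the coarse stage.
    """
    d = tokens.shape[-1] // 3  # assumes d divisible by 3, each part even
    parts = []
    for i in range(3):
        part = tokens[..., i * d:(i + 1) * d]
        parts.append(rope_1d(part, coords[..., i].to(tokens.dtype)))
    return torch.cat(parts, dim=-1)
```

Under this reading, position is fixed by the coarse geometry and injected through the rotation, leaving the diffusion model free to spend its capacity on local detail synthesis.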
1 Introduction
3D content generation plays a fundamental role across a wide range of applications, including film and visual
effects production, augmented and virtual reality, robotics, industrial design, and modern video games. Across
these domains, the generation of high-fidelity 3D geometry remains a core technical requirement. As demand for
scalable, automated 3D geometry generation continues to grow, learning-based 3D generation has emerged as
a key research direction in computer vision and computer graphics. Compared to 2D content generation, 3D
generation poses substantially greater challenges. First, high-quality 3D data is significantly scarcer, often
represented non-uniformly, and typically requires strong geometric properties, such as watertightness, to be
directly usable in downstream tasks. In addition, common 3D representations are inherently sparse, and both
memory consumption and computational cost scale cubically with spatial resolution, severely limiting the
achievable level of geometric detail and scalability. These factors make it difficult for existing methods to
produce fine-grained geometry while maintaining robustness at higher resolutions. As a result, 3D generation
techniques have not yet converged on a unified, scalable pipeline.
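To make the cubic scaling concrete, here is a back-of-the-envelope calculation (our illustration, not figures from the paper) of the memory a dense float32 scalar field, such as an SDF grid, requires at increasing resolutions:

```python
# Back-of-the-envelope check of the cubic scaling: memory for a dense
# float32 scalar field (e.g., an SDF grid) at increasing resolutions.
for res in (128, 256, 512, 1024):
    gib = res ** 3 * 4 / 2 ** 30   # one float32 (4 bytes) per voxel
    print(f"{res}^3 grid: {gib:.2f} GiB")
# 128^3: 0.01 GiB   256^3: 0.06 GiB   512^3: 0.50 GiB   1024^3: 4.00 GiB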
Existing watertight remeshing techniques for 3D generative models can be broadly categorized into
UDF-based, visibility-check-based, and flood-fill-based approaches. UDF-based methods typically compute unsigned
distance fields (UDFs) on dense voxel grids and derive pseudo-SDFs by subtracting a small offset ϵ [2,
28]; however, this heuristic lacks explicit sign inference, often resulting in double-layered surfaces or the
erroneous removal of valid disconnected components (e.g., wheels) when filtering for the largest connected
part. Alternatively, visibility-check-based methods employ ray casting to identify interior regions [12, 13, 28],
which effectively seal cracks and eliminate spurious internal structures but remain sensitive to occlusions and
prone to high-frequency geometric noise in complex regions. Finally, flood-fill-based strategies infer signs
by expanding from exterior seeds (e.g., ManifoldPlus [7]) to generate clean, regularized surfaces. Despite
their effectiveness on closed shapes, these methods rely heavily on watertight assumptions; when applied to
non-watertight or self-intersecting inputs, the fill process often leaks into the interior, yielding unintended
double-layered thin shells.
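As a concrete illustration of the UDF-minus-ϵ heuristic and its characteristic failure mode, the following minimal sketch (our reconstruction under stated assumptions, not code from any of the cited methods; the grid resolution and ϵ are arbitrary choices) computes a pseudo-SDF on a dense grid and extracts its zero level set. That level set is a shell of thickness roughly 2ϵ around the input surface, which is exactly the double-layering described above:

```python
# Minimal sketch of the UDF-based pseudo-SDF heuristic. `pts` is a surface
# point sample of shape (N, 3) in [-1, 1]^3; res and eps are illustrative.
import numpy as np
from scipy.spatial import cKDTree
from skimage import measure

def pseudo_sdf_mesh(pts, res=128, eps=2.0 / 128):
    # Dense grid of voxel centers over the normalized bounding box [-1, 1]^3.
    axis = np.linspace(-1.0, 1.0, res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

    # Unsigned distance field: distance from each voxel center to the
    # nearest surface sample. No inside/outside sign is ever inferred.
    udf, _ = cKDTree(pts).query(grid.reshape(-1, 3))
    udf = udf.reshape(res, res, res)

    # Heuristic pseudo-SDF: subtract a small offset eps. Its zero level set
    # is a shell of thickness ~2*eps around the input surface, which is why
    # this heuristic tends to produce double-layered geometry.
    level_set = udf - eps

    # Extract the iso-surface and map vertices from index space to [-1, 1]^3.
    verts, faces, _, _ = measure.marching_cubes(level_set, level=0.0)
    return verts * (2.0 / (res - 1)) - 1.0, faces
```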
Alongside earlier approaches such as Score Distillation Sampling [1, 18, 21] and Large Reconstruction
Models [6, 20, 25], diffusion transformer (DiT [17])-based methods have recently become the leading paradigm
in 3D generation. They can be broadly