MetaSSP: Enhancing Semi-supervised Implicit 3D Reconstruction through Meta-adaptive EMA and SDF-aware Pseudo-label Evaluation
Implicit SDF-based methods for single-view 3D reconstruction achieve high-quality surfaces but require large labeled datasets, limiting their scalability. We propose MetaSSP, a semi-supervised framework that exploits abundant unlabeled images. Our approach introduces gradient-based parameter-importance estimation to regularize adaptive EMA updates, together with an SDF-aware pseudo-label weighting mechanism that combines augmentation consistency with SDF variance. After a supervised warm-up on 10% of the labeled data, a unified teacher–student pipeline jointly trains on both labeled and unlabeled data. On the Pix3D benchmark, our method reduces Chamfer Distance by approximately 20.61% and increases IoU by around 24.09% compared with existing semi-supervised baselines, setting a new state of the art.
💡 Research Summary
MetaSSP tackles the data-hungry nature of implicit signed-distance-function (SDF) based single-view 3D reconstruction with a dedicated semi-supervised learning framework. The method proceeds in two stages. First, a warm-up phase trains a teacher network (based on the SSR implicit-SDF architecture) on only 10% of the available labeled Pix3D images, producing a stable initial model that can generate reliable pseudo-labels.
In the second stage, abundant unlabeled images are incorporated through a teacher–student paradigm. For each unlabeled sample, the teacher processes a weakly augmented and a strongly augmented version of the image. Two complementary reliability cues are computed: (1) an augmentation-consistency loss ℓ_cons that measures the discrepancy between the strong and weak predictions, and (2) the variance σ² of the weak prediction's SDF values, which reflects how tightly the predicted distances cluster around the surface. These cues are linearly combined and clamped to obtain a pseudo-label weight w_pseudo = clip(1 − α ℓ_cons − β σ², 0, 1) with α = β = 4. The final training loss blends the supervised loss on labeled data with the weighted unsupervised loss on unlabeled data, balanced by a scalar λ = 0.2. Consequently, noisy pseudo-labels are automatically down-weighted, while geometrically consistent, low-variance predictions receive stronger supervision.
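As a concrete sketch of this weighting (tensor shapes and the PyTorch-style helper names here are our assumptions; the paper only specifies the formulas), w_pseudo = clip(1 − α ℓ_cons − β σ², 0, 1) and the blended loss could look like:

```python
import torch

def pseudo_label_weight(sdf_weak, sdf_strong, alpha=4.0, beta=4.0):
    """Per-sample pseudo-label weight w_pseudo = clip(1 - α·ℓ_cons - β·σ², 0, 1).

    sdf_weak, sdf_strong: (B, N) teacher SDF predictions at N query points
    for the weakly / strongly augmented views of each unlabeled image.
    """
    l_cons = (sdf_weak - sdf_strong).abs().mean(dim=1)  # augmentation-consistency cue
    sigma2 = sdf_weak.var(dim=1)                        # SDF-variance cue (weak view)
    return torch.clamp(1.0 - alpha * l_cons - beta * sigma2, 0.0, 1.0)

def total_loss(loss_sup, loss_unsup_per_sample, w_pseudo, lam=0.2):
    """Blend supervised loss with the weighted unsupervised loss (λ = 0.2)."""
    return loss_sup + lam * (w_pseudo * loss_unsup_per_sample).mean()
```

Predictions that agree across augmentations and have tightly clustered SDF values keep a weight near 1, while inconsistent or high-variance ones are clamped toward 0 and contribute little to the unsupervised term.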
A central contribution is the meta‑adaptive exponential moving average (EMA) used to update the teacher from the student. At each epoch, the method estimates per‑parameter importance ω_i by averaging the absolute gradients of a simple scalar loss ℓ = ∥sdf∥₂² over a small batch of unlabeled images. Parameters with larger ω_i are deemed more critical for shaping the SDF field and are protected during EMA updates. The EMA momentum itself is not fixed: a cosine‑annealed base momentum m_base is modulated by a tiny meta‑controller MLP that takes as input the teacher‑student loss gap Δℓ, the teacher’s own loss ℓ_T, and the normalized training progress t/T. The controller outputs a scaling factor γ, and the final momentum becomes m = clip(γ · m_base, m_min, m_max). The teacher’s parameters are then updated as θ_T,i ← θ_T,i + (1 − m · η · ω_i)·Δθ_i, where Δθ_i is the student‑teacher parameter difference and η is a small hyper‑parameter. This dual adaptation—dynamic momentum and importance‑regularized updates—prevents the teacher from drifting when the student diverges, while preserving essential weights. An occasional “reset” with a lower momentum is applied if the validation loss gap falls below a threshold, further stabilizing training.
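A minimal sketch of this update, under our own assumptions about module structure and helper names (the paper specifies only the formulas, and γ would come from the meta-controller MLP rather than being passed in directly): importance ω is the mean absolute gradient of ℓ = ∥sdf∥₂², and the teacher steps toward the student with per-parameter damping:

```python
import torch

def estimate_importance(model, unlabeled_batches):
    """Per-parameter importance ω: mean |∇ℓ| with ℓ = ||sdf||₂² over a few batches.
    Assumes model(x) returns predicted SDF values for an unlabeled batch x."""
    omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x in unlabeled_batches:
        model.zero_grad()
        (model(x) ** 2).sum().backward()  # ℓ = ||sdf||₂²
        for n, p in model.named_parameters():
            if p.grad is not None:
                omega[n] += p.grad.abs()
    return {n: g / len(unlabeled_batches) for n, g in omega.items()}

@torch.no_grad()
def meta_ema_update(teacher, student, omega, m_base, gamma,
                    eta=0.01, m_min=0.9, m_max=0.999):
    """θ_T,i ← θ_T,i + (1 − m·η·ω_i)·Δθ_i with m = clip(γ·m_base, m_min, m_max).
    gamma is the meta-controller's output; eta is an assumed small hyperparameter."""
    m = min(max(gamma * m_base, m_min), m_max)
    for (name, p_t), p_s in zip(teacher.named_parameters(), student.parameters()):
        delta = p_s - p_t  # student-teacher difference Δθ_i
        p_t.add_((1.0 - m * eta * omega[name]) * delta)
```

High-importance parameters receive smaller update steps, so the teacher preserves the weights that most strongly shape its SDF field even when the student drifts.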
Experiments on the Pix3D dataset (standard S1 split) demonstrate the effectiveness of MetaSSP. Using only 10% of the labeled data for warm-up and the remaining 90% as unlabeled, the method achieves a 20.61% reduction in Chamfer Distance and a 24.09% increase in Intersection-over-Union compared with the strongest prior semi-supervised baselines (FixMatch, SSP3D, SSMP). Category-wise results show consistent gains across chairs, beds, desks, and other object classes, with notable improvements in fine-grained surface detail.
Ablation studies dissect each component. Fixed‑momentum EMA alone yields modest gains; adding gradient‑based importance regularization (ImpEMA) improves stability; introducing a dynamic EMA schedule (Dyn‑ImpEMA) further boosts performance; and the full configuration that also incorporates the SDF‑aware pseudo‑label weighting (Dyn‑ImpEMA‑adaptive) delivers the best results. Removing the pseudo‑label weighting causes a sharp performance drop, confirming the necessity of geometry‑aware confidence estimation.
In summary, MetaSSP presents the first semi‑supervised pipeline tailored for implicit SDF reconstruction. By jointly leveraging (i) per‑parameter importance‑guided EMA regularization, (ii) a geometry‑aware pseudo‑label weighting that fuses consistency and variance cues, and (iii) a two‑stage training schedule, the framework attains state‑of‑the‑art reconstruction quality with dramatically reduced annotation effort. This opens the door for practical deployment of high‑fidelity 3D reconstruction in domains where labeled 3D data are scarce, such as AR/VR content creation, robotic perception, and virtual production.