FoundationPose-Initialized 3D-2D Liver Registration for Surgical Augmented Reality

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

Augmented reality can improve tumor localization in laparoscopic liver surgery. Existing registration pipelines typically depend on organ contours, and deformable (non-rigid) alignment is often handled with finite-element (FE) models coupled to dimensionality-reduction or machine-learning components. We integrate laparoscopic depth maps with a foundation pose estimator for camera-liver pose estimation and replace FE-based deformation with non-rigid iterative closest point (NICP), reducing both engineering complexity and the modeling expertise required. On real patient data, the depth-augmented foundation pose approach achieved a mean registration error of 9.91 mm across three cases. Combined rigid-NICP registration outperformed rigid-only registration, demonstrating that NICP is an efficient substitute for finite-element deformable models. The pipeline achieves clinically relevant accuracy while offering a lightweight, engineering-friendly alternative to FE-based deformation.


💡 Research Summary

This paper addresses the challenging problem of 3D‑2D liver registration for augmented reality (AR) guidance in laparoscopic surgery. The authors propose a two‑stage pipeline that replaces the traditionally heavy finite‑element (FE) deformation modeling with a lightweight combination of monocular depth estimation, a foundation‑pose network, and a non‑rigid Iterative Closest Point (NICP) algorithm guided by a gradient‑free optimizer.

In the first stage, they adapt the RefineNet architecture from FoundationPose to predict a 6‑DOF pose offset between two laparoscopic views. The network ingests three modalities: (1) liver contour maps (right/left ridges, ligaments, silhouette), (2) full liver masks, and (3) depth maps generated by the recent Depth Anything V2 model. To bridge the domain gap between synthetic training data and real intra‑operative images, extensive augmentations are applied to each modality (contour thinning/dilation, random occlusions, morphological jitter on masks, and several stochastic depth corruptions). Instead of the conventional rotation‑translation MSE loss, they define a surface‑MSE loss that measures the squared distance between the transformed source mesh and the target mesh after applying the predicted pose. During inference, the network iteratively refines the pose up to ten times or until convergence.
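The surface-MSE loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the predicted pose is given as a rotation matrix `R` and translation `t`, and it measures the mean squared distance between corresponding vertices of the transformed source mesh and the target mesh (the paper's loss operates on meshes inside the RefineNet training loop).

```python
import numpy as np

def apply_pose(vertices, R, t):
    """Transform an (N, 3) vertex array by rotation R (3x3) and translation t (3,)."""
    return vertices @ R.T + t

def surface_mse(src_vertices, tgt_vertices, R, t):
    """Surface-MSE: mean squared distance between the pose-transformed
    source mesh vertices and the corresponding target mesh vertices."""
    diff = apply_pose(src_vertices, R, t) - tgt_vertices
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy check: the loss vanishes when the predicted pose matches the true one.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
theta = np.deg2rad(5.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
tgt = apply_pose(src, R_true, t_true)

print(surface_mse(src, tgt, R_true, t_true))          # near zero at the true pose
print(surface_mse(src, tgt, np.eye(3), np.zeros(3)))  # larger for the identity pose
```

In practice the pose would be refined iteratively (up to ten passes in the paper), with each pass re-predicting an offset that further lowers this loss.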

The second stage tackles non-rigid deformation. A public liver mesh dataset is first aligned to the patient-specific pre-operative liver using rigid ICP. Each aligned mesh is then deformed to the patient mesh via NICP, producing a collection of plausible liver shapes. Principal Component Analysis (PCA) on these shapes yields ten dominant deformation modes and a mean shape, forming a low-dimensional statistical shape model. For registration, the authors render the current shape (the mean plus a linear combination of the PCA modes) under the current pose and compute a weighted Hausdorff distance between the rendered contours and the intra-operative label masks. Because the Hausdorff distance is non-differentiable, they employ the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to jointly optimize the pose parameters and the PCA shape coefficients. Search bounds are set to ±20 mm for translation and ±10° for rotation, and to
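The statistical shape model and the gradient-free fitting loop can be sketched as follows. This is a toy NumPy illustration under stated assumptions: the NICP-deformed training shapes are replaced by random vectors, shapes are flattened vertex arrays, and a simple random-search loop stands in for CMA-ES (the paper uses CMA-ES proper, jointly over pose and shape coefficients, with a rendered weighted Hausdorff objective rather than a direct point-set distance).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the NICP-deformed training shapes: K shapes,
# each flattened to a (3N,) vector (here N = 50 synthetic vertices).
K, N = 30, 50
base = rng.normal(size=3 * N)
shapes = base + 0.1 * rng.normal(size=(K, 3 * N))

# Statistical shape model: mean shape plus the top-10 PCA deformation modes.
mean_shape = shapes.mean(axis=0)
_, _, Vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
modes = Vt[:10]  # (10, 3N) dominant deformation modes

def synthesize(coeffs):
    """Reconstruct a shape from the mean and a linear combination of modes."""
    return mean_shape + coeffs @ modes

# Gradient-free fitting: a hypothetical random-search stand-in for CMA-ES,
# minimizing the distance between the synthesized shape and a target shape.
target = shapes[0]
best_c = np.zeros(10)
best_d = np.linalg.norm(synthesize(best_c) - target)
for _ in range(200):
    c = best_c + 0.05 * rng.normal(size=10)
    d = np.linalg.norm(synthesize(c) - target)
    if d < best_d:
        best_c, best_d = c, d

print(best_d)  # shrinks toward the part of the residual the 10 modes cannot span
```

Because the objective is evaluated purely by forward synthesis (or, in the paper, by rendering), no gradients are needed, which is what makes the non-differentiable Hausdorff objective tractable.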

