Cholec80-port: A Geometrically Consistent Trocar Port Segmentation Dataset for Robust Surgical Scene Understanding

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Trocar ports are camera-fixed, pseudo-static structures that can persistently occlude laparoscopic views and attract disproportionate feature points due to specular, textured surfaces. This makes ports particularly detrimental to geometry-based downstream pipelines such as image stitching, 3D reconstruction, and visual SLAM, where dynamic or non-anatomical outliers degrade alignment and tracking stability. Despite this practical importance, explicit port labels are rare in public surgical datasets, and existing annotations often violate geometric consistency by masking the central lumen (opening), even when anatomical regions are visible through it. We present Cholec80-port, a high-fidelity trocar port segmentation dataset derived from Cholec80, together with a rigorous standard operating procedure (SOP) that defines a port-sleeve mask excluding the central opening. We additionally cleanse and unify existing public datasets under the same SOP. Experiments demonstrate that geometrically consistent annotations substantially improve cross-dataset robustness beyond what dataset size alone provides.

💡 Research Summary

This paper addresses a largely overlooked source of error in laparoscopic computer‑vision pipelines: the trocar port. Because the endoscopic camera passes through the port, its metallic or plastic sleeve is a pseudo‑static, camera‑fixed object that frequently appears in the field of view. Its highly specular and textured surface attracts a disproportionate number of feature points, which can dominate matching, increase geometric error, and ultimately degrade downstream geometry‑based tasks such as image stitching, dense 3‑D reconstruction, and visual SLAM. Existing public surgical datasets either omit explicit port annotations or provide masks that include the central lumen, thereby suppressing anatomical pixels that are visible through the opening and breaking geometric consistency.

To solve this, the authors propose a rigorous annotation standard operating procedure (SOP) that defines a “port‑sleeve” mask: the cylindrical wall of the trocar is labeled, while the central opening is deliberately left unmasked. This SOP ensures that any anatomical structures seen through the lumen remain available for processing and that the mask aligns with the true physical boundary of the non‑anatomical object.

Using the first 20 videos of the Cholec80 collection, the authors sampled every 30th frame, yielding 38,434 annotated frames, of which 1,398 contain visible ports. This positive sample count is about 5.5 times that of m2caiSeg (255 positive frames) and about 11 times that of GynSurg (130), the two previously available port‑focused datasets. All annotations were performed in CVAT under the SOP, with ambiguous cases resolved by consulting neighboring frames to confirm the sleeve’s extent.

In addition to creating the new Cholec80‑port dataset, the authors cleanse and unify the existing m2caiSeg and GynSurg datasets to conform to the same SOP. For m2caiSeg they re‑annotate to remove interpolation artifacts; for GynSurg they reverse the “hole‑filling” policy by subtracting the central lumen from the original polygons, leaving sleeve‑only masks. The cleaned combined set is referred to as “Combined (cleaned)”.
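The lumen subtraction described for GynSurg can be illustrated with a minimal sketch. It assumes that both the hole-filled port mask and a separate lumen mask are already available as binary grids of the same size; how the lumen region is obtained is not specified in this summary, and the function name `sleeve_only` and the toy masks are hypothetical, not taken from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = labeled pixel, 0 = background


def sleeve_only(filled_port: Mask, lumen: Mask) -> Mask:
    """Subtract the lumen from a hole-filled port mask, pixel by pixel.

    Hypothetical helper: a pixel stays labeled only if it belongs to the
    filled port mask and does NOT belong to the lumen mask.
    """
    if len(filled_port) != len(lumen) or any(
        len(a) != len(b) for a, b in zip(filled_port, lumen)
    ):
        raise ValueError("masks must have the same shape")
    return [
        [1 if p and not h else 0 for p, h in zip(port_row, lumen_row)]
        for port_row, lumen_row in zip(filled_port, lumen)
    ]


# Toy 5x5 example (made-up data): a filled square "port" whose central
# pixel is the opening through which anatomy would be visible.
filled = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
lumen = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
sleeve = sleeve_only(filled, lumen)
for row in sleeve:
    print(row)
```

In the resulting mask the ring of sleeve pixels stays labeled while the central pixel is released, so anything visible through the opening remains available to downstream processing.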

The segmentation model is a ConvNeXt‑Base encoder paired with a U‑Net decoder, trained for binary semantic segmentation. The loss combines Dice loss and binary cross‑entropy (L = L_Dice + L_BCE). Training uses AdamW (lr = 5 × 10⁻⁵), batch size 16, and 384 × 384 input resolution, identical across all experiments to guarantee fair comparison.
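To make the combined objective concrete, the following sketch evaluates L = L_Dice + L_BCE on flattened per-pixel probabilities using only the Python standard library. The soft Dice formulation, the smoothing constant `smooth`, the clamping constant `eps`, and the equal weighting are common choices assumed here for illustration; the summary does not state which variants the authors used.

```python
import math
from typing import Sequence


def bce_loss(probs: Sequence[float], targets: Sequence[int],
             eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels (probabilities clamped by eps)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int],
              smooth: float = 1.0) -> float:
    """Soft Dice loss: 1 - (2*|P.T| + s) / (|P| + |T| + s). `smooth` is assumed."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Unweighted sum of the two terms, matching L = L_Dice + L_BCE."""
    return dice_loss(probs, targets) + bce_loss(probs, targets)


# Made-up four-pixel example: a perfect prediction and its exact inverse.
targets = [1, 0, 1, 0]
good = combined_loss([1.0, 0.0, 1.0, 0.0], targets)
bad = combined_loss([0.0, 1.0, 0.0, 1.0], targets)
print(good, bad)
```

The Dice term rewards region overlap regardless of how small the port is in the frame, while the cross-entropy term supplies a per-pixel gradient, which is a common motivation for pairing the two.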

Two evaluation metrics are reported: (1) Dice score computed only on frames where a port is present, measuring boundary recovery; (2) Detect F1, a frame‑level metric that treats a frame as positive if any pixel is predicted as port. Results show that a model trained on Cholec80‑port achieves Dice 0.862 and Detect F1 0.856 on its own test split. More importantly, when evaluated on the m2caiSeg test set, the Cholec80‑port‑trained model outperforms the model trained on the original m2caiSeg data, demonstrating that geometric consistency of the labels improves cross‑dataset robustness beyond sheer dataset size.
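The two metrics can be sketched as follows, under stated assumptions: Dice is averaged per frame over ground-truth-positive frames (the summary does not say whether scores are averaged per frame or pooled over pixels), and a frame counts as ground-truth positive if its annotation contains any port pixel. The function names and the toy masks are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = port pixel, 0 = background


def has_port(mask: Mask) -> bool:
    """A frame is positive if at least one pixel is labeled as port."""
    return any(any(row) for row in mask)


def dice(pred: Mask, gt: Mask) -> float:
    """Dice overlap 2|P∩G| / (|P| + |G|) for one frame."""
    inter = sum(1 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr) if p and g)
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    return 2.0 * inter / total if total else 1.0


def mean_dice_on_positive_frames(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average Dice over frames whose ground truth contains a port (assumed averaging)."""
    scores = [dice(p, g) for p, g in zip(preds, gts) if has_port(g)]
    return sum(scores) / len(scores) if scores else float("nan")


def detect_f1(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Frame-level F1: a frame is predicted positive if any pixel is predicted as port."""
    tp = fp = fn = 0
    for p, g in zip(preds, gts):
        pred_pos, gt_pos = has_port(p), has_port(g)
        if pred_pos and gt_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif gt_pos:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 0.0


# Three made-up 2x2 frames: a partial hit, a false alarm, and a miss.
gts = [[[1, 1], [0, 0]], [[0, 0], [0, 0]], [[1, 0], [0, 0]]]
preds = [[[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [0, 0]]]
mean_dice = mean_dice_on_positive_frames(preds, gts)
f1 = detect_f1(preds, gts)
print(mean_dice, f1)
```

Separating the two metrics keeps false alarms on port-free frames from being hidden inside the overlap score: the second toy frame lowers Detect F1 but does not enter the Dice average at all.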

Cross‑dataset transfer to GynSurg remains challenging, likely due to domain shifts in port material, lighting conditions, and surgical workflow. Even the “Combined (cleaned)” set yields only modest gains, indicating that visual diversity (transparent ports, low‑contrast sleeves, peripheral appearances) is still limited. Ablation experiments without the SOP‑level cleansing show substantially worse transfer performance, confirming that annotation consistency is a dominant factor for generalization.

Failure cases are analyzed: (i) thin ports near image borders, (ii) transparent or low‑contrast sleeves that blend with background, and (iii) strong specular highlights that obscure sleeve boundaries. The authors suggest that future work should expand the variety of port appearances, incorporate higher‑resolution or multimodal data (e.g., depth, optical flow), and integrate port masking directly into SLAM and 3‑D reconstruction pipelines.

In conclusion, the paper delivers a high‑fidelity, geometrically consistent trocar‑port segmentation dataset (Cholec80‑port), a clear SOP for sleeve‑only labeling, and a cleaned unification of existing datasets. Empirical evidence shows that models trained on these consistent annotations segment ports accurately on the Cholec80‑port test split and transfer better to m2caiSeg than a model trained on the original m2caiSeg labels, although transfer to GynSurg remains limited. These results highlight the critical role of annotation geometry in surgical scene understanding, and the work opens a path toward more reliable geometry‑driven applications in minimally invasive surgery.
