3D Reconstruction of Whole Stomach from Endoscope Video Using Structure-from-Motion

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Gastric endoscopy is a common clinical practice that enables medical doctors to examine the inside of the stomach. To identify the location of a gastric lesion, such as early gastric cancer, within the stomach, this work aims to reconstruct the 3D shape of the whole stomach, with color texture information, from a standard monocular endoscope video. Previous works have tried to reconstruct the 3D structures of various organs from endoscope images, but they have mainly focused on partial surfaces. In this work, we investigated how to enable structure-from-motion (SfM) to reconstruct the whole shape of a stomach from a standard endoscope video. We specifically investigated the combined effect of chromo-endoscopy and color channel selection on SfM. Our study found that 3D reconstruction of the whole stomach can be achieved by using red channel images captured under chromo-endoscopy, in which indigo carmine (IC) dye is spread on the stomach surface.

💡 Research Summary

This paper presents an offline pipeline that reconstructs the complete three‑dimensional shape of the human stomach, together with realistic color texture, from a conventional monocular endoscopic video. The authors address a key limitation of previous endoscopic 3D reconstruction work, which has largely been restricted to partial organ surfaces (e.g., colon, liver, larynx) and suffered from the texture‑poor nature of internal mucosa. Their solution combines two complementary strategies: (1) the use of chromo‑endoscopy with indigo‑carmine (IC) dye to artificially introduce high‑contrast texture onto the gastric wall, and (2) systematic evaluation of individual RGB color channels to determine which provides the most robust feature set for Structure‑from‑Motion (SfM).

Data were collected from three patients undergoing routine gastrointestinal endoscopy. Video was captured with an Olympus GIF‑H290 scope and an IMH‑20 image management hub, saved as unprocessed 30 fps full‑HD AVI streams. For each patient, two sequences were recorded: one without dye and one after spraying IC dye onto the mucosa. A planar checkerboard was also imaged for camera calibration. Because the endoscope employs an ultra‑wide fisheye lens, significant radial distortion and non‑rectilinear projection were corrected using a fisheye model (OpenCV) based on the checkerboard images.
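
The fisheye correction step can be illustrated with a minimal sketch of the four-coefficient fisheye projection model that OpenCV's fisheye module is based on, where the incidence angle θ is mapped to θ(1 + k1θ² + k2θ⁴ + k3θ⁶ + k4θ⁸). This is not the authors' calibration code: the function names and all intrinsic and distortion values below are hypothetical, and in practice the coefficients would be estimated from the checkerboard images.

```python
import math

def _distort(theta, k):
    """Map the incidence angle theta to the distorted angle theta_d."""
    t2 = theta * theta
    return theta * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4)

def fisheye_project(point, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point (camera frame, z > 0) to fisheye pixel coordinates."""
    x, y, z = point
    a, b = x / z, y / z                # rectilinear (pinhole) coordinates
    r = math.hypot(a, b)
    if r < 1e-12:                      # point on the optical axis
        return cx, cy
    theta_d = _distort(math.atan(r), k)
    scale = theta_d / r
    return fx * scale * a + cx, fy * scale * b + cy

def fisheye_unproject(pixel, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0), iters=20):
    """Invert the model: fisheye pixel -> rectilinear coordinates (a, b)."""
    u, v = pixel
    xd, yd = (u - cx) / fx, (v - cy) / fy
    theta_d = math.hypot(xd, yd)
    if theta_d < 1e-12:
        return 0.0, 0.0
    theta = theta_d                    # initial guess for Newton's method
    for _ in range(iters):
        t2 = theta * theta
        f = _distort(theta, k) - theta_d
        df = 1 + 3 * k[0] * t2 + 5 * k[1] * t2**2 + 7 * k[2] * t2**3 + 9 * k[3] * t2**4
        theta -= f / df
    scale = math.tan(theta) / theta_d  # back to the rectilinear radius
    return xd * scale, yd * scale

# Hypothetical intrinsics and distortion coefficients, for illustration only.
FX, FY, CX, CY = 500.0, 500.0, 640.0, 360.0
K = (-0.02, 0.003, 0.0, 0.0)

pixel = fisheye_project((0.5, -0.2, 1.0), FX, FY, CX, CY, K)
undistorted = fisheye_unproject(pixel, FX, FY, CX, CY, K)
```

Undistorting a pixel amounts to inverting the angle mapping, which has no closed form for non-zero coefficients, so the sketch uses a few Newton iterations.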

Pre‑processing involved extracting all frames, discarding near‑duplicate frames, and separating the three color channels. The authors observed noticeable channel misalignment in the raw RGB frames; consequently, they fed single‑channel images (red, green, or blue) into the SfM pipeline to avoid cross‑channel artifacts. For each channel and each dye condition, they executed a full SfM workflow using COLMAP: SIFT feature detection, exhaustive pairwise matching, pose estimation, and bundle adjustment to obtain a sparse point cloud and camera extrinsics.
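
The pre-processing described above might look roughly like the following sketch. The summary does not state how near-duplicate frames were detected, so the mean-absolute-difference criterion and its threshold are assumptions, as are the function names and the toy frame data. Frames are represented as nested lists of (r, g, b) tuples to keep the example dependency-free.

```python
def split_channels(frame):
    """Split a frame (rows of (r, g, b) tuples) into three single-channel images."""
    red = [[px[0] for px in row] for row in frame]
    green = [[px[1] for px in row] for row in frame]
    blue = [[px[2] for px in row] for row in frame]
    return red, green, blue

def mean_abs_diff(img_a, img_b):
    """Mean absolute intensity difference between two single-channel images."""
    total, count = 0, 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count

def drop_near_duplicates(images, threshold=2.0):
    """Return indices of frames that differ enough from the last kept frame.

    The criterion and threshold are assumptions made for illustration.
    """
    kept = []
    for idx, img in enumerate(images):
        if not kept or mean_abs_diff(images[kept[-1]], img) > threshold:
            kept.append(idx)
    return kept

# Toy 1x2-pixel frames: the second is almost identical to the first.
frames = [
    [[(10, 20, 30), (10, 20, 30)]],
    [[(11, 20, 30), (11, 20, 30)]],
    [[(200, 20, 30), (200, 20, 30)]],
]
red_images = [split_channels(f)[0] for f in frames]
kept_indices = drop_near_duplicates(red_images)
```

Each kept single-channel image would then be written out as its own grayscale file and passed to the SfM tool, one run per channel and dye condition.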

Quantitative results (Table 1) reveal a dramatic impact of the IC dye and of the red channel. Without dye, reconstruction rates were low (6–24 % of frames) and the resulting point clouds were sparse (e.g., 5 740 points for red). With dye, the red channel achieved a 99.8 % frame reconstruction rate, generating 731 070 points—far exceeding the green (≈64 %) and blue (≈15 %) channels. The average number of 2‑D observations per image also increased markedly, indicating richer feature correspondence. Visual inspection confirmed that the red channel under IC dye produced the most complete and dense stomach model, while the green channel performed best only in the dye‑free case, and the blue channel was consistently the least effective.

The sparse point cloud was then processed to create a watertight mesh. The authors down‑sampled to 10 000 points, applied statistical outlier removal (points whose mean distance to nearest neighbors exceeded μ + 2σ were discarded), estimated normals, refined them using camera pose information, and finally performed screened Poisson surface reconstruction. Texture mapping was achieved by selecting, for each mesh triangle, the image with the most favorable viewing angle and distance, then projecting the corresponding red‑channel texture onto the mesh using MeshLab’s parameterization tools. The resulting textured mesh faithfully reproduces the overall gastric geometry and mucosal coloration.
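
The μ + 2σ outlier rule can be sketched as follows. This is a brute-force illustration rather than the authors' implementation: the neighbor count `k` is an assumed parameter not given in the summary, and for a cloud of 10 000 points a spatial index such as a k-d tree would be a more practical choice than the quadratic search used here.

```python
import math
from statistics import mean, pstdev

def remove_statistical_outliers(points, k=4, n_sigma=2.0):
    """Drop points whose mean distance to their k nearest neighbors
    exceeds mu + n_sigma * sigma, computed over the whole cloud."""
    mean_dists = []
    for i, p in enumerate(points):
        # Brute-force neighbor search: distances to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(mean(dists[:k]))
    mu = mean(mean_dists)
    sigma = pstdev(mean_dists)
    cutoff = mu + n_sigma * sigma
    return [p for p, d in zip(points, mean_dists) if d <= cutoff]

# Toy cloud: a 3x3x3 unit grid plus one far-away point.
grid = [(float(x), float(y), float(z))
        for x in range(3) for y in range(3) for z in range(3)]
outlier = (50.0, 50.0, 50.0)
filtered = remove_statistical_outliers(grid + [outlier])
```

In the toy cloud, the isolated point has a mean neighbor distance far above the cutoff and is discarded, while every grid point is kept.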

A custom viewer was also developed, allowing clinicians to click any video frame and instantly see its estimated pose projected onto the 3D model. This capability enables precise 3D localization of lesions identified in the video, potentially aiding surgical planning for early gastric cancer resections.
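
To place a frame on the 3D model, a viewer needs the camera position and viewing direction in model coordinates. The sketch below assumes the pose is given in the world-to-camera convention that COLMAP uses, with a unit quaternion (qw, qx, qy, qz) for the rotation R and a translation t, so that the camera center is C = −Rᵀt. The function names are hypothetical, and the authors' viewer may compute this differently.

```python
import math

def quat_to_rotmat(qw, qx, qy, qz):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    w, x, y, z = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def camera_center(quat, tvec):
    """Camera position in world coordinates: C = -R^T t."""
    R = quat_to_rotmat(*quat)
    return tuple(-sum(R[r][c] * tvec[r] for r in range(3)) for c in range(3))

def viewing_direction(quat):
    """Camera optical axis (+z) expressed in world coordinates."""
    R = quat_to_rotmat(*quat)
    return tuple(R[2])  # third row of R equals R^T applied to (0, 0, 1)

# Example pose: 90-degree rotation about the z-axis, translation along x.
s = math.sqrt(0.5)
center = camera_center((s, 0.0, 0.0, s), (1.0, 0.0, 0.0))
direction = viewing_direction((s, 0.0, 0.0, s))
```

Drawing a small frustum at `center`, oriented along `direction`, would mark where on the model a clicked video frame was taken.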

In the discussion, the authors attribute the success of the red channel to the spectral properties of indigo‑carmine: the dye absorbs strongly in the blue‑green region, leaving the red channel with high‑contrast speckles that serve as reliable keypoints for SIFT. They note that while real‑time SLAM approaches offer immediate feedback, they sacrifice reconstruction fidelity due to limited feature descriptors and sequential matching. SfM, by contrast, leverages global optimization (bundle adjustment) and exhaustive matching, delivering superior geometry at the cost of offline processing.

Future work will focus on improving mesh fidelity through adaptive down‑sampling, more sophisticated outlier detection, and higher‑resolution texture blending. Integration with automated lesion detection and registration to pre‑operative CT or MRI data is envisioned to create a multimodal navigation platform for gastroenterologists and surgeons.

Overall, the study demonstrates that, with modest modifications (dye application and channel selection), a standard 2‑D endoscope can be transformed into a powerful 3‑D imaging tool capable of reconstructing the entire stomach, preserving both shape and color, without requiring specialized stereo hardware or structured‑light accessories. This represents a cost‑effective step toward enhanced endoscopic navigation and lesion localization in clinical practice.
