Enhancing Underwater Light Field Images via Global Geometry-aware Diffusion Process

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

This work studies the challenging problem of acquiring high-quality underwater images via 4-D light field (LF) imaging. To this end, we propose GeoDiff-LF, a novel diffusion-based framework built upon SD-Turbo to enhance underwater 4-D LF imaging by leveraging its spatial-angular structure. GeoDiff-LF consists of three key adaptations: (1) a modified U-Net architecture with convolutional and attention adapters to model geometric cues, (2) a geometry-guided loss function using tensor decomposition and progressive weighting to regularize global structure, and (3) an optimized sampling strategy with noise prediction to improve efficiency. By integrating diffusion priors and LF geometry, GeoDiff-LF effectively mitigates color distortion in underwater scenes. Extensive experiments demonstrate that our framework outperforms existing methods in both visual fidelity and quantitative metrics, advancing the state of the art in underwater image enhancement. The code will be publicly available at https://github.com/linlos1234/GeoDiff-LF.

💡 Research Summary

The paper addresses the challenging problem of enhancing underwater images captured with a 4‑D light field (LF) camera. Underwater environments cause severe degradations such as color cast, low brightness, reduced contrast, and haze due to absorption, refraction, and scattering. While many methods exist for 2‑D underwater enhancement, only a few have explored LF data, and none have leveraged the recent advances in diffusion models.
To fill this gap, the authors propose GeoDiff‑LF, a diffusion‑based framework built on the SD‑Turbo model. GeoDiff‑LF introduces three key innovations:

Geometry‑aware U‑Net – The standard U‑Net backbone is augmented with convolutional adapters and attention adapters. The convolutional adapters implement spatial‑angular separable convolutions, extracting joint features over all four LF dimensions at a lower memory cost than full 4‑D convolutions. The attention adapters apply cross‑dimensional self‑attention, allowing the network to capture global dependencies across both spatial positions and angular views, which is crucial for correcting depth‑dependent attenuation and scattering.
Global geometry regularization – The LF tensor is decomposed with CP (CANDECOMP/PARAFAC) factorization into a small set of rank‑one components that capture the color, illumination, and depth information shared across views. These components feed a loss term that is progressively weighted during training: early steps focus on correcting color bias, while later steps enforce structural consistency and depth continuity across all views. This progressive scheme counteracts the tendency of plain L1/L2 losses to over‑smooth or distort geometry.
Efficient noise‑prediction sampling – Traditional diffusion models require hundreds of denoising steps. The authors observe that a raw underwater LF image Y₀ is already close to the clean target X₀, differing mainly in attenuation and scattering. Therefore, they introduce a noise map predictor that, given a chosen intermediate timestep τ (τ < T), generates a noisy sample X_τ directly from Y₀. The reverse diffusion then starts from τ instead of the full noise state, reducing the number of required steps to as few as 1–4 while preserving restoration quality.
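This summary does not give the adapter internals. The following is therefore only a sketch of the spatial‑angular separable idea, under several stated assumptions: a single‑channel LF stored as a NumPy array of shape (U, V, H, W), fixed kernels in place of learned PyTorch layers, and hypothetical helper names (`conv2d_same`, `spatial_angular_conv`). It first applies a 2‑D filter over (H, W) inside every view, then a second 2‑D filter over (U, V) at every pixel.

```python
import numpy as np

# Hypothetical helpers illustrating separable spatial-angular filtering;
# this is not the paper's adapter code.


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2-D cross-correlation over the last two axes, zero-padded to keep size.

    Assumes an odd-sized kernel; all leading axes are treated as a batch.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    pad = [(0, 0)] * (x.ndim - 2) + [(ph, ph), (pw, pw)]
    xp = np.pad(x, pad)
    h, w = x.shape[-2:]
    out = np.zeros(x.shape, dtype=float)
    # Accumulate one shifted copy of the input per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[..., i:i + h, j:j + w]
    return out


def spatial_angular_conv(lf: np.ndarray, k_spatial: np.ndarray,
                         k_angular: np.ndarray) -> np.ndarray:
    """Separable 4-D filtering of a light field of shape (U, V, H, W).

    Step 1 filters each sub-aperture image over (H, W); step 2 filters each
    pixel's U x V angular patch over (U, V).
    """
    x = conv2d_same(lf, k_spatial)                       # spatial step, per view
    x = conv2d_same(x.transpose(2, 3, 0, 1), k_angular)  # angular step, per pixel
    return x.transpose(2, 3, 0, 1)                       # back to (U, V, H, W)


rng = np.random.default_rng(0)
lf = rng.random((5, 5, 32, 32))                          # toy 5 x 5-view LF
smooth = np.full((3, 3), 1.0 / 9.0)
print(spatial_angular_conv(lf, smooth, smooth).shape)    # (5, 5, 32, 32)
```

With two 3×3 kernels, the separable form uses 9 + 9 = 18 weights per input/output channel pair. A full 3×3×3×3 kernel would need 81, which is the usual argument for separable designs on 4‑D data.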
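Attention across views can be sketched in a few lines, assuming features of shape (U, V, H, W, C) and plain matrix projections `w_q`, `w_k`, `w_v` (hypothetical names). The sketch restricts attention to the angular axis, solving one U·V‑token attention problem per pixel. The paper's cross‑dimensional attention may also mix spatial tokens, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical angular self-attention; not the paper's attention adapter.


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def angular_self_attention(feats: np.ndarray, w_q: np.ndarray,
                           w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention across the U*V views at every pixel.

    feats: (U, V, H, W, C) light-field features; w_q, w_k, w_v: (C, C).
    """
    u, v, h, w, c = feats.shape
    # One token per view and one attention problem per pixel: (H*W, U*V, C).
    tokens = feats.transpose(2, 3, 0, 1, 4).reshape(h * w, u * v, c)
    q, k, val = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)   # (H*W, U*V, U*V)
    out = softmax(scores, axis=-1) @ val             # (H*W, U*V, C)
    # Restore the (U, V, H, W, C) layout.
    return out.reshape(h, w, u, v, c).transpose(2, 3, 0, 1, 4)
```

In an adapter, the output would typically be added back to the input as a residual so that the pretrained U‑Net features are preserved. That wrapper is omitted here.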
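This summary does not give the exact form of the geometry‑guided loss. The sketch below assumes NumPy, and every function name and hyperparameter in it (`rank`, `w_max`, the linear ramp) is hypothetical. It shows the two ingredients described: a rank‑R CP decomposition computed by alternating least squares (CP‑ALS), and a geometry term whose weight grows during training, so that a pixel‑wise L1 term dominates the early steps. Here the geometry term compares low‑rank CP reconstructions of the prediction and the target. That is one plausible reading, not the authors' formula.

```python
import numpy as np

# Hypothetical CP-based geometry loss with a progressive weight.


def unfold(x, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def khatri_rao(mats):
    """Column-wise Kronecker product; the last matrix's row index varies fastest."""
    out = mats[0]
    for m in mats[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, m.shape[1])
    return out


def cp_als(x, rank, n_iter=50, seed=0):
    """Rank-R CP (CANDECOMP/PARAFAC) decomposition by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for n in range(x.ndim):
            others = [factors[m] for m in range(x.ndim) if m != n]
            gram = np.ones((rank, rank))
            for f in others:
                gram *= f.T @ f  # Hadamard product of the Gram matrices
            # Least-squares update of factor n with all other factors fixed.
            factors[n] = unfold(x, n) @ khatri_rao(others) @ np.linalg.pinv(gram)
    return factors


def cp_reconstruct(factors):
    """Rebuild the tensor as a sum of R rank-one outer products."""
    shape = tuple(f.shape[0] for f in factors)
    return (factors[0] @ khatri_rao(factors[1:]).T).reshape(shape)


def geometry_loss(pred, target, rank=4):
    """Hypothetical global term: MSE between rank-R CP approximations."""
    low_p = cp_reconstruct(cp_als(pred, rank))
    low_t = cp_reconstruct(cp_als(target, rank))
    return float(np.mean((low_p - low_t) ** 2))


def total_loss(pred, target, step, total_steps, w_max=0.1):
    """Pixel L1 plus a geometry term whose weight ramps linearly from 0 to w_max."""
    w_geo = w_max * min(step / total_steps, 1.0)  # ~0 early (color first), w_max late
    l1 = float(np.mean(np.abs(pred - target)))
    return l1 + w_geo * geometry_loss(pred, target)
```

This NumPy version is not differentiable. For training, the same computation would have to be expressed in an autodiff framework.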
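The shortened sampling procedure can be sketched as follows. The assumptions are:

- a standard linear DDPM β schedule with T = 1000;
- deterministic DDIM updates as the reverse sampler, since the summary does not name one;
- two stand‑in callables, `noise_predictor(y0, t)` and `denoiser(x, t, y0)`, in place of the paper's learned networks.

The only change from ordinary DDIM is that sampling begins at X_τ built from Y₀, rather than at pure Gaussian noise at t = T.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear DDPM schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction per timestep


def start_from_degraded(y0, tau, noise_predictor):
    """Form x_tau from the degraded LF instead of from pure noise at t = T."""
    eps = noise_predictor(y0, tau)   # stand-in for the learned noise map
    return np.sqrt(alpha_bar[tau]) * y0 + np.sqrt(1.0 - alpha_bar[tau]) * eps


def truncated_ddim(y0, tau, denoiser, noise_predictor, n_steps=4):
    """Run n_steps deterministic DDIM updates from t = tau to a clean estimate."""
    x = start_from_degraded(y0, tau, noise_predictor)
    ts = np.round(np.linspace(tau, 0, n_steps, endpoint=False)).astype(int)
    for i, t in enumerate(ts):
        eps = denoiser(x, t, y0)     # noise estimate, conditioned on y0 and t
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        # Signal level of the next timestep; 1.0 means "fully clean".
        a_next = alpha_bar[ts[i + 1]] if i + 1 < len(ts) else 1.0
        x = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps
    return x
```

With `n_steps=1`, this reduces to a single prediction of X₀ from X_τ, which falls within the 1–4 step regime described above.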

The training follows the standard DDPM objective (L2 loss between true and predicted noise) but conditions the denoiser on both the low‑quality LF image and the timestep. During inference, the model first creates X_τ using the predictor, then iteratively denoises to obtain the final enhanced LF volume.
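The conditional DDPM objective described here can be sketched in a few lines. The sketch assumes NumPy, a linear β schedule standing in for the model's actual schedule, and a hypothetical `denoiser(x_t, t, y0)` callable. It draws one timestep and one noise sample per call, whereas a real training loop would batch and average.

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed schedule


def ddpm_loss(x0, y0, denoiser, rng):
    """One Monte-Carlo sample of the conditional noise-prediction (L2) loss."""
    t = int(rng.integers(0, T))                  # uniform random timestep
    eps = rng.standard_normal(x0.shape)          # true Gaussian noise
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = denoiser(x_t, t, y0)               # conditioned on y0 and t
    return float(np.mean((eps - eps_hat) ** 2))
```

How this term is combined with the geometry‑guided loss is not stated in this summary.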

Experimental evaluation uses two publicly released underwater LF datasets: a small benchmark of 75 scenes and a larger real‑world collection. Baselines include traditional physics‑based methods, recent 2‑D diffusion models (DiffUIE, Diffusion‑based UIE), LF‑specific CNN/Transformer architectures, and a naïve adaptation of SD‑Turbo to 4‑D data. GeoDiff‑LF outperforms all baselines, achieving an average PSNR gain of more than 2 dB, higher SSIM, and lower CIEDE2000 color error. Qualitative results show restored natural colors, improved contrast, and preserved multi‑view coherence, which also benefits downstream tasks such as depth estimation and object detection.
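PSNR is defined as 10·log10(MAX² / MSE). A 2 dB gain therefore corresponds to multiplying the MSE by 10^(−0.2) ≈ 0.63, roughly a 37 % reduction in mean squared error. The sketch below assumes intensities in [0, 1]. It also reports LF PSNR as the mean over sub‑aperture views, which is a common protocol but is not confirmed here for this paper.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - target) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def lf_psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Mean PSNR over the U x V sub-aperture views of a (U, V, H, W[, C]) LF.

    Per-view averaging is an assumed evaluation protocol.
    """
    u, v = pred.shape[:2]
    return float(np.mean([psnr(pred[i, j], target[i, j], max_val)
                          for i in range(u) for j in range(v)]))
```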

Limitations and future work are acknowledged. GeoDiff‑LF relies on fine‑tuning a pre‑trained 2‑D diffusion model; a diffusion backbone trained natively on 4‑D LF data could further improve generalization, especially in extreme deep‑sea conditions where color loss is severe. The authors suggest extending the framework with multi‑scale geometric attention, joint depth‑estimation training, and lightweight variants for real‑time deployment on underwater robots.

In summary, GeoDiff‑LF is the first diffusion‑based system that explicitly integrates global LF geometry into the denoising process, delivering state‑of‑the‑art underwater LF image enhancement while cutting sampling to as few as 1–4 denoising steps. The code will be publicly released, promising broad impact on marine imaging, underwater robotics, and scientific exploration.
