Spectral Compressive Imaging via Chromaticity-Intensity Decomposition
In coded aperture snapshot spectral imaging (CASSI), the captured measurement entangles spatial and spectral information, posing a severely ill-posed inverse problem for hyperspectral image (HSI) reconstruction. Moreover, the captured radiance inherently depends on scene illumination, making it difficult to recover the intrinsic spectral reflectance that remains invariant to lighting conditions. To address these challenges, we propose a chromaticity-intensity decomposition framework, which disentangles an HSI into a spatially smooth intensity map and a spectrally variant chromaticity cube. The chromaticity encodes lighting-invariant reflectance, enriched with high-frequency spatial details and local spectral sparsity. Building on this decomposition, we develop CIDNet, a Chromaticity-Intensity Decomposition unfolding network within a dual-camera CASSI system. CIDNet integrates a hybrid spatial-spectral Transformer tailored to reconstruct fine-grained and sparse spectral chromaticity, and a degradation-aware, spatially-adaptive noise estimation module that captures anisotropic noise across iterative stages. Extensive experiments on both synthetic and real-world CASSI datasets demonstrate that our method achieves superior performance in both spectral and chromaticity fidelity. Code and models will be publicly available.
💡 Research Summary
This paper tackles the fundamental difficulty of reconstructing hyperspectral images (HSIs) from the highly compressed measurements produced by coded aperture snapshot spectral imaging (CASSI). Traditional CASSI reconstruction methods rely on handcrafted priors (total variation, low‑rank), on plug‑and‑play denoisers, or on end‑to‑end deep networks that implicitly learn spatial‑spectral correlations. However, they share two major limitations: (1) they do not explicitly separate the effects of illumination from the intrinsic reflectance of the scene, and (2) they treat measurement noise as spatially homogeneous, which is unrealistic for real optical systems.
The authors introduce a physically motivated chromaticity‑intensity decomposition (CID) for hyperspectral data. An HSI cube X(u,v,λ) is factorized as X = C ⊙ I, where I(u,v) is a spatially smooth intensity map defined as the average spectral energy per pixel, and C(u,v,λ) is the normalized spectral signature (chromaticity). Because I can be captured directly by a conventional RGB or PAN sensor in a dual‑camera CASSI setup, the unknown variable reduces to the chromaticity cube C, which is invariant to illumination, contains high‑frequency texture, and exhibits local spectral sparsity.
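The decomposition itself is simple to state concretely. The following is a minimal NumPy sketch, assuming (as the summary states) that the intensity map I is the per-pixel average spectral energy; the toy cube dimensions are illustrative, not the paper's:

```python
import numpy as np

# Toy HSI cube: height x width x spectral bands
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(4, 4, 28))

# Intensity map I(u,v): average spectral energy per pixel
I = X.mean(axis=2, keepdims=True)          # shape (4, 4, 1)

# Chromaticity C(u,v,lambda): normalized spectral signature per pixel
C = X / I                                  # shape (4, 4, 28)

# The decomposition is exact: X = C ⊙ I (elementwise product)
assert np.allclose(C * I, X)

# C is invariant to a global illumination scaling of the radiance
X_bright = 2.5 * X
C_bright = X_bright / X_bright.mean(axis=2, keepdims=True)
assert np.allclose(C_bright, C)
```

The last assertion illustrates the illumination-invariance claim: scaling the radiance changes only I, while the chromaticity cube C is unchanged.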
With this decomposition, the CASSI forward model becomes a linear inverse problem y = Hc + n, where c = vec(C) and H incorporates the coded aperture, dispersion, and the known intensity‑guided mask M′ = I ⊙ M. The authors adopt a Bayesian MAP formulation and explicitly model measurement noise as anisotropic Gaussian with a diagonal covariance Σ = diag(σ₁²,…,σ_n²). This leads to a data‑consistency update that generalizes existing ISTA, GAP, and HQS schemes:
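A toy version of the forward operator can clarify how the intensity-guided mask enters the model. The sketch below uses the standard single-disperser CASSI convention (mask each band, shift it along one axis by the dispersion, and sum); the one-pixel dispersion step and the small sizes are assumptions for illustration, not the paper's exact operator:

```python
import numpy as np

rng = np.random.default_rng(1)
Hh, Ww, Lb = 8, 8, 4                                  # spatial size, number of bands
C = rng.uniform(size=(Hh, Ww, Lb))                    # chromaticity cube (the unknown)
I = rng.uniform(0.5, 1.0, size=(Hh, Ww))              # intensity from the companion camera
M = rng.integers(0, 2, size=(Hh, Ww)).astype(float)   # binary coded aperture

# Intensity-guided mask: M' = I ⊙ M folds the known intensity into the operator
Mp = I * M

def forward(C, Mp, step=1):
    """y = Hc: modulate each band by M', shift by the dispersion, and sum."""
    Hh, Ww, Lb = C.shape
    y = np.zeros((Hh, Ww + step * (Lb - 1)))
    for l in range(Lb):
        y[:, l * step : l * step + Ww] += Mp * C[:, :, l]
    return y

y = forward(C, Mp)                                    # snapshot measurement, shape (8, 11)
```

Because the operator is linear in C, the reconstruction problem stays a standard linear inverse problem even after folding I into the mask.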
c^{k+1} = z^{k} + Hᵀ (H Hᵀ + μ Σ)⁻¹ (y − H z^{k})
where Σ is spatially varying. To handle the fact that Σ is unknown and may change across iterations, they propose a degradation‑aware noise estimation module (DNEM). DNEM is a lightweight CNN that takes the current estimate z^{k} and the measurement y and predicts both Σ^{k} and a denoising strength ω^{k}=τ^{k}/μ^{k}. These parameters are then used in the gradient‑projection step and in the proximal (denoising) step, respectively.
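The anisotropic data-consistency step is a closed-form linear solve. The sketch below implements the update above with NumPy, using a small dense random H as a stand-in for the structured CASSI operator and a zero vector as a stand-in for the denoised estimate z^{k}; the per-element variances play the role of the (in practice, predicted) diagonal of Σ:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 12                                   # compressive: fewer measurements than unknowns
H = rng.standard_normal((m, n))                # stand-in for the CASSI sensing matrix
c_true = rng.standard_normal(n)
sigma2 = rng.uniform(0.01, 0.5, size=m)        # anisotropic per-element noise variances
y = H @ c_true + np.sqrt(sigma2) * rng.standard_normal(m)

def data_consistency(z, y, H, sigma2, mu):
    """c = z + Hᵀ (H Hᵀ + μ Σ)⁻¹ (y − H z), with Σ = diag(σ₁²,…)."""
    A = H @ H.T + mu * np.diag(sigma2)
    return z + H.T @ np.linalg.solve(A, y - H @ z)

z = np.zeros(n)                                # stand-in for the denoised estimate z^k
c_next = data_consistency(z, y, H, sigma2, mu=1.0)

# With mu -> 0 and noiseless y, the update enforces H c = y exactly
c_exact = data_consistency(z, H @ c_true, H, sigma2, mu=0.0)
assert np.allclose(H @ c_exact, H @ c_true)
```

Setting Σ to a scalar multiple of the identity recovers the isotropic updates used by ISTA/GAP/HQS-style schemes, which is the sense in which this step generalizes them.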
The reconstruction algorithm is realized as a K‑stage deep unfolding network called CIDNet. Each stage consists of: (1) DNEM for Σ^{k} and ω^{k}, (2) the analytical gradient‑projection update (the equation above), and (3) a learned proximal operator implemented via a Hybrid Spatial‑Spectral Transformer (HSST). HSST adopts an asymmetric UNet architecture: the encoder employs window‑based local spatial attention (Spa‑L WSA) to capture fine texture, while the decoder uses a sparse Top‑K spectral attention (Spec‑TKSA) that selects the most informative spectral bands for each spatial location, thereby exploiting the local spectral sparsity of C while keeping computational cost low.
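The Top-K spectral attention idea can be illustrated independently of the full HSST. The following is a simplified single-head sketch (the paper's Spec-TKSA is multi-head and embedded in a UNet decoder, details not reproduced here): each spectral-band query attends only to its k highest-scoring bands, so the softmax is taken over a sparse set rather than all L bands:

```python
import numpy as np

def topk_spectral_attention(Q, K, V, k):
    """Sparse attention over spectral bands: keep only the top-k scores per query row."""
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                        # (L, L) band-to-band similarities
    # Mask everything below each row's k-th largest score before the softmax
    kth = np.sort(scores, axis=1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                    # rows sum to 1 over kept bands
    return w @ V

rng = np.random.default_rng(3)
L, d = 28, 16                                            # e.g. 28 spectral bands
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out = topk_spectral_attention(Q, K, V, k=8)              # shape (28, 16)
```

Keeping only k of L attention weights per query both matches the local spectral sparsity of the chromaticity cube and reduces the effective cost of the spectral attention, which is the efficiency argument made above.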
Extensive experiments were conducted on both synthetic CASSI data (with varying compression ratios and noise levels) and real measurements from a dual‑camera prototype. Quantitative metrics (PSNR, SSIM, spectral correlation) show that CIDNet consistently outperforms state‑of‑the‑art methods such as DeSCI, PIDS, GAP‑Net, MST, DAUHST, SSR, and In2SET, achieving improvements of 0.5–2 dB in PSNR and higher spectral fidelity. Visual results demonstrate that the chromaticity component effectively removes highlights, enhances low‑light details, and preserves high‑frequency textures, confirming the illumination‑invariant property of C.
Ablation studies reveal that (i) the chromaticity‑intensity decomposition itself yields a noticeable gain over directly reconstructing X, (ii) the anisotropic noise estimation significantly improves reconstruction under spatially varying noise, and (iii) the Top‑K spectral attention outperforms full‑spectral attention while being much more efficient.
In summary, the paper makes three key contributions: (1) a novel, physically grounded decomposition that separates illumination from reflectance in the CASSI context, (2) a MAP‑based reconstruction framework that incorporates spatially adaptive noise modeling, and (3) CIDNet, a deep unfolding architecture that combines a hybrid spatial‑spectral transformer with stage‑wise noise estimation. This work sets a new benchmark for compressive hyperspectral imaging and opens avenues for further research on multi‑camera configurations, real‑time implementation, and extension to other compressive optical systems.