Light Weight Residual Dense Attention Net for Spectral Reconstruction from RGB Images
Hyperspectral imaging acquires both spectral and spatial information about a scene. Capturing such information with a specialized hyperspectral camera remains costly, so reconstructing it from an RGB image offers a practical alternative for both classification and object-recognition tasks. This work proposes a novel lightweight network, with only about 233,059 parameters, based on a residual dense model with an attention mechanism. The network uses a Coordination Convolutional block to capture spatial information. The weights from this block are shared by two independent feature-extraction mechanisms: one performs dense feature extraction and the other multiscale hierarchical feature extraction. Finally, the features from both mechanisms are globally fused to produce the 31 spectral bands. The network is trained on the NTIRE 2020 challenge dataset and achieves an MRAE of 0.0457 with low computational complexity.
💡 Research Summary
The paper addresses the costly and bulky nature of dedicated hyperspectral cameras by proposing a lightweight deep neural network that can reconstruct full hyperspectral information from a single RGB image. The proposed Light‑Weight Residual Dense Attention Net (LWRDAN) contains only 233,059 trainable parameters, yet achieves a Mean Relative Absolute Error (MRAE) of 0.0457 and a Structural Similarity Index (SSIM) of 0.9827 on the NTIRE 2020 spectral reconstruction challenge dataset, demonstrating that high accuracy does not necessarily require massive models.
The architecture begins with a Coordination Convolution (CoordConv) block that augments the three RGB channels with normalized x‑ and y‑coordinate maps. By explicitly providing spatial location information, CoordConv mitigates the translation‑invariance limitation of standard convolutions and improves the network’s ability to learn position‑dependent spectral mappings. The coordinate feature maps are then shared by two parallel feature‑extraction pathways.
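The CoordConv idea can be sketched in a few lines: two extra channels holding normalized x- and y-coordinates are concatenated to the RGB input before the first convolution. The function name and the (H, W, C) NumPy layout below are illustrative, not the paper's exact implementation.

```python
import numpy as np

def add_coord_channels(rgb):
    """Append normalized x/y coordinate maps to an (H, W, 3) RGB image,
    as in the CoordConv input augmentation."""
    h, w, _ = rgb.shape
    # Coordinates normalized to [-1, 1], one constant map per spatial axis.
    xs = np.linspace(-1.0, 1.0, w).reshape(1, w).repeat(h, axis=0)
    ys = np.linspace(-1.0, 1.0, h).reshape(h, 1).repeat(w, axis=1)
    return np.concatenate([rgb, xs[..., None], ys[..., None]], axis=-1)

img = np.random.rand(64, 64, 3)
out = add_coord_channels(img)
print(out.shape)  # (64, 64, 5): 3 RGB channels + x and y coordinate maps
```

Subsequent convolutions can then learn filters that respond differently at different image locations, which plain convolutions cannot do.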
The first pathway is a dense feature extraction block that stacks convolutional layers with dense connections (similar to DenseNet). Each layer receives the concatenated outputs of all preceding layers, preserving low‑level details and facilitating gradient flow. The second pathway consists of Residual Dense Attention Blocks (RDABs). Each RDAB builds upon a Residual Dense Block (RDB) and incorporates both channel‑attention and spatial‑attention modules (the CBAM mechanism). These attention modules re‑weight feature maps according to inter‑channel and inter‑spatial relationships, allowing the network to focus on spectrally informative regions while suppressing noise.
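The two CBAM-style re-weighting steps can be illustrated with a simplified NumPy sketch. This is a toy version under stated assumptions: the shared MLP weights `w1`/`w2` are random placeholders, and the spatial branch uses a plain sigmoid gate instead of CBAM's learned 7×7 convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention: pool spatial dims, pass average- and max-pooled
    descriptors through a shared two-layer MLP, gate channels with a sigmoid."""
    avg = feat.mean(axis=(0, 1))   # (C,)
    mx = feat.max(axis=(0, 1))     # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0) +
                   w2 @ np.maximum(w1 @ mx, 0.0))  # (C,)
    return feat * gate

def spatial_attention(feat):
    """Spatial attention: pool across channels, gate each location.
    (CBAM applies a learned 7x7 conv to the pooled maps; omitted here.)"""
    avg = feat.mean(axis=-1, keepdims=True)  # (H, W, 1)
    mx = feat.max(axis=-1, keepdims=True)    # (H, W, 1)
    return feat * sigmoid(avg + mx)

rng = np.random.default_rng(0)
feat = rng.random((16, 16, 8))                 # toy (H, W, C) feature map
w1 = rng.standard_normal((4, 8))               # reduction MLP weights
w2 = rng.standard_normal((8, 4))
refined = spatial_attention(channel_attention(feat, w1, w2))
print(refined.shape)  # (16, 16, 8): same shape, attention-reweighted
```

Because both gates lie in (0, 1), the attention modules only attenuate features; the RDAB's residual connection preserves the original signal path.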
RDABs are arranged in a U‑Net‑style encoder‑decoder configuration. The encoder reduces spatial resolution via max‑pooling, while the decoder restores it using transpose convolutions, ensuring that multi‑scale contextual information is captured and later fused. After the two pathways have processed the input, their feature maps are globally fused and passed through a 1×1 convolution to produce 31 spectral bands corresponding to the target hyperspectral range.
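The encoder-decoder shape bookkeeping can be checked with a minimal sketch. Nearest-neighbour upsampling stands in for the paper's transpose convolutions here, since this example only traces tensor shapes, not learned weights.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max-pooling over an (H, W, C) map (H and W assumed even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbour 2x upsampling, a stand-in for a learned
    transpose convolution in this shape-only sketch."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

enc_in = np.random.rand(32, 32, 16)
pooled = max_pool2x2(enc_in)                   # (16, 16, 16) encoder step
skip = enc_in                                  # U-Net-style skip connection
dec = upsample2x2(pooled)                      # (32, 32, 16) decoder step
fused = np.concatenate([dec, skip], axis=-1)   # (32, 32, 32) multi-scale fusion
print(fused.shape)
```

In the full network, a 1×1 convolution would then project such fused features down to the 31 output bands.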
Training employs a composite loss function that combines an L2 (pixel‑wise) term with an SSIM term. The SSIM component encourages the reconstructed spectra to retain structural similarity with the ground truth, which is crucial for preserving material‑specific spectral signatures. The network is trained on 400 RGB–hyperspectral image pairs (sub‑sampled into 20×20 patches) with a batch size of 8, using the Adam optimizer with a learning rate schedule from 1e‑2 to 1e‑4 over 500 epochs.
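A composite loss of this form can be sketched as follows. The SSIM here is computed globally over the whole array (the usual windowed SSIM averages this statistic over local patches), and the weighting `alpha` is an illustrative assumption, since the paper's exact blend is not stated in this summary.

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Single global SSIM statistic between two arrays (simplified:
    no sliding window, default stabilizing constants c1, c2)."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def composite_loss(pred, gt, alpha=0.5):
    """Pixel-wise L2 term plus a (1 - SSIM) structural term;
    `alpha` is a hypothetical weighting for illustration."""
    l2 = np.mean((pred - gt) ** 2)
    return alpha * l2 + (1.0 - alpha) * (1.0 - ssim_global(pred, gt))

gt = np.random.rand(20, 20, 31)  # one 20x20 patch with 31 spectral bands
print(composite_loss(gt, gt))    # 0.0 for a perfect reconstruction
```

The (1 - SSIM) term pushes the optimizer toward reconstructions whose local structure matches the ground truth, complementing the purely pointwise L2 penalty.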
Ablation studies show that removing CoordConv raises MRAE to 0.08623, and omitting the CBAM attention modules similarly degrades performance, confirming the importance of both spatial coordinate encoding and attention mechanisms. Despite its modest size, the model runs comfortably on a GTX 1080 GPU with 8 GB RAM, making it suitable for deployment on resource‑constrained platforms such as mobile devices or edge‑computing units.
In summary, the authors present a compact yet powerful network that leverages coordinate‑aware convolutions, dense connectivity, and dual attention to achieve state‑of‑the‑art spectral reconstruction from RGB images. The work demonstrates that careful architectural design can reconcile the trade‑off between model size, computational cost, and reconstruction fidelity, opening avenues for low‑cost hyperspectral imaging in real‑world applications ranging from medical diagnostics to remote sensing.