Adversarial Deep-Unfolding Network for MA-XRF Super-Resolution on Old Master Paintings Using Minimal Training Data
High-quality element distribution maps enable precise analysis of the material composition and condition of Old Master paintings. These maps are typically produced from data acquired through Macro X-ray fluorescence (MA-XRF) scanning, a non-invasive technique that collects spectral information. However, MA-XRF is often limited by a trade-off between acquisition time and resolution. Achieving higher resolution requires longer scanning times, which can be impractical for detailed analysis of large artworks. Super-resolution MA-XRF provides an alternative solution by enhancing the quality of MA-XRF scans while reducing the need for extended scanning sessions. This paper introduces a tailored super-resolution approach to improve MA-XRF analysis of Old Master paintings. Our method proposes a novel adversarial neural network architecture for MA-XRF, inspired by the Learned Iterative Shrinkage-Thresholding Algorithm. It is specifically designed to work in an unsupervised manner, making efficient use of the limited available data. This design avoids the need for extensive datasets or pre-trained networks, allowing it to be trained using just a single high-resolution RGB image alongside low-resolution MA-XRF data. Numerical results demonstrate that our method outperforms existing state-of-the-art super-resolution techniques for MA-XRF scans of Old Master paintings.
💡 Research Summary
The paper tackles the long‑standing trade‑off in Macro X‑ray Fluorescence (MA‑XRF) imaging between acquisition time and spatial resolution, a critical issue for the non‑invasive analysis of Old Master paintings. Traditional solutions increase scan time or shrink the X‑ray beam, both of which are impractical for large, valuable artworks. The authors propose a novel super‑resolution (SR) framework that can reconstruct a high‑resolution (HR) MA‑XRF map from a low‑resolution (LR) MA‑XRF scan and a single high‑resolution RGB photograph of the same painting.
The core of the method is a model‑based deep unfolding network inspired by the Learned Iterative Shrinkage‑Thresholding Algorithm (LISTA). The authors first formulate the multimodal SR problem as a sparse coding task: the HR MA‑XRF and RGB images share a common dictionary and each has its own modality‑specific dictionary. The unknown coefficient matrix A must be sparse, non‑negative, and bounded. By unfolding the ISTA iterations into a neural network, each layer performs a learned update of A, but replaces the usual soft‑thresholding with a sigmoid activation and a per‑row bias λ(k) to enforce the required constraints. Convolutional layers with 1×1 filters implement the dictionary multiplications, and the dictionary weights are kept non‑negative by squaring them during training.
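A single unfolded update in this spirit can be sketched in NumPy. This is an illustrative reconstruction from the summary above, not the paper's implementation: the matrices `W_e` and `S` stand in for the learned dictionary multiplications (realized in the paper as 1×1 convolutions), weights are squared to stay non-negative, and the soft-thresholding of classic LISTA is replaced by a sigmoid with a per-row bias `lam` so the code matrix stays sparse, non-negative, and bounded.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unfolded_iteration(A, X, W_e, S, lam):
    """One unfolded iteration in the spirit of LISTA. The sigmoid with
    per-row bias lam replaces soft-thresholding, keeping A in (0, 1).
    W_e and S are illustrative names for the learned update matrices."""
    # Squaring keeps the effective dictionary weights non-negative,
    # as described in the summary.
    return sigmoid(W_e ** 2 @ X + S ** 2 @ A - lam[:, None])

def unfold(X, W_e, S, lam, K=5):
    """Run K unfolded iterations (the paper uses K = 5)."""
    A = sigmoid(W_e ** 2 @ X - lam[:, None])  # initial code estimate
    for _ in range(K - 1):
        A = unfolded_iteration(A, X, W_e, S, lam)
    return A
```

Because the sigmoid output is confined to (0, 1), the boundedness and non-negativity constraints hold by construction, while a large bias `lam` pushes rows toward zero and promotes sparsity.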
Training is completely unsupervised. The network receives as input the concatenation of an up‑sampled LR MA‑XRF (via bilinear interpolation) and the HR RGB image. After K = 5 unfolded iterations the network outputs a reconstructed concatenated tensor $\hat{X}$, from which the HR MA‑XRF $\hat{Y}$ is extracted. To guarantee consistency with the observed LR data, a projection step $\mathrm{Proj}(\hat{Y}) = \hat{Y} - (\hat{Y}U - Y_{\downarrow})U^{T}$ forces the down‑sampled version of the reconstruction to match the measured LR MA‑XRF exactly.
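The projection step is straightforward to verify numerically. The sketch below assumes a down-sampling operator $U$ with orthonormal columns (here a 1-D block-averaging matrix, a simplification for illustration), under which the projection makes the down-sampled reconstruction agree with the measured LR data exactly, since $\mathrm{Proj}(\hat{Y})U = \hat{Y}U - (\hat{Y}U - Y_{\downarrow})U^{T}U = Y_{\downarrow}$ when $U^{T}U = I$.

```python
import numpy as np

def build_downsampling_operator(hr_pixels, factor):
    """Down-sampling matrix U (hr_pixels x lr_pixels) averaging
    non-overlapping blocks of `factor` pixels, scaled so that the
    columns are orthonormal (U^T U = I). A 1-D layout is used here
    purely for illustration."""
    lr_pixels = hr_pixels // factor
    U = np.zeros((hr_pixels, lr_pixels))
    for j in range(lr_pixels):
        U[j * factor:(j + 1) * factor, j] = 1.0 / np.sqrt(factor)
    return U

def project(Y_hat, Y_lr, U):
    """Proj(Y_hat) = Y_hat - (Y_hat U - Y_lr) U^T: after projection,
    down-sampling the reconstruction reproduces the LR measurement."""
    return Y_hat - (Y_hat @ U - Y_lr) @ U.T
```

This embeds the physical down-sampling model directly into the learning loop: whatever the network outputs, the data-consistency constraint is enforced by construction rather than only penalized in the loss.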
The loss function combines three terms: (1) an L2 fidelity term on the LR MA‑XRF, (2) an L2 term on the RGB reconstruction, and (3) an adversarial loss that encourages realistic MA‑XRF textures. The adversarial component uses a discriminator trained on patches; only patches that the discriminator misclassifies are kept for loss computation, which stabilizes training. Because real HR MA‑XRF patches are unavailable, the authors synthesize “pseudo‑real” patches by blending the up‑sampled MA‑XRF channel with the most correlated RGB channel, weighted by a random factor β.
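The pseudo-real patch construction can be sketched as follows. This is a minimal reconstruction from the summary: the choice of absolute Pearson correlation for selecting the RGB channel and the uniform distribution of the blending factor β are assumptions, not details confirmed by the paper.

```python
import numpy as np

def pseudo_real_patch(xrf_up, rgb, rng):
    """Synthesize a 'pseudo-real' HR patch by blending an up-sampled
    MA-XRF channel with the RGB channel most correlated with it,
    weighted by a random factor beta. Correlation measure and beta's
    distribution are illustrative assumptions.

    xrf_up : (H, W) up-sampled MA-XRF channel
    rgb    : (3, H, W) HR RGB image
    """
    # Pick the RGB channel with the highest absolute correlation.
    corrs = [abs(np.corrcoef(xrf_up.ravel(), rgb[c].ravel())[0, 1])
             for c in range(rgb.shape[0])]
    best = int(np.argmax(corrs))
    beta = rng.uniform(0.0, 1.0)  # random blending weight (assumed uniform)
    return beta * xrf_up + (1.0 - beta) * rgb[best]
```

Since β lies in [0, 1], the result is a convex combination of the two channels, so the pseudo-real patch stays within the intensity range of its inputs.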
Experiments were conducted on three celebrated paintings (De Heem’s “Flowers and Insects”, Goya’s “Doña Isabel de Porcel”, and Leonardo’s “Virgin of the Rocks”). For each, the authors generated LR data by 4× down‑sampling ground‑truth deconvolved MA‑XRF maps (21, 7, and 9 spectral bands respectively). They compared their method against a range of baselines: MA‑XRF‑specific SR methods (SSR, SSRCU), hyperspectral SR approaches (CSTF, CMS, L‑TTR), and state‑of‑the‑art single‑image SR networks (CAR, HA‑T, Swin2SR). Quantitatively, the proposed network achieved the lowest RMSE and highest PSNR on all three datasets, outperforming SSRCU by 22–30 % in RMSE and gaining 1.5–2.0 dB in PSNR. Visual inspection of Ca Kα element maps showed sharper edges and more accurate fine‑scale elemental distributions, with error maps confirming reduced reconstruction artifacts.
Key contributions include: (i) an unsupervised SR pipeline that requires only a single HR RGB image and the LR MA‑XRF scan, eliminating the need for large training corpora; (ii) a LISTA‑based unfolding architecture that explicitly enforces sparsity, non‑negativity, and boundedness of the representation; (iii) a projection step that embeds the physical down‑sampling model into the learning loop; and (iv) an adversarial training scheme that focuses on mis‑classified patches and uses RGB‑derived pseudo‑real samples to guide the generator.
Limitations are acknowledged: the method relies on a strong correlation between RGB texture and X‑ray fluorescence patterns; in cases where this correlation is weak, performance may degrade. Training stability depends on careful initialization and learning‑rate scheduling, especially because no pre‑training data are available. Future work could explore deeper unfolding (more than five layers), multi‑scale dictionaries, or incorporation of additional modalities (e.g., infrared reflectography) to further improve robustness and generalization.
Overall, the paper presents a compelling solution to MA‑XRF super‑resolution that is both data‑efficient and physically grounded, offering a practical tool for conservators and scientists seeking high‑resolution elemental maps without prohibitive scanning times.