Denoising and Baseline Correction of Low-Scan FTIR Spectra: A Benchmark of Deep Learning Models Against Traditional Signal Processing


High-quality Fourier Transform Infrared (FTIR) imaging usually requires extensive signal averaging to reduce noise and drift, which severely limits clinical throughput. Deep learning can accelerate imaging by reconstructing spectra from rapid, single-scan inputs. However, separating noise and baseline drift simultaneously without ground truth is an ill-posed inverse problem, and standard black-box architectures often rely on statistical approximations that introduce spectral hallucinations or fail to generalize under unstable atmospheric conditions. To solve these issues, we propose a physics-informed cascade U-Net that separates the denoising and baseline-correction tasks using a new, deterministic Physics Bridge. This architecture forces the network to separate random noise from chemical signal, using an embedded SNIP layer to enforce spectroscopic constraints instead of learning statistical approximations. We benchmarked this approach against a standard single U-Net and a traditional Savitzky-Golay/SNIP workflow on a dataset of human hypopharyngeal carcinoma (FaDu) cells. The cascade model outperformed all other methods, achieving a 51.3% reduction in RMSE relative to raw single-scan inputs and surpassing both the single U-Net (40.2%) and the traditional workflow (33.7%). Peak-aware metrics show that the cascade architecture eliminates the spectral hallucinations found in standard deep learning while preserving peak intensity with much higher fidelity than traditional smoothing. These results establish the cascade U-Net as a robust solution for diagnostic-grade FTIR imaging, enabling imaging speeds 32 times faster than current methods.


💡 Research Summary

This paper tackles a fundamental bottleneck in Fourier Transform Infrared (FTIR) imaging: the need for extensive signal averaging to obtain high‑quality spectra. Conventional high‑quality (HQ) FTIR requires 32–128 scans per pixel, which makes clinical deployment impractically slow. Single‑scan (low‑quality, LQ) acquisitions are fast but suffer from high‑frequency random noise and low‑frequency baseline drift caused by atmospheric water vapor, temperature fluctuations, and instrument instability. The authors propose a physics‑informed cascade U‑Net architecture that explicitly separates denoising and baseline correction into two stages, linked by a deterministic “Physics Bridge” that reverses the statistical normalizations applied during preprocessing and restores the spectra to physical absorbance units before baseline removal.

Dataset and preprocessing
Human hypopharyngeal carcinoma (FaDu) cells were cultured on CaF₂ windows and imaged with an Agilent FTIR microscope equipped with a 64 × 64 focal plane array detector. Four distinct fields of view (FoV1‑4) were recorded at three accumulation levels: 1 scan (LQ), 8 scans (intermediate LQ), and 32 scans (HQ ground truth). The 32‑scan spectra were first processed with the SNIP algorithm to remove baseline, providing pure absorbance targets. The LQ spectra retained their native baselines and were subjected to standard normal variate (SNV) and global min‑max scaling to stabilize training. An average water‑vapor background spectrum was subtracted, and silent regions (2250–2400 cm⁻¹) were trimmed.
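The SNV and global min-max normalizations, together with the inverse transforms the Physics Bridge applies to return spectra to absorbance units, can be sketched as follows. This is a minimal illustration under the conventional definitions of SNV and min-max scaling; the function names and the choice of dataset-wide bounds are assumptions, not the authors' code.

```python
import numpy as np

def snv(spectrum):
    """Standard normal variate: per-spectrum mean-centering and unit-variance scaling."""
    mu, sigma = spectrum.mean(), spectrum.std()
    return (spectrum - mu) / sigma, (mu, sigma)

def snv_inverse(normalized, params):
    """Physics-Bridge step: undo SNV to recover the original amplitude scale."""
    mu, sigma = params
    return normalized * sigma + mu

def minmax(spectrum, lo, hi):
    """Global min-max scaling to [0, 1] using dataset-wide bounds lo/hi."""
    return (spectrum - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Physics-Bridge step: undo global min-max scaling."""
    return scaled * (hi - lo) + lo
```

Because both transforms are invertible given their stored parameters, the bridge is fully deterministic: no statistics have to be re-learned to map network outputs back to physical absorbance before SNIP is applied.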

Methods compared

  1. Traditional workflow (benchmark) – Savitzky‑Golay (SG) smoothing followed by SNIP baseline removal. SG parameters (window length, polynomial order) were automatically optimized using Bayesian optimization (Tree‑structured Parzen Estimator) to achieve the lowest mean‑square error (MSE) against the HQ targets.
  2. Single U‑Net – A 1‑D convolutional encoder‑decoder with residual blocks and spectral attention modules, trained end‑to‑end to map normalized LQ spectra (with baseline) directly to normalized HQ spectra (baseline‑removed). No explicit physical constraints are imposed.
  3. Cascade U‑Net (proposed) – Two U‑Nets in series. The first network learns only high‑frequency denoising; its output is passed through the Physics Bridge (inverse SNV and min‑max) to recover physical absorbance values. The deterministic SNIP algorithm then removes the baseline. The second U‑Net refines the baseline‑corrected spectrum, focusing on preserving peak shape and intensity.
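The Savitzky-Golay smoothing step of the traditional workflow (method 1) is available directly in SciPy. The sketch below uses a synthetic band and illustrative `window_length`/`polyorder` values rather than the Bayesian-optimized parameters from the paper, but shows the noise reduction the filter provides:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
wn = np.linspace(900, 1800, 901)                    # wavenumber axis, cm^-1
clean = np.exp(-((wn - 1650.0) / 20.0) ** 2)        # synthetic amide-I-like band
noisy = clean + rng.normal(0.0, 0.05, clean.shape)  # single-scan-like noise

# Savitzky-Golay: least-squares fit of a local polynomial in a sliding window.
# window_length=21 and polyorder=3 are illustrative, not the optimized values.
smoothed = savgol_filter(noisy, window_length=21, polyorder=3)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

In the paper's benchmark these two parameters were tuned per dataset with a Tree-structured Parzen Estimator; the trade-off is that larger windows suppress more noise but broaden peaks, which is the distortion the peak-aware metrics later quantify.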

Evaluation metrics

  • Root‑mean‑square error (RMSE) reduction relative to raw LQ input.
  • Peak‑aware metrics: absolute errors in peak position, intensity, and full‑width at half‑maximum (FWHM).
  • “Hallucination Score” quantifying the presence of spurious peaks not present in the HQ reference.
  • Computational time per pixel (CPU vs. GPU).
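The headline metric, RMSE reduction relative to the raw LQ input, can be computed as below. This assumes the conventional definition (percentage drop in RMSE against the HQ reference); the paper does not spell out the formula, so the helper names are illustrative.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two spectra."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def rmse_reduction(reconstructed, raw_lq, hq_reference):
    """Percentage drop in RMSE vs. the HQ target, relative to the raw LQ input."""
    return 100.0 * (1.0 - rmse(reconstructed, hq_reference) / rmse(raw_lq, hq_reference))
```

Under this definition, the reported 51.3% for the cascade model means its residual error against the 32-scan ground truth is roughly half that of the unprocessed single-scan spectrum.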

Results

  • RMSE reduction: Cascade U‑Net achieved a 51.3 % decrease, outperforming the single U‑Net (40.2 %) and the traditional SG + SNIP pipeline (33.7 %).
  • Peak fidelity: The cascade model preserved peak intensities with a mean error of <6 % and maintained peak positions within 0.12 cm⁻¹, whereas the single U‑Net showed occasional intensity overshoots and the SG filter broadened peaks by ~12 % on average.
  • Hallucinations: The cascade architecture produced virtually no spurious peaks (Hallucination Score ≈ 0.01), while the single U‑Net generated detectable artifacts (Score ≈ 0.12). The traditional workflow, being deterministic, had no hallucinations but suffered from peak distortion.
  • Speed: On an NVIDIA RTX 3090, the cascade U‑Net required ~0.06 s per pixel, the single U‑Net ~0.07 s, and the SG + SNIP pipeline ~1.8 s (including Bayesian‑optimized SG). This translates to a ~30‑fold acceleration over the conventional method, enabling imaging speeds up to 32 × faster than current clinical practice.

Discussion and implications
The key innovation lies in embedding a non‑learnable, physics‑based baseline remover (SNIP) within the deep‑learning pipeline via the Physics Bridge. By offloading low‑frequency baseline estimation to a deterministic algorithm, the neural networks can concentrate on high‑frequency denoising and fine‑scale peak refinement, reducing the risk of over‑fitting to statistical noise patterns. This hybrid approach eliminates the primary source of spectral hallucinations that plague black‑box models trained on real experimental data lacking explicit ground truth for baseline. Moreover, the cascade design respects the distinct spectral domains (high‑frequency noise vs. low‑frequency drift), allowing each U‑Net to be optimized for its specific task, which is reflected in the superior quantitative metrics.
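A minimal version of the SNIP baseline estimator embedded in the pipeline looks like the following. This is a simplified sketch: the log-log-sqrt (LLS) compression transform and the decreasing-window variant commonly used in practice are omitted, and the iteration count is an illustrative choice.

```python
import numpy as np

def snip_baseline(y, iterations=40):
    """Minimal SNIP: iteratively clip each point to the mean of its p-distant
    neighbours, with the half-window p growing from 1 to `iterations`.
    Peaks are eroded while smooth baseline structure is preserved."""
    b = y.astype(float).copy()
    n = len(b)
    for p in range(1, iterations + 1):
        clipped = b.copy()
        for i in range(p, n - p):
            clipped[i] = min(b[i], 0.5 * (b[i - p] + b[i + p]))
        b = clipped
    return b
```

Because the update is a pure clipping rule with no trainable parameters, placing it between the two U-Nets cannot introduce new spectral features, which is why the low-frequency drift estimate stays hallucination-free by construction.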

From a clinical perspective, the ability to reconstruct diagnostic‑grade spectra from a single scan without sacrificing chemical fidelity opens the door to real‑time intra‑operative FTIR diagnostics, rapid screening of biopsy specimens, and high‑throughput drug‑response assays. The authors also note that the methodology is readily extensible to other vibrational spectroscopies (Raman, NIR) and to three‑dimensional hyperspectral stacks, provided appropriate physics‑based baseline models are available.

Conclusion
The study demonstrates that a physics‑informed cascade U‑Net, which couples deterministic baseline correction with learned denoising, outperforms both conventional signal‑processing pipelines and end‑to‑end deep‑learning models in FTIR spectral restoration. It achieves higher accuracy, eliminates spurious spectral features, and dramatically reduces processing time, thereby offering a practical pathway toward fast, diagnostic‑grade FTIR imaging in clinical settings. Future work will focus on broader tissue types, robustness to extreme atmospheric variations, and integration into real‑time imaging hardware.

