Segmenting infant brains across magnetic fields: Domain randomization and annotation curation in ultra-low field MRI


Early identification of neurodevelopmental disorders relies on accurate segmentation of brain structures in infancy, a task complicated by rapid brain growth, poor tissue contrast, and motion artifacts in pediatric MRI. These challenges are further exacerbated in ultra-low-field (ULF, 0.064 T) MRI, which, despite its lower image quality, offers an affordable, portable, and sedation-free alternative for use in low-resource settings. In this work, we propose a domain randomization (DR) framework to bridge the domain gap between high-field (HF) and ULF MRI for hippocampus and basal ganglia segmentation in the LISA challenge. We show that pre-training on whole-brain HF segmentations using DR significantly improves generalization to ULF data, and that careful curation of training labels, by removing misregistered HF-to-ULF annotations from training, further boosts performance. By fusing the predictions of several models through majority voting, we achieve competitive performance. Our results demonstrate that combining robust augmentation with annotation quality control can enable accurate segmentation in ULF data. Our code is available at https://github.com/Medical-Image-Analysis-Laboratory/lisasegm

💡 Research Summary

This paper addresses the challenge of segmenting deep gray‑matter structures (hippocampi and basal ganglia) in infant brain MRI acquired with ultra‑low‑field (ULF) scanners (0.064 T). While ULF MRI offers a low‑cost, portable, and sedation‑free solution for low‑resource settings, its low signal‑to‑noise ratio and reduced spatial resolution make direct application of high‑field (HF, 1.5 T/3 T) deep‑learning models ineffective. The authors propose a two‑pronged strategy: (1) domain randomization (DR) to bridge the HF‑ULF domain gap, and (2) careful curation of training labels to remove misregistered annotations that arise when HF segmentations are propagated to ULF space.

Datasets and annotation schemes
Three datasets are used: (i) dHCP (3 T, neonatal, 783 infants) and (ii) BOBs (3 T, infants 1–9 months, 51 infants) for pre‑training, and (iii) the LISA challenge dataset (0.064 T, 79 training, 12 validation infants) for fine‑tuning and evaluation. Two annotation variants are defined: “LISA” (only the eight structures required by the challenge) and “LISA +” (the eight structures plus six broader brain tissue groups derived from the dense HF labels).
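Deriving the “LISA” and “LISA +” schemes from dense HF segmentations amounts to remapping label IDs onto a smaller target set. The following is a minimal NumPy sketch of that remapping. The ID mappings are hypothetical placeholders, since the actual dHCP, BOBs, and LISA label tables are not given in this summary.

```python
import numpy as np


def build_lookup(mapping: dict[int, int], max_label: int) -> np.ndarray:
    """Build a lookup table: lut[dense_id] is the target ID, 0 (background) if unmapped."""
    lut = np.zeros(max_label + 1, dtype=np.int64)
    for dense_id, target_id in mapping.items():
        lut[dense_id] = target_id
    return lut


def remap(dense_seg: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply the lookup table voxel-wise via integer (fancy) indexing."""
    return lut[dense_seg]


# Hypothetical IDs for illustration only, NOT the real dHCP/BOBs/LISA tables.
LISA_ONLY = {1: 1, 2: 2}                       # keep target structures, drop the rest
LISA_PLUS = {1: 1, 2: 2, 10: 9, 11: 9, 3: 10}  # also merge other labels into broad tissue groups

seg = np.array([[0, 1, 2], [10, 11, 3]])
print(remap(seg, build_lookup(LISA_ONLY, 11)).tolist())  # [[0, 1, 2], [0, 0, 0]]
print(remap(seg, build_lookup(LISA_PLUS, 11)).tolist())  # [[0, 1, 2], [9, 9, 10]]
```

A lookup array keeps the remapping a single vectorized indexing operation, which scales to full 3D volumes without Python loops over voxels.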

Domain randomization pipeline
The authors adapt the FetalSynthSeg synthetic data generator to the ULF setting. Starting from label maps, the generator applies random rigid and non-rigid deformations, intensity and contrast perturbations, and a final down-sampling/resampling step that mimics the anisotropic slice thickness of ULF acquisitions. It also randomly injects k-space artifacts typical of ULF scans (motion, ghosting, spikes). The synthetic volumes are then resampled to 1 mm isotropic resolution to match the LISA data.
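The generator itself is not reproduced in this summary. The NumPy sketch below illustrates the same kind of label-to-image randomization under simplifying assumptions:

- per-label Gaussian intensities, as in SynthSeg-style generators;
- a random gamma;
- block-averaging along one axis to imitate thick slices;
- an optional single k-space spike;
- additive noise.

Deformations, motion, and ghosting are omitted. All function names and parameter ranges are hypothetical and are not taken from FetalSynthSeg.

```python
import numpy as np


def simulate_thick_slices(image: np.ndarray, factor: int, axis: int = 2) -> np.ndarray:
    """Average blocks of `factor` slices along `axis`, then repeat them back onto the
    original grid (nearest-neighbour upsampling). Leftover slices are kept unchanged."""
    img = np.moveaxis(image, axis, 0)
    n = img.shape[0] - img.shape[0] % factor
    blocks = img[:n].reshape(n // factor, factor, *img.shape[1:]).mean(axis=1)
    out = np.concatenate([np.repeat(blocks, factor, axis=0), img[n:]], axis=0)
    return np.moveaxis(out, 0, axis)


def add_kspace_spike(image: np.ndarray, rng: np.random.Generator,
                     strength: float = 0.1) -> np.ndarray:
    """Boost one random k-space coefficient, which appears as a stripe pattern."""
    kspace = np.fft.fftn(image)
    idx = tuple(int(rng.integers(0, s)) for s in kspace.shape)
    kspace[idx] += strength * np.abs(kspace).max()
    return np.real(np.fft.ifftn(kspace))  # keep the real part for simplicity


def synthesize_ulf_like(labels: np.ndarray, rng: np.random.Generator,
                        slice_factor: int = 4, axis: int = 2) -> np.ndarray:
    """Generate a random-contrast, degraded image from an integer label map."""
    n_labels = int(labels.max()) + 1
    # Random Gaussian intensity per label, so tissue contrast changes every sample.
    means = rng.uniform(0.0, 1.0, size=n_labels)
    stds = rng.uniform(0.01, 0.1, size=n_labels)
    image = rng.normal(means[labels], stds[labels])
    # Normalize to [0, 1] and apply a random gamma (contrast) perturbation.
    image = (image - image.min()) / (np.ptp(image) + 1e-8)
    image = image ** np.exp(rng.normal(0.0, 0.25))
    # Degrade through-plane resolution, sometimes add a spike, then add noise.
    image = simulate_thick_slices(image, slice_factor, axis)
    if rng.random() < 0.5:
        image = add_kspace_spike(image, rng)
    image = image + rng.normal(0.0, rng.uniform(0.02, 0.1), size=image.shape)
    return image.astype(np.float32)


rng = np.random.default_rng(0)
labels = np.zeros((16, 16, 16), dtype=np.int64)
labels[4:12, 4:12, 4:12] = 1
labels[6:10, 6:10, 6:10] = 2
sample = synthesize_ulf_like(labels, rng)
```

Because intensities are resampled on every call, a network trained on such samples cannot rely on any fixed tissue contrast. That property is what domain randomization exploits to generalize across field strengths.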

Training strategy
Two pre‑training variants are explored:

Synth: trained solely on synthetic DR data.
FT‑Real: trained on synthetic data and then fine‑tuned on the real HF images (dHCP or BOBs).

Both variants are then fine-tuned on the LISA ULF images. When fine-tuning starts from a whole-brain pre-trained model, only the output channels corresponding to the target structures are optimized. The fine-tuned models are finally combined into ensembles by voxel-wise majority voting.
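One way to read “only the output channels corresponding to the target structures are optimized” is that the loss is computed on a subset of the whole-brain model's output channels. The NumPy sketch below follows that reading. It assumes a hypothetical `keep` list in which `keep[i]` is the pre-trained channel that represents LISA label `i` (0 = background). The authors' exact mechanism may differ.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def target_probabilities(whole_brain_logits: np.ndarray, keep: list[int]) -> np.ndarray:
    """Select the channels listed in `keep` (axis 0) and renormalize over them only.

    Since a loss built on this output depends only on the kept logits, the
    output-layer weights of the discarded channels would receive no gradient
    in an autodiff framework; the shared backbone is still updated."""
    return softmax(whole_brain_logits[keep], axis=0)


# Hypothetical whole-brain model with 10 output channels on a 4x4x4 patch.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 4, 4, 4))
probs = target_probabilities(logits, keep=[0, 3, 7])  # background + two structures
print(probs.shape)  # (3, 4, 4, 4)
```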

All models are 3D U-Nets implemented in MONAI. They are trained with Adam (lr = 1e-3), a combined Dice and cross-entropy loss, a ReduceLROnPlateau learning-rate scheduler, and early stopping, on RTX 6000 GPUs with a batch size of 1.
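A combined Dice and cross-entropy objective can be written in a few lines. This NumPy sketch works on softmax probabilities and an integer label map, with equal weights for both terms. MONAI offers a ready-made `monai.losses.DiceCELoss`, which takes network logits and has its own smoothing, weighting, and background-handling options, so its values will not match this simplified form exactly.

```python
import numpy as np


def dice_ce_loss(probs: np.ndarray, target: np.ndarray,
                 w_dice: float = 1.0, w_ce: float = 1.0, eps: float = 1e-6) -> float:
    """probs: (C, ...) softmax output; target: (...) integer labels in [0, C)."""
    n_classes = probs.shape[0]
    one_hot = np.stack([target == c for c in range(n_classes)]).astype(probs.dtype)
    spatial = tuple(range(1, probs.ndim))
    # Soft Dice per class, averaged over classes (background included).
    inter = (probs * one_hot).sum(axis=spatial)
    denom = probs.sum(axis=spatial) + one_hot.sum(axis=spatial)
    dice_loss = 1.0 - np.mean((2.0 * inter + eps) / (denom + eps))
    # Voxel-wise cross-entropy: -log p(true class), averaged over voxels.
    p_true = (probs * one_hot).sum(axis=0)
    ce_loss = -np.mean(np.log(np.clip(p_true, eps, 1.0)))
    return float(w_dice * dice_loss + w_ce * ce_loss)


target = np.array([[0, 1], [1, 1]])
perfect = np.stack([target == 0, target == 1]).astype(float)
uniform = np.full((2, 2, 2), 0.5)
print(dice_ce_loss(perfect, target))  # ~0.0
```

The Dice term counters class imbalance for small structures such as the hippocampi. The cross-entropy term gives smooth per-voxel gradients.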

Experiments and results

Domain randomization effectiveness – FT‑Real models consistently outperform Synth‑only models, suggesting that exposure to real HF images after synthetic pre‑training improves feature learning. Even without any exposure to LISA data, the HF‑pre‑trained models produce qualitatively reasonable segmentations, especially for the ventricles and caudate nuclei.
Impact of annotation quality – Visual inspection of the LISA training set revealed that 23 of the 79 cases have misaligned right ventricle and caudate labels, caused by errors when propagating annotations from HF to ULF space. The authors built three fine‑tuning subsets: all cases, only the “good” (correctly aligned) cases, and only the “bad” (misaligned) cases. Models fine‑tuned on the “good” subset achieve higher Dice scores (≈ 0.56 vs 0.52 for the hippocampus) and lower Hausdorff distances, indicating that label curation is crucial for robust ULF performance.
Pre‑training dataset choice – BOBs‑based pre‑training generally yields better results than dHCP, likely because BOBs covers a broader post‑natal age range (1–9 months) that is closer to the LISA cohort (up to 16 months).
Ensembling – Majority voting across multiple fine‑tuned models improves the normalized average metric (NormAvg) over any single model, suggesting that ensemble diversity mitigates individual model errors.
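The Dice and Hausdorff figures above are standard overlap and boundary metrics. The following is a brute-force NumPy sketch for a single binary structure, assuming 1 mm isotropic spacing by default. It computes the Hausdorff distance on all foreground voxel coordinates. Challenge evaluations often use surface voxels or a 95th-percentile variant instead, so values from this sketch are not comparable to the reported scores. NormAvg is not sketched because its exact definition is not given in this summary.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice overlap of two binary masks (defined as 1.0 when both are empty)."""
    p, g = pred.astype(bool), gt.astype(bool)
    total = p.sum() + g.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(p, g).sum() / total)


def hausdorff(pred: np.ndarray, gt: np.ndarray,
              spacing: tuple[float, ...] = (1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance between foreground voxel sets, in physical units."""
    a = np.argwhere(pred) * np.asarray(spacing)
    b = np.argwhere(gt) * np.asarray(spacing)
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # convention for an empty mask
    # Pairwise distances: O(|A| * |B|) memory, fine for small examples only.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


gt = np.zeros((8, 8, 8), dtype=bool)
gt[2:5, 2:5, 2:5] = True
pred = np.zeros_like(gt)
pred[3:6, 2:5, 2:5] = True  # the same cube shifted by one voxel
print(dice(pred, gt), hausdorff(pred, gt))  # 0.666... 1.0
```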

Key insights

Domain randomization that mimics ULF‑specific artifacts can transfer knowledge from HF to ULF despite large contrast and resolution differences.
Quality control of propagated HF labels is essential; even a modest number of misregistered cases can degrade downstream performance.
Whole‑brain pre‑training with additional tissue classes (LISA +) helps, but the benefit must be weighed against mismatches between the HF and LISA label definitions.
Simple voxel‑wise majority voting provides a cost‑effective way to boost performance without additional training.
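Voxel-wise majority voting over hard label maps takes only a few lines. This sketch assumes integer label maps on a common grid. It breaks ties in favour of the lowest label index (a side effect of `argmax`); the tie-breaking rule the authors used is not specified in this summary.

```python
import numpy as np


def majority_vote(label_maps: list[np.ndarray], n_classes: int) -> np.ndarray:
    """Fuse M integer label maps of identical shape by voxel-wise majority vote."""
    stacked = np.stack(label_maps)  # (M, ...)
    # votes[c] counts how many models assigned class c to each voxel.
    votes = np.stack([(stacked == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)  # ties resolve to the lowest class index


m1 = np.array([0, 1, 2, 1])
m2 = np.array([0, 1, 1, 2])
m3 = np.array([1, 1, 2, 0])
print(majority_vote([m1, m2, m3], n_classes=3).tolist())  # [0, 1, 2, 0]
```

Voting on hard labels needs no extra training and no access to model probabilities. The trade-off is that it discards each model's confidence, which probability averaging would retain.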

Limitations and future work
The synthetic generator does not capture all possible ULF noise patterns, so performance may drop on extreme low‑contrast scans. Manual label curation is labor‑intensive; automated quality‑assessment tools would be valuable for scaling. The study only uses 3 T HF data; exploring intermediate field strengths (1.5 T, 2 T) could further narrow the domain gap. Future directions include integrating GAN‑based domain adaptation, self‑supervised pre‑training on unlabeled ULF scans, and extending the pipeline to pathological infant populations.

Overall, the paper demonstrates that a combination of robust domain randomization and meticulous annotation curation enables accurate segmentation of infant brain structures in ultra‑low‑field MRI, paving the way for affordable neurodevelopmental imaging in resource‑limited settings.
