A novel open-source ultrasound dataset with deep learning benchmarks for spinal cord injury localization and anatomical segmentation
While deep learning has catalyzed breakthroughs across numerous domains, its broader adoption in clinical settings is inhibited by the costly and time-intensive nature of data acquisition and annotation. To further facilitate medical machine learning, we present an ultrasound dataset of 10,223 Brightness-mode (B-mode) images consisting of sagittal slices of porcine spinal cords (N=25) before and after a contusion injury. We additionally benchmark the performance metrics of several state-of-the-art object detection algorithms to localize the site of injury and semantic segmentation models to label the anatomy for comparison and creation of task-specific architectures. Finally, we evaluate the zero-shot generalization capabilities of the segmentation models on human ultrasound spinal cord images to determine whether training on our porcine dataset is sufficient for accurately interpreting human data. Our results show that the YOLOv8 detection model outperforms all evaluated models for injury localization, achieving a mean Average Precision (mAP50-95) score of 0.606. Segmentation metrics indicate that the DeepLabv3 segmentation model achieves the highest accuracy on unseen porcine anatomy, with a Mean Dice score of 0.587, while SAMed achieves the highest Mean Dice score generalizing to human anatomy (0.445). To the best of our knowledge, this is the largest annotated dataset of spinal cord ultrasound images made publicly available to researchers and medical professionals, as well as the first public report of object detection and segmentation architectures to assess anatomical markers in the spinal cord for methodology development and clinical applications.
💡 Research Summary
This paper introduces the largest publicly available ultrasound dataset for spinal cord injury (SCI) research and provides comprehensive deep‑learning benchmarks for both injury localization and anatomical segmentation. The authors collected 10,223 B‑mode sagittal images from 25 female Yorkshire pigs. Each animal underwent a laminectomy at thoracic levels T4‑T6, followed by a controlled contusion injury generated by dropping a 20 g, 40 g, or 60 g weight from a height of 17 cm. Ultrasound acquisition used a Canon Aplio i800 system with either a 12 MHz or 20 MHz transducer, producing images at 1280 × 960 px which were later cropped to 690 × 275 px (≈25 mm × 8 mm) and saved as PNG files. After quality control, the dataset comprises 4,467 pre‑injury and 5,756 post‑injury frames, annotated with bounding boxes around hematomas for detection and pixel‑wise masks for five anatomical structures (dura, cerebrospinal fluid, pia, spinal cord, hematoma) plus dorsal and ventral spaces. Ambiguous boundaries were merged into “dura/pia complex” or “dura/ventral complex” to preserve label consistency.
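The reported crop from 1280 × 960 px frames down to a 690 × 275 px region of interest amounts to a simple array slice. A minimal sketch follows; the crop origin (`top`, `left`) is a hypothetical placeholder, since the summary does not specify where the ROI sits in the frame:

```python
import numpy as np

def crop_frame(frame, top, left, height=275, width=690):
    """Crop a (H, W) or (H, W, C) ultrasound frame to the dataset's ROI size."""
    return frame[top:top + height, left:left + width]

frame = np.zeros((960, 1280), dtype=np.uint8)  # full-size B-mode frame
roi = crop_frame(frame, top=300, left=295)     # hypothetical crop origin
# roi has shape (275, 690), matching the dataset's stated image size
```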
A separate human test set of 86 ultrasound frames from eight post‑laminectomy patients was also curated and annotated using the same protocol, enabling zero‑shot generalization evaluation.
For injury localization, five state‑of‑the‑art object detectors were trained under identical conditions: YOLOv8, YOLOv5, Faster‑RCNN, RetinaNet, and EfficientDet. Performance was measured with mean Average Precision (mAP) across IoU thresholds 0.5–0.95 and inference latency. YOLOv8 achieved the highest mAP of 0.606 and a latency of ~28 ms per frame, indicating suitability for real‑time intra‑operative use.
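The mAP metric above averages precision over intersection-over-union (IoU) thresholds from 0.5 to 0.95; at its core is the box-IoU computation, sketched below for axis-aligned `(x1, y1, x2, y2)` boxes (a generic illustration, not code from the paper):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of areas minus the intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted hematoma box counts as a true positive at a given threshold only if its IoU with a ground-truth box meets that threshold, which is why mAP50-95 is a stricter summary than mAP at IoU 0.5 alone.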
Semantic segmentation was benchmarked with five models: DeepLabv3+, UNet, UNet++, SegFormer, and SAMed (a foundation‑model‑based segmenter). On the porcine test split, DeepLabv3+ obtained the best mean Dice coefficient of 0.587 and an IoU of 0.68, benefiting from atrous spatial pyramid pooling and multi‑scale context aggregation. When applied zero‑shot to the human set, all models experienced performance drops, but SAMed recorded the highest Dice of 0.445, suggesting that large‑scale pre‑training helps bridge the domain gap between animal and human ultrasound.
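The Dice coefficient reported for segmentation measures mask overlap as twice the intersection over the sum of the two mask sizes. A minimal NumPy sketch for a single binary mask (per-class scores would be averaged to get the mean Dice quoted above):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```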
Beyond accuracy, the authors propose a “continuous‑monitoring suitability” metric that combines detection/segmentation quality with computational footprint (GPU memory, FLOPs, power consumption). Both YOLOv8 and DeepLabv3+ satisfy the metric, supporting deployment on wearable or implantable ultrasound devices that require >30 fps processing.
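The >30 fps requirement translates directly from per-frame latency: a model sustains real-time monitoring only if 1000 ms divided by its latency meets the target rate. A trivial sketch of that check (the suitability metric itself, combining accuracy with FLOPs and power, is not reproduced here):

```python
def meets_realtime(latency_ms, fps_required=30.0):
    """True if the per-frame latency sustains the required frame rate."""
    return 1000.0 / latency_ms >= fps_required

# YOLOv8's reported ~28 ms/frame gives roughly 35.7 fps, clearing the 30 fps bar
```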
The paper discusses several limitations: (1) the dataset is limited to a single species and imaging hardware, which may not capture the full variability of human spinal anatomy; (2) only hematoma is annotated as a pathology, leaving other clinically relevant lesions (e.g., edema, fibrosis) unaddressed; (3) the human test set is small, limiting statistical confidence in generalization claims; and (4) the study does not explore temporal modeling despite ultrasound’s high frame rate. Future work is suggested to expand multi‑center, multi‑device collections, incorporate additional pathology labels, and develop video‑based models that exploit temporal consistency.
In summary, this work delivers a valuable resource—10,223 annotated porcine spinal‑cord ultrasound images—along with rigorous benchmarks that demonstrate YOLOv8 as the leading detector and DeepLabv3+ as the top segmenter for the animal domain, while SAMed shows promise for cross‑species transfer. By releasing the data and code openly, the authors aim to accelerate methodological development in medical computer vision and facilitate the translation of real‑time ultrasound AI tools into clinical SCI management.