SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
A consistent trend in oriented object detection research has been the pursuit of comparable performance with fewer and weaker annotations. This is particularly crucial in the remote sensing domain, where dense object distributions and a wide variety of categories make exhaustive labeling prohibitively expensive. Based on supervision level, existing oriented object detection algorithms can be broadly grouped into fully supervised, semi-supervised, and weakly supervised methods; within the scope of this work, we further distinguish sparsely supervised and partially weakly-supervised methods. To address the challenges of large-scale labeling, we introduce the first Sparse Partial Weakly-Supervised Oriented Object Detection framework, designed to efficiently leverage a small amount of sparse, weakly labeled data together with abundant unlabeled data. Our framework incorporates three key innovations: (1) We design a Sparse-annotation-Orientation-and-Scale-aware Student (SOS-Student) model to separate unlabeled objects from the background in a sparsely labeled setting, and to learn orientation and scale information from orientation-agnostic or scale-agnostic weak annotations. (2) We construct a novel Multi-level Pseudo-label Filtering strategy that exploits the distribution of the model's multi-level predictions to set confidence thresholds dynamically. (3) We propose a sparse partitioning approach that samples every category at the same ratio, ensuring equal treatment for each class. Extensive experiments on the DOTA and DIOR datasets show that our framework achieves significant performance gains over the oriented object detection methods mentioned above, offering a highly cost-effective solution. Our code is publicly available at https://github.com/VisionXLab/SPWOOD.
💡 Research Summary
The paper introduces SPWOOD, the first framework for Sparse Partial Weakly‑Supervised Oriented Object Detection, targeting the high annotation cost problem in remote‑sensing imagery where objects are densely packed and appear in many categories. Existing oriented object detection (OOD) methods are categorized as fully supervised, semi‑supervised, weakly supervised, sparsely supervised, and partially weakly supervised. SPWOOD unifies the sparsely annotated (only a fraction of objects per image are labeled) and weakly annotated (horizontal boxes or points) settings and combines them with a large pool of unlabeled data.
The core contributions are threefold. First, the authors design a Sparse‑annotation‑Orientation‑and‑Scale‑aware Student (SOS‑Student). SOS‑Student tackles the ambiguity between background and unlabeled objects in sparse annotations by extending focal loss with a separate weighting for hard negatives (high‑confidence predictions treated as background that may in fact be unlabeled objects). It also learns orientation and scale from weak annotations: point labels provide object centers, horizontal boxes supply aspect ratios, and a symmetry‑based self‑supervision component recovers rotation angles.
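The hard-negative reweighting idea can be sketched as follows. This is a minimal illustration, not the paper's exact loss: the focal exponent, the confidence threshold, and the `hard_neg_weight` value are all assumed defaults chosen for the example.

```python
import numpy as np

def sparse_focal_loss(probs, labels, gamma=2.0, neg_conf_thresh=0.9, hard_neg_weight=0.1):
    """Focal-style loss for sparsely labeled images.

    High-confidence predictions at 'background' positions may actually be
    unlabeled objects, so their background loss is down-weighted rather than
    fully penalized. probs: predicted foreground probabilities;
    labels: 1 = labeled object, 0 = (possibly unlabeled) background.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    labels = np.asarray(labels)
    loss = np.zeros_like(probs)

    pos = labels == 1
    # standard focal term for labeled positives
    loss[pos] = -((1 - probs[pos]) ** gamma) * np.log(probs[pos])

    neg = ~pos
    hard = neg & (probs > neg_conf_thresh)  # confident "background": likely unlabeled objects
    easy = neg & ~hard
    loss[easy] = -(probs[easy] ** gamma) * np.log(1 - probs[easy])
    # down-weight hard negatives instead of treating them as confident background
    loss[hard] = -hard_neg_weight * (probs[hard] ** gamma) * np.log(1 - probs[hard])
    return float(loss.mean())
```

With `hard_neg_weight < 1`, a confident prediction on an unannotated region contributes much less background loss than it would under a vanilla focal loss, which is the intended effect in the sparse setting.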
Second, a Multi‑level Pseudo‑label Filtering (MPF) mechanism is proposed for the semi‑supervised stage. Instead of a static confidence threshold, MPF fits a Gaussian Mixture Model to the teacher’s predictions at each feature‑pyramid level, dynamically adjusting per‑level thresholds. This leverages the distribution of multi‑layer outputs, allowing lower‑level features to be filtered conservatively and higher‑level features more aggressively, thereby producing more reliable pseudo‑labels for the student.
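The per-level thresholding step can be sketched as below. The tiny EM routine is a stand-in for a proper GMM implementation (e.g. scikit-learn's `GaussianMixture`), and taking the midpoint between the two component means as the threshold is one plausible choice, not necessarily the paper's rule.

```python
import numpy as np

def fit_gmm_1d(x, iters=50):
    """Minimal EM for a 2-component 1D Gaussian mixture (illustrative only)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # init components at the extremes
    var = np.array([x.var() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities of each component for each score
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: update weights, means, and variances
        n = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / n
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-6
        pi = n / len(x)
    return mu, var, pi

def per_level_thresholds(level_scores):
    """Fit a 2-component GMM to the teacher's confidence scores at each FPN
    level and use the midpoint between the component means as that level's
    pseudo-label threshold (assumed threshold rule)."""
    return {
        level: float(fit_gmm_1d(np.asarray(scores, dtype=float))[0].mean())
        for level, scores in level_scores.items()
    }
```

Because each feature-pyramid level gets its own fitted mixture, levels whose score distributions differ end up with different thresholds, which is the behavior MPF relies on.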
Third, the paper introduces an “Overall Sparse” partitioning strategy for constructing sparse‑partial datasets. Prior works sample sparsely on a per‑image basis, which over‑represents rare categories when an image contains a single instance of a rare class. By treating all labeled instances across the whole dataset as a single pool and sampling each class at a uniform ratio (e.g., 20 % of its total instances), the method preserves the original class distribution while drastically reducing labeling effort.
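The class-wise pooling described above can be sketched as follows; the `"category"` field name and the keep-at-least-one-instance rule are assumptions for the example, not details from the paper.

```python
import random
from collections import defaultdict

def overall_sparse_split(annotations, ratio=0.2, seed=0):
    """'Overall Sparse' partitioning sketch: pool all labeled instances across
    the whole dataset and keep the same fraction of each class, so rare classes
    are neither over- nor under-represented by per-image sampling.

    annotations: list of dicts with at least a 'category' key (assumed schema).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["category"]].append(ann)

    kept = []
    for cat, anns in by_class.items():
        k = max(1, round(len(anns) * ratio))  # keep at least one instance per class
        kept.extend(rng.sample(anns, k))
    return kept
```

In contrast to per-image sampling, a class with 10 instances in the whole dataset keeps exactly 2 at `ratio=0.2`, matching the class's share of the original distribution.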
Training proceeds in two stages. In the burn‑in stage, SOS‑Student is pretrained on the limited sparse‑weak annotations (original and augmented views) and its weights are copied to a teacher network. In the self‑training stage, the teacher generates pseudo‑labels for the abundant unlabeled images; the student is further trained on these pseudo‑labels while the teacher is updated via Exponential Moving Average (EMA) of the student’s parameters, ensuring stability.
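The teacher's EMA update in the self-training stage amounts to the standard rule below; the momentum value of 0.999 is a common default in teacher-student detectors, not a value confirmed by the paper.

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """Exponential Moving Average update of the teacher from the student.

    Each teacher parameter drifts slowly toward the student's current value,
    keeping pseudo-label generation stable across training steps.
    """
    return {
        name: momentum * teacher_params[name] + (1 - momentum) * student_params[name]
        for name in teacher_params
    }
```

Because the teacher only moves a fraction `1 - momentum` toward the student per step, noisy student updates are smoothed out before they affect pseudo-labels.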
Experiments on DOTA‑v1.0/v1.5 and DIOR under the sparse‑partial setting (20 % sparse RBox + 20 % weak point/HBox annotations, a small fraction of the full labeling cost) demonstrate that SPWOOD narrows the gap to fully supervised models to within 2–3 % mAP while achieving a roughly 5× improvement in cost‑effectiveness. Compared against state‑of‑the‑art semi‑supervised, weakly supervised, and sparsely supervised baselines under identical annotation budgets, SPWOOD consistently outperforms them.
The authors acknowledge limitations: only point and horizontal‑box weak annotations are supported, and MPF’s Gaussian Mixture Model requires careful initialization that may be dataset‑dependent. Future work includes extending the framework to additional weak annotation types (e.g., polygonal masks, textual descriptions) and automating the pseudo‑label filtering hyper‑parameters.
Overall, SPWOOD offers a practical, scalable solution for oriented object detection when annotation resources are scarce, making it highly relevant for large‑scale remote‑sensing applications.