Route-DETR: Pairwise Query Routing in Transformers for Object Detection
Detection Transformer (DETR) offers an end-to-end solution for object detection by eliminating hand-crafted components like non-maximum suppression. However, DETR suffers from inefficient query competition, where multiple queries converge to similar positions, leading to redundant computation. We present Route-DETR, which addresses this issue through adaptive pairwise routing in decoder self-attention layers. Our key insight is distinguishing between competing queries (targeting the same object) and complementary queries (targeting different objects) using inter-query similarity, confidence scores, and geometry. We introduce dual routing mechanisms: suppressor routes that modulate attention between competing queries to reduce duplication, and delegator routes that encourage exploration of different regions. These are implemented via learnable low-rank attention biases enabling asymmetric query interactions. A dual-branch training strategy incorporates the routing biases only during training while preserving standard attention for inference, ensuring no additional computational cost. Experiments on COCO and Cityscapes demonstrate consistent improvements across multiple DETR baselines, achieving a +1.7% mAP gain over DINO on ResNet-50 and reaching 57.6% mAP on Swin-L, surpassing prior state-of-the-art models.
💡 Research Summary
The paper “Route-DETR: Pairwise Query Routing in Transformers for Object Detection” presents a novel solution to a fundamental inefficiency in Detection Transformer (DETR) models. While DETR revolutionized object detection by offering an end-to-end pipeline that eliminates hand-crafted components like Non-Maximum Suppression (NMS), it suffers from “inefficient query competition.” During decoding, multiple object queries often converge to similar positions targeting the same object. Due to the one-to-one label assignment rule, only one query ultimately detects the object, rendering the computations of other converging queries redundant and wasteful.
Route-DETR addresses this core issue by introducing adaptive, pairwise routing mechanisms within the decoder’s self-attention layers. The key insight is to explicitly model and manage the relationships between queries, distinguishing between “competing” queries (targeting the same object) and “complementary” queries (targeting different objects). This is achieved through two learnable, low-rank attention biases: 1) Suppressor Routes, which apply a negative bias to attenuate attention between competing queries, thereby reducing duplication and wasted effort; and 2) Delegator Routes, which apply a positive bias to enhance attention between complementary queries, encouraging them to explore and cover different regions of the image.
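The core mechanism can be illustrated with a toy sketch: signed pairwise biases are added to the self-attention logits before the softmax, so a negative (suppressor) entry attenuates attention between a competing pair while a positive (delegator) entry boosts it. The bias values and query indices below are hypothetical placeholders, not the paper's learned quantities.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: 4 object queries with feature dim 8.
rng = np.random.default_rng(0)
N, d = 4, 8
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))

logits = Q @ K.T / np.sqrt(d)          # standard self-attention logits
baseline = softmax(logits, axis=-1)

# Hypothetical routing bias matrix B (in Route-DETR these entries are
# learned and gated; here they are hand-set for illustration).
B = np.zeros((N, N))
B[0, 1] = -2.0   # suppressor route: queries 0 and 1 compete for one object
B[2, 3] = +1.0   # delegator route: query 2 encourages query 3's region

attn = softmax(logits + B, axis=-1)    # biased attention weights

assert np.allclose(attn.sum(axis=-1), 1.0)   # still valid distributions
assert attn[0, 1] < baseline[0, 1]           # competing pair suppressed
assert attn[2, 3] > baseline[2, 3]           # complementary pair boosted
```

Because the bias is added inside the softmax, each row remains a valid attention distribution; suppression of one pair redistributes mass to the remaining queries rather than zeroing it out.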
The technical implementation is elegant and efficient. For each decoder layer, a compact routing representation is derived for each query by combining its feature and positional encoding. Two separate low-rank bias matrices (each factorized as UVᵀ) are generated from these representations to serve as the suppressor and delegator bias templates, respectively. A "Competing-Aware Gating" module, which considers inter-query similarity, prediction confidence, and geometric cues, dynamically computes pairwise gating scores. These scores determine the blend of suppressor and delegator biases applied to each query pair in the attention logits, creating an asymmetric interaction matrix B that breaks the symmetry of standard self-attention.
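A minimal sketch of this construction, with several simplifying assumptions: the projections are random stand-ins for learned linear layers, and the gate is computed from cosine similarity alone (the paper's gating additionally uses confidence and geometric cues). The sign convention (negative suppressor, positive delegator) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, r = 5, 16, 4   # num queries, routing dim, low rank (r << N possible)

# Routing representation per query: feature + positional encoding
# (illustrative stand-ins for the paper's learned quantities).
feat = rng.normal(size=(N, d))
pos = rng.normal(size=(N, d))
route = feat + pos

# Two pairs of low-rank projections (learned linear layers in practice).
Wu_s, Wv_s = rng.normal(size=(d, r)), rng.normal(size=(d, r))
Wu_d, Wv_d = rng.normal(size=(d, r)), rng.normal(size=(d, r))

B_sup = (route @ Wu_s) @ (route @ Wv_s).T   # suppressor template, rank <= r
B_del = (route @ Wu_d) @ (route @ Wv_d).T   # delegator template, rank <= r

# Competing-aware gate in [0, 1] from pairwise cosine similarity only
# (a simplification of the paper's gating inputs).
norms = np.linalg.norm(route, axis=1)
sim = (route @ route.T) / np.outer(norms, norms)
g = 1.0 / (1.0 + np.exp(-sim))              # sigmoid gate per query pair

# Blend: similar (likely competing) pairs lean on the negative suppressor
# bias; dissimilar (complementary) pairs on the positive delegator bias.
B = g * (-np.abs(B_sup)) + (1.0 - g) * np.abs(B_del)

assert B.shape == (N, N)
assert np.linalg.matrix_rank(B_sup) <= r    # low-rank, cheap to form
assert not np.allclose(B, B.T)              # asymmetric interaction matrix
```

The low-rank factorization keeps the bias cheap (O(N·r) parameters per template rather than O(N²)), and because the left and right factors differ, query i can suppress query j without the reverse holding, which is exactly the asymmetry standard self-attention lacks.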
A critical design choice is the Dual-Branch Training Strategy. The model maintains a main branch with standard self-attention and an auxiliary branch that incorporates the proposed routing biases. The total training loss is a weighted sum of losses from both branches. However, during inference, the auxiliary branch is entirely discarded, and only the main branch is used. This ensures that the model benefits from the specialized query representations learned via routing during training without incurring any additional computational cost or architectural complexity at inference time.
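The training objective can be sketched as a weighted sum over the two branches. The loss function and the weight value below are hypothetical placeholders (DETR-style models use a Hungarian-matching set loss; the paper's auxiliary weight is not stated here).

```python
import numpy as np

def detection_loss(predictions, targets):
    # Placeholder for the Hungarian-matching set loss used by DETR-style
    # models; a squared error keeps this sketch self-contained.
    return float(np.mean((predictions - targets) ** 2))

targets = np.ones(4)
main_out = np.array([0.9, 1.1, 1.0, 0.8])   # standard self-attention branch
aux_out = np.array([1.0, 0.9, 1.2, 1.0])    # branch with routing biases

lam = 0.5  # auxiliary-branch weight (hypothetical value)
total = detection_loss(main_out, targets) + lam * detection_loss(aux_out, targets)

# At inference the auxiliary branch is dropped entirely, so the deployed
# model runs plain self-attention at exactly the baseline's cost.
```

Since both branches share the decoder's query representations, gradients from the routing-biased branch still shape what the main branch learns, which is how the benefit survives the branch's removal at test time.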
Extensive experiments validate the effectiveness and generality of Route-DETR. When integrated into strong baselines like Deformable-DETR++, DAB-Def-DETR++, and DINO, it delivers consistent performance gains on the COCO 2017 object detection benchmark. Notably, it achieves a +1.7% mAP improvement over DINO using a ResNet-50 backbone and reaches 57.6% mAP with a Swin-L backbone, surpassing prior state-of-the-art methods. Ablation studies confirm that both suppressor and delegator routes contribute positively, with their combination yielding synergistic benefits. The method also generalizes well to instance segmentation tasks on both COCO and Cityscapes datasets, showing significant improvements in both mask and box AP metrics.
In summary, Route-DETR provides a principled and efficient mechanism to mitigate inefficient query competition in DETR-like models. By enabling adaptive, relationship-aware interactions between queries through learnable routing biases applied only during training, it enhances model performance and training efficiency while preserving the appealing inference-time simplicity of the original DETR framework.