Position-Aware Self-supervised Representation Learning for Cross-mode Radar Signal Recognition
Radar signal recognition in open electromagnetic environments is challenging due to diverse operating modes and unseen radar types. Existing methods often overlook position relations in pulse sequences, limiting their ability to capture semantic dependencies over time. We propose RadarPos, a position-aware self-supervised framework that leverages pulse-level temporal dynamics without complex augmentations or masking, providing improved position relation modeling over contrastive learning or masked reconstruction. Using this framework, we evaluate cross-mode radar signal recognition under the long-tailed setting to assess adaptability and generalization. Experimental results demonstrate enhanced discriminability and robustness, highlighting practical applicability in real-world electromagnetic environments.
💡 Research Summary
The paper addresses the problem of radar signal recognition in open electromagnetic environments, where a wide variety of operating modes and previously unseen radar types coexist. Existing approaches either assume a closed‑world setting or rely on self‑supervised learning (SSL) techniques such as contrastive learning or masked autoencoders that focus mainly on intra‑pulse features, neglecting the temporal dependencies between pulses that carry crucial modulation information (e.g., PRI variation, jitter). To fill this gap, the authors propose RadarPos, a position‑aware self‑supervised framework that directly predicts the temporal position of each pulse in a sequence.
RadarPos replaces the conventional sinusoidal positional encoding with a Time‑of‑Arrival (TOA)‑based encoding: for pulse i, the embedding p_i is computed using sin/cos functions of the TOA value, thereby preserving absolute and relative timing. During pre‑training, a random subset of these positional embeddings is replaced by a learnable mask token (mask‑position strategy). The model receives a token sequence consisting of a class token, visible pulse tokens, and the masked/visible positional embeddings, which are processed by a Transformer encoder‑decoder. An MLP projector then outputs an N×N matrix of position logits; a cross‑entropy loss (L_pos) is applied only to the masked positions.
Two auxiliary mechanisms improve robustness. First, positional smoothing replaces the one‑hot target with a Gaussian‑weighted distribution over neighboring positions, controlled by a smoothing hyper‑parameter σ, reducing sensitivity to exact position prediction errors. Second, an attention‑based reconstruction term multiplies the loss by a similarity weight C_i derived from the cosine similarity between the class token and each visible token, encouraging the model to use global context when inferring masked positions.
The experimental protocol consists of a large‑scale pre‑training phase on 3.75 million simulated and public radar pulses (RadChar) and a downstream fine‑tuning phase on a long‑tailed dataset containing three emitter models (m0, m1, m2) each with three operating modes (VS, TAS, STT). The fine‑tuning data follow a 100:50:25:15:10:5:1 class‑ratio split, creating a realistic imbalance. Six cross‑mode transfer scenarios (e.g., m0→m1, m0→m2) are evaluated using accuracy and F1‑score. Baselines include ResNet‑18, a vanilla Transformer, domain‑generalization methods (IBN, SNR, MixStyle, DSU), and state‑of‑the‑art SSL methods (SimCLR, MoCo, DIRA, TimeMAE).
Results show that RadarPos consistently outperforms the baselines, especially in the hardest transfers (m0→m1 and m0→m2), where it improves accuracy by about 1.5 percentage points over MoCo and yields higher F1 scores. Ablation studies confirm that the TOA‑based positional encoding contributes roughly a 1 percentage‑point gain, and that the optimal smoothing σ is 0.9 while the best LoRA rank for fine‑tuning is 8, balancing performance and parameter efficiency.
In summary, RadarPos demonstrates that (1) modeling inter‑pulse temporal relations via position prediction yields more discriminative and domain‑robust representations, (2) a simple mask‑position pre‑text avoids the need for heavy data augmentations, and (3) the introduced smoothing and attention‑based reconstruction mechanisms effectively mitigate ambiguity among similar pulses. The framework thus offers a practical solution for radar signal recognition in dynamic, open‑world scenarios, paving the way for more resilient electronic intelligence systems.
Comments & Academic Discussion
Loading comments...
Leave a Comment