Joint Inverse Learning of Cognitive Radar Perception and Perception-Action Policy
Cognitive Radars (CRs) employ a perception-action cycle to adapt their sensing and transmission strategies based on their perception of the target's kinematic state and mission objectives. This paper considers an inverse-learning Electronic Countermeasure (ECM) that infers both the perception and the perception-driven action policy of an adversarial CR from the CR's observed actions, i.e., its sensing and transmission actions. Existing frameworks in the literature assume knowledge of either the perception or the perception-action policy and infer the other; this assumption is unrealistic in an adversarial setting. We address this gap by proposing an online, nonparametric Bayesian machine learning framework and developing the Inverse Particle Filter with Dependent Dirichlet Process (IPFDDP) algorithm, which characterizes the perception-dependent action policy using a Dependent Dirichlet Process (DDP) and embeds kernel-based DDP inference within a Bayesian inverse particle filtering framework to jointly estimate the CR's perception and perception-action policy. Extensive numerical simulations demonstrate that IPFDDP outperforms existing inverse-learning methods in terms of mean squared error, Kullback-Leibler (KL) divergence between the estimated and true policy, and accuracy in identifying relative action preferences. Unlike existing techniques, the proposed Bayesian formulation naturally quantifies uncertainty in the inferred perception and perception-action policy, enabling active probing strategies for sample-efficient inverse learning. Simulation results show that active probing integrated with IPFDDP achieves, on average, a 40% faster reduction in KL divergence compared to randomized probing.
💡 Research Summary
The paper tackles the challenging problem of jointly inferring an adversarial cognitive radar’s internal belief (posterior estimate of the target state) and its belief‑dependent action policy solely from observed target trajectories and the radar’s actions. Existing inverse‑learning approaches either assume the radar’s perception is known and recover the policy, or assume the policy is known and recover the perception. Both assumptions are unrealistic in electronic warfare where the radar’s internal belief is hidden and its policy is adaptive.
To address this, the authors propose an online, non‑parametric Bayesian framework called Inverse Particle Filter with Dependent Dirichlet Process (IPFDDP). The key idea is to model the stochastic policy $G_{\pi,a} = p(a \mid \pi)$ as a Dependent Dirichlet Process (DDP). The DDP places an infinite mixture of kernels over the belief space, allowing the policy to vary smoothly with the belief without any parametric form. Kernel similarity between beliefs drives clustering, and the process evolves over time as new observations arrive.
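The kernel-weighted mixture idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes a truncated stick-breaking construction for the DP weights and a Gaussian kernel over belief vectors, and all function names, the bandwidth, and the truncation level are illustrative choices.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def ddp_action_probs(belief, atom_beliefs, atom_action_dists,
                     base_weights, bandwidth=0.5):
    """Belief-dependent mixture: kernel similarity between the query belief
    and each atom's belief modulates the DP weights, so the resulting action
    distribution varies smoothly with the belief (the 'dependent' in DDP)."""
    sq_dists = np.sum((atom_beliefs - belief) ** 2, axis=1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    w = base_weights * kernel
    w /= w.sum()
    return w @ atom_action_dists  # mixture of per-atom action distributions
```

Beliefs close to an atom's location inherit that atom's action distribution; distant beliefs fall back on other atoms, which is what lets the policy be multimodal across the belief space without a fixed parametric form.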
The IPFDDP algorithm embeds this DDP inference inside a Bayesian inverse particle filter. A set of particles $\{(\pi^{(i)}_{0:k}, y^{(i)}_{1:k})\}_{i=1}^{N}$ represents joint samples of the hidden belief trajectory and the unobserved measurement sequence. Importance weights are computed using the exact posterior ratio, and an optimal proposal density that incorporates the current DDP posterior is employed to minimise variance. At each time step the filter: (1) propagates particles through the forward radar dynamics (motion, measurement, belief update), (2) samples a new action from the DDP‑derived policy, (3) updates weights, and (4) resamples. This recursive scheme yields a sequential approximation of the joint posterior $p(\pi_{0:k}, G_{\pi,a} \mid x_{0:k}, a_{1:k})$.
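The propagate-weight-resample recursion can be sketched as follows. This is a simplified, bootstrap-style single step, not the paper's optimal-proposal filter: `propagate` stands in for the forward radar dynamics (motion, measurement, belief update) and `action_probs` for the current DDP-derived policy estimate; both are assumed callables supplied by the caller.

```python
import numpy as np

def ipf_step(particles, weights, observed_action, propagate, action_probs, rng):
    """One recursion of a simplified inverse particle filter: propagate each
    hypothesised radar belief through the forward dynamics, reweight by how
    likely the observed action is under the current policy estimate, then
    resample to combat weight degeneracy."""
    n = len(particles)
    # (1) propagate particles through the forward radar dynamics
    particles = np.array([propagate(p, rng) for p in particles])
    # (2)-(3) weight by the likelihood of the action the radar actually took
    weights = weights * np.array(
        [action_probs(p)[observed_action] for p in particles])
    weights /= weights.sum()
    # (4) systematic resampling
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx], np.full(n, 1.0 / n)
```

The bootstrap proposal here is deliberately crude; the summary notes that IPFDDP instead uses an optimal proposal incorporating the DDP posterior, which reduces weight variance.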
A major advantage of the Bayesian formulation is that it naturally quantifies uncertainty in both the belief estimate and the policy. The posterior distribution over DDP atoms provides confidence intervals for action probabilities, which the authors exploit to design an active probing strategy. By deliberately maneuvering the target (e.g., adjusting acceleration) the learner can steer the radar into belief regions that are most informative, thereby maximising expected information gain. Simulations show that this active probing reduces the Kullback‑Leibler (KL) divergence between the estimated and true policy roughly 40 % faster than a random probing baseline.
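A simple way to operationalise "steer the radar into the most informative belief regions" is to score each candidate maneuver by how much the posterior policy samples disagree at the belief it would induce. The sketch below uses posterior predictive variance as a cheap proxy for expected information gain; the exact acquisition criterion, `predict_belief`, and the maneuver set are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def choose_probe(candidate_maneuvers, predict_belief, sampled_policies):
    """Pick the maneuver whose induced radar belief produces the most
    disagreement (summed variance of action probabilities) across policy
    samples drawn from the DDP posterior -- a proxy for information gain."""
    scores = []
    for m in candidate_maneuvers:
        belief = predict_belief(m)  # belief the radar would hold after maneuver m
        probs = np.array([pol(belief) for pol in sampled_policies])
        scores.append(probs.var(axis=0).sum())  # posterior disagreement
    return candidate_maneuvers[int(np.argmax(scores))]
```

Where the posterior samples already agree, observing another action teaches little; where they disagree, the radar's response is maximally discriminating, which is the intuition behind the reported faster KL-divergence reduction versus random probing.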
Extensive numerical experiments compare IPFDDP against state‑of‑the‑art inverse filtering methods (IEKF, IUKF, standard IPF) and inverse reinforcement‑learning approaches. Metrics include mean‑squared error of the inferred belief, KL divergence of the learned policy, and accuracy in recovering relative action preferences among three waveform choices (LFM, PFM, HFM). Across all metrics IPFDDP outperforms the baselines, especially in scenarios with multimodal, stochastic policies where parametric models fail. The non‑parametric DDP automatically adapts its complexity to the data, avoiding over‑ or under‑fitting.
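Of the metrics above, the policy-level one can be made concrete: the KL divergence between the true and estimated action distributions, averaged over the belief points at which the policy is evaluated. This is a generic evaluation sketch (the epsilon smoothing and averaging convention are my assumptions, not details from the paper).

```python
import numpy as np

def policy_kl(p_true, p_est, eps=1e-12):
    """Mean KL divergence D(p_true || p_est) between true and estimated
    action distributions, each row being one evaluated belief point.
    Small eps avoids log(0) when an action has zero estimated mass."""
    p_true = np.asarray(p_true, dtype=float) + eps
    p_est = np.asarray(p_est, dtype=float) + eps
    p_true /= p_true.sum(axis=-1, keepdims=True)
    p_est /= p_est.sum(axis=-1, keepdims=True)
    return float(np.mean(np.sum(p_true * np.log(p_true / p_est), axis=-1)))
```

With three waveform choices (LFM, PFM, HFM), each row is a length-3 distribution; a divergence near zero means the learned policy matches the radar's true action preferences at that belief.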
In summary, the paper introduces the first framework capable of jointly estimating both perception and perception‑driven policy of a cognitive radar in an adversarial setting. By marrying dependent Dirichlet processes with inverse particle filtering, IPFDDP achieves high‑fidelity inference, principled uncertainty quantification, and sample‑efficient active probing, offering a powerful tool for electronic counter‑measure design and future cognitive radar research.