Deep Reinforcement Learning for EH-Enabled Cognitive-IoT Under Jamming Attacks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.

In the evolving landscape of the Internet of Things (IoT), integrating cognitive radio (CR) has become a practical solution to the challenge of spectrum scarcity, leading to the development of the cognitive IoT (CIoT). However, the vulnerability of radio communications makes radio jamming attacks a key concern in CIoT networks. In this paper, we introduce a novel deep reinforcement learning (DRL) approach designed to optimize throughput and extend the network lifetime of an energy-constrained CIoT system under jamming attacks. This DRL framework equips a CIoT device with the autonomy to manage energy harvesting (EH) and data transmission, while also regulating its transmit power to respect spectrum-sharing constraints. We formulate the optimization problem under various constraints and model the CIoT device’s interactions with the channel as a model-free Markov decision process (MDP). The MDP serves as the foundation for a double deep Q-network (DDQN), designed to help the CIoT agent learn the optimal communication policy in the face of challenges such as dynamic channel occupancy, jamming attacks, and channel fading. Additionally, we introduce a variant of the upper confidence bound (UCB) algorithm, named UCB-IA, which enhances the CIoT network’s ability to efficiently navigate jamming attacks within the channel. The proposed DRL algorithm does not rely on prior knowledge and uses only locally observable information, such as channel occupancy, jamming activity, channel gain, and energy arrival, to make decisions. Extensive simulations demonstrate that the proposed DRL algorithm with the UCB-IA strategy surpasses existing benchmarks, enabling more adaptive, energy-efficient, and secure spectrum sharing in CIoT networks.


💡 Research Summary

The paper addresses the problem of secure and energy‑efficient communication for a battery‑powered cognitive Internet‑of‑Things (CIoT) device that operates in underlay mode alongside a primary user (PU) and is exposed to hostile jamming attacks. The authors propose a novel deep reinforcement learning (DRL) framework that simultaneously decides whether to harvest RF energy or transmit data and, when transmitting, selects a power level that respects the interference constraint imposed by the PU.

A model‑free Markov decision process (MDP) is formulated where the state comprises locally observable variables: PU activity, jamming activity, channel gains (CIoT‑to‑CIoT, CIoT‑to‑PU, PU‑to‑CIoT), battery level, and harvested energy arrival. The action space consists of a binary decision (transmit vs. harvest) together with a discrete set of transmit powers. The reward function balances throughput (successful data delivery) and harvested energy, penalizing actions that lead to collisions with the jammer or violate the PU interference threshold.
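As a sketch of how such a state and action space might be encoded, the following Python fragment uses illustrative field names and power values that are assumptions, not the paper's notation:

```python
from dataclasses import dataclass

# Hypothetical encoding of the MDP state described above; field names
# and units are illustrative assumptions, not the paper's notation.
@dataclass
class CIoTState:
    pu_active: int        # 1 if the primary user occupies the slot
    jammer_active: int    # 1 if the jammer is on
    g_ss: float           # CIoT-to-CIoT channel gain
    g_sp: float           # CIoT-to-PU channel gain
    g_ps: float           # PU-to-CIoT channel gain
    battery: float        # residual battery energy (J)
    e_arrival: float      # harvested energy arriving this slot (J)

# Discrete action space: index 0 = harvest; indices 1..K = transmit at
# power POWER_LEVELS[k]. The values below are placeholders.
POWER_LEVELS = [0.0, 0.05, 0.1, 0.2]  # watts; illustrative

def decode_action(a: int):
    """Map a discrete action index to a (mode, transmit power) pair."""
    if a == 0:
        return "harvest", 0.0
    return "transmit", POWER_LEVELS[a]
```

The binary harvest/transmit decision and the power selection are folded into a single discrete index, which is the usual way to present a mixed action space to a value-based agent such as a DDQN.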

To solve the MDP, the authors design a Double Deep Q‑Network (DDQN). Two neural networks—online and target—are employed to mitigate the over‑estimation bias of standard DQN, and an experience replay buffer is used to decorrelate samples. In addition, they introduce a customized Upper Confidence Bound algorithm called UCB‑IA (Interference‑Aware). UCB‑IA computes an optimism‑adjusted value for each action, encouraging exploration of less‑jammed channels while still exploiting known good actions, thereby accelerating convergence in highly adversarial environments.
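The two mechanisms above can be sketched in a few lines. The Double-DQN target below is the standard form (online network selects the action, target network evaluates it); the `ucb_select` bonus is a generic UCB optimism term, not the paper's exact UCB-IA rule, and all names are assumptions:

```python
import numpy as np

def ddqn_target(q_online, q_target, next_state, reward, gamma=0.99):
    """Double-DQN target: the online net picks the argmax action,
    the target net evaluates it, reducing over-estimation bias.
    q_online / q_target map a state to a vector of action values."""
    a_star = int(np.argmax(q_online(next_state)))
    return reward + gamma * q_target(next_state)[a_star]

def ucb_select(q_values, counts, t, c=2.0):
    """Generic UCB-style action selection over a discrete action set:
    add an optimism bonus that shrinks as an action is tried more often.
    This is a sketch of the idea, not the paper's UCB-IA rule."""
    counts = np.asarray(counts, dtype=float)
    bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-6))
    return int(np.argmax(np.asarray(q_values) + bonus))
```

Actions that have rarely been tried (e.g. a channel whose jamming statistics are still unknown) receive a large bonus and are explored first, which is the mechanism behind the faster convergence attributed to UCB-IA.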

The system model assumes time‑slotted operation. In each slot the CIoT device first makes a decision (energy harvesting or data transmission) and then executes it. When transmitting, the device must satisfy \(P_{t}^{s} g_{t}^{sp} \le I_{th}\), where \(P_{t}^{s}\) is the transmit power, \(g_{t}^{sp}\) is the channel gain to the PU receiver, and \(I_{th}\) is the interference threshold. The PU occupies a random subset of slots, and the jammer follows a stochastic on/off pattern, degrading the signal‑to‑noise ratio when active. Energy harvesting is modeled as stochastic RF energy arrival from ambient transmissions.
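Under this underlay constraint, the admissible transmit powers in a slot can be filtered with a one-line check; this is a sketch with assumed names, not the paper's implementation:

```python
def feasible_powers(power_levels, g_sp, i_th):
    """Return the power levels P satisfying the underlay interference
    constraint P * g_sp <= I_th for the current channel gain to the PU."""
    return [p for p in power_levels if p * g_sp <= i_th]
```

When the channel gain to the PU receiver is high, few (or no) power levels remain feasible, which is one reason the agent may prefer to harvest energy in such slots.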

The optimization problem is to maximize the long‑term average throughput subject to the interference constraint, battery capacity, and energy causality. By embedding this problem into the MDP, the DRL agent learns a policy that dynamically balances transmission and harvesting, adapts transmit power to channel conditions, and avoids jammed frequencies.
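In notation consistent with the interference constraint above (with \(R_t\) the per-slot throughput, \(B_t\) the battery level, \(E_t^{\mathrm{tx}}\) the energy spent transmitting, and \(B_{\max}\) the battery capacity as assumed symbols), the problem can be sketched as:

```latex
\max_{\pi}\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[R_t\right]
\quad\text{s.t.}\quad
P_t^{s}\, g_t^{sp} \le I_{th},\qquad
E_t^{\mathrm{tx}} \le B_t \quad\text{(energy causality)},\qquad
B_{t+1} \le B_{\max}.
```

The energy-causality constraint couples decisions across slots (transmitting now depletes the battery available later), which is what makes a long-horizon MDP formulation appropriate here.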

Simulation results are presented for a scenario with five channels, one jammer, and a PU that transmits with a fixed probability. The proposed DDQN + UCB‑IA scheme is benchmarked against (i) a vanilla DDQN without UCB, (ii) classic Q‑learning, (iii) game‑theoretic anti‑jamming strategies, and (iv) existing energy‑aware DRL methods. Performance metrics include average sum‑rate, average reward, jammer interference ratio, and battery depletion rate. The proposed method achieves 15–30 % higher sum‑rate, reduces the jammer interference ratio by up to 30 %, and maintains lower battery consumption. The UCB‑IA component notably speeds up convergence, cutting the number of training episodes by roughly half compared with plain DDQN.

The authors acknowledge limitations: the study focuses on a single CIoT node and a single jammer, while real deployments involve multiple devices and coordinated jammers. The channel model assumes i.i.d. Rayleigh fading, which may not capture shadowing or correlated fading in practice. Future work is outlined to extend the approach to multi‑agent reinforcement learning, federated learning for privacy‑preserving policy sharing, and hardware‑in‑the‑loop experiments with real RF jammers to validate the algorithm under realistic conditions.

