TY - JOUR
T1 - Multi-Agent Global Prioritized Experience Learning for UAV Cooperative Jamming in Secure Communication
AU - Wang, Saier
AU - Zhang, Yan
AU - Chen, Mingyu
AU - Zhang, Wancheng
AU - He, Zunwen
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In uncrewed aerial vehicle (UAV) communication networks, line-of-sight (LoS) propagation links leave transmitted information vulnerable to wiretapping by ground eavesdroppers (GEs). This paper focuses on maximizing the average secrecy rate when multiple UAV jammers help multiple UAV transmitters defend against GEs. We propose a multi-agent global prioritized experience learning (MAGPEL) algorithm that jointly optimizes the sub-channel allocation, locations, and power levels of the UAV transmitters together with the locations and power levels of the UAV jammers. Each UAV acts as an agent and is trained with global information comprising the states and actions of all UAVs. In addition, the temporal-difference error (TD-error) measures the significance of each experience and determines its sampling probability, so that more significant experiences are sampled with higher probability during training. Simulation results show that the proposed algorithm achieves better convergence and a higher secrecy rate than other state-of-the-art methods.
AB - In uncrewed aerial vehicle (UAV) communication networks, line-of-sight (LoS) propagation links leave transmitted information vulnerable to wiretapping by ground eavesdroppers (GEs). This paper focuses on maximizing the average secrecy rate when multiple UAV jammers help multiple UAV transmitters defend against GEs. We propose a multi-agent global prioritized experience learning (MAGPEL) algorithm that jointly optimizes the sub-channel allocation, locations, and power levels of the UAV transmitters together with the locations and power levels of the UAV jammers. Each UAV acts as an agent and is trained with global information comprising the states and actions of all UAVs. In addition, the temporal-difference error (TD-error) measures the significance of each experience and determines its sampling probability, so that more significant experiences are sampled with higher probability during training. Simulation results show that the proposed algorithm achieves better convergence and a higher secrecy rate than other state-of-the-art methods.
KW - Location deployment
KW - multi-agent deep reinforcement learning
KW - physical layer security
KW - resource allocation
KW - uncrewed aerial vehicle
UR - http://www.scopus.com/pages/publications/105011856378
U2 - 10.1109/TSIPN.2025.3592341
DO - 10.1109/TSIPN.2025.3592341
M3 - Article
AN - SCOPUS:105011856378
SN - 2373-776X
VL - 11
SP - 916
EP - 927
JO - IEEE Transactions on Signal and Information Processing over Networks
JF - IEEE Transactions on Signal and Information Processing over Networks
ER -