TY - JOUR
T1 - Scalable Cooperative Decision-Making in Multi-UAV Confrontations
T2 - An Attention-Based Multi-Agent Actor-Critic Approach
AU - Chen, Can
AU - Song, Tao
AU - Mo, Li
AU - Lv, Maolong
AU - Yu, Yinan
N1 - Publisher Copyright:
© 1965-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - With the increasing use of unmanned aerial vehicles (UAVs) in military operations, autonomous cooperative decision-making for multiple UAVs in aerial confrontations has become a critical research challenge. This paper presents an attention-based multi-agent actor-critic (AMAAC) algorithm for UAV aerial confrontation decision-making. The algorithm combines multi-head attention and self-play within the centralized training-distributed execution (CTDE) framework, extending the actor-critic approach based on the missile hit probability prediction model (MHPAC) to multi-UAV scenarios. A fighter observation encoder (OFE) and a centralized critic network based on the attention mechanism are introduced to adapt to varying numbers of UAVs (different scales) and to enhance training performance. Additionally, self-play-based extended training is used to generalize offensive and defensive strategies from small-scale aerial confrontations to larger scenarios. Experimental results demonstrate that the AMAAC algorithm achieves superior training effectiveness, and the strategies it produces perform well across various confrontation scales, even beyond the scale of the training scenario. Compared with other decision-making algorithms, such as multi-agent proximal policy optimization (MAPPO), multi-agent hierarchical policy gradient (MAHPG), and the state-event-condition-action (SECA) algorithm, the AMAAC-trained strategies yield higher win ratios and kill-death ratios in different scenarios, validating the algorithm's effectiveness and scalability.
AB - With the increasing use of unmanned aerial vehicles (UAVs) in military operations, autonomous cooperative decision-making for multiple UAVs in aerial confrontations has become a critical research challenge. This paper presents an attention-based multi-agent actor-critic (AMAAC) algorithm for UAV aerial confrontation decision-making. The algorithm combines multi-head attention and self-play within the centralized training-distributed execution (CTDE) framework, extending the actor-critic approach based on the missile hit probability prediction model (MHPAC) to multi-UAV scenarios. A fighter observation encoder (OFE) and a centralized critic network based on the attention mechanism are introduced to adapt to varying numbers of UAVs (different scales) and to enhance training performance. Additionally, self-play-based extended training is used to generalize offensive and defensive strategies from small-scale aerial confrontations to larger scenarios. Experimental results demonstrate that the AMAAC algorithm achieves superior training effectiveness, and the strategies it produces perform well across various confrontation scales, even beyond the scale of the training scenario. Compared with other decision-making algorithms, such as multi-agent proximal policy optimization (MAPPO), multi-agent hierarchical policy gradient (MAHPG), and the state-event-condition-action (SECA) algorithm, the AMAAC-trained strategies yield higher win ratios and kill-death ratios in different scenarios, validating the algorithm's effectiveness and scalability.
KW - Aerial confrontation
KW - attention mechanism
KW - reinforcement learning
KW - scalability
KW - unmanned aerial vehicles
UR - http://www.scopus.com/pages/publications/105005781612
U2 - 10.1109/TAES.2025.3571405
DO - 10.1109/TAES.2025.3571405
M3 - Article
AN - SCOPUS:105005781612
SN - 0018-9251
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
ER -