Scalable Cooperative Decision-Making in Multi-UAV Confrontations: An Attention-Based Multi-Agent Actor-Critic Approach

Can Chen; Tao Song; Li Mo; Maolong Lv; Yinan Yu

doi:10.1109/TAES.2025.3571405

Corresponding author: Li Mo

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

With the increasing use of unmanned aerial vehicles (UAVs) in military operations, autonomous cooperative decision-making for multiple UAVs in aerial confrontations has become a critical research challenge. This paper presents an attention-based multi-agent actor-critic (AMAAC) algorithm for UAV aerial confrontation decision-making. The algorithm combines multi-head attention and self-play within the centralized training-distributed execution (CTDE) framework, extending the actor-critic approach based on the missile hit probability prediction model (MHPAC) to multi-UAV scenarios. A fighter observation encoder (OFE) and a centralized critic network based on the attention mechanism are introduced to adapt to varying number of UAVs (different scales) and enhance training performance. Additionally, self-play-based extended training is used to generalize offensive and defensive strategies from small-scale aerial confrontations to larger scenarios. Experimental results demonstrate that the AMAAC algorithm achieves superior training effectiveness, and the strategies it produces perform well across various confrontation scales, even beyond the training scenario's scale. Compared to other decision-making algorithms, such as Multi-agent Proximal Policy Optimization (MAPPO), Multi-agent Hierarchical Policy Gradient (MAHPG), and the State-Event-Condition-Action (SECA) algorithm, the AMAAC-trained strategies yield higher win ratios and kill-death ratios in different scenarios, validating the algorithm's effectiveness and scalability.
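The abstract states that attention-based components let the method adapt to varying numbers of UAVs. As a rough illustration of why attention suits that goal, the sketch below pools a variable-size set of per-UAV feature vectors into a fixed-length vector using single-head scaled dot-product attention. It is not the paper's OFE or critic network: the function names, the feature dimension, the sample values, and the choice to reuse the entity vectors as both keys and values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(query, entities):
    """Scaled dot-product attention of one query over a variable-size set.

    query:    list of d floats (hypothetical encoding of the agent's own state)
    entities: non-empty list of d-float lists, one per observed UAV
    Returns a d-float list whose length does not depend on len(entities).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, e)) / math.sqrt(d) for e in entities]
    weights = softmax(scores)
    # Weighted sum of entity vectors (keys double as values in this sketch).
    return [sum(w * e[i] for w, e in zip(weights, entities)) for i in range(d)]

# Made-up feature vectors, purely illustrative.
own = [1.0, 0.0, 0.5]
two_uavs = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4]]
four_uavs = two_uavs + [[0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]

small = attention_pool(own, two_uavs)
large = attention_pool(own, four_uavs)
print(len(small), len(large))  # same length for both scales
```

Because the output length depends only on the feature dimension, a downstream network can stay unchanged when the number of observed UAVs changes, and the weighted sum does not depend on the order in which the UAVs are listed.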

Original language	English
Journal	IEEE Transactions on Aerospace and Electronic Systems
DOIs	http://doi.org/10.1109/TAES.2025.3571405
Publication status	Accepted/In press - 2025
Externally published	Yes

Keywords

Aerial confrontation
attention mechanism
reinforcement learning
scalability
unmanned aerial vehicles

Cite this

@article{47a5ecabbfbb44938b0940453a141f55,
title = "Scalable Cooperative Decision-Making in Multi-UAV Confrontations: An Attention-Based Multi-Agent Actor-Critic Approach",
abstract = "With the increasing use of unmanned aerial vehicles (UAVs) in military operations, autonomous cooperative decision-making for multiple UAVs in aerial confrontations has become a critical research challenge. This paper presents an attention-based multi-agent actor-critic (AMAAC) algorithm for UAV aerial confrontation decision-making. The algorithm combines multi-head attention and self-play within the centralized training-distributed execution (CTDE) framework, extending the actor-critic approach based on the missile hit probability prediction model (MHPAC) to multi-UAV scenarios. A fighter observation encoder (OFE) and a centralized critic network based on the attention mechanism are introduced to adapt to varying number of UAVs (different scales) and enhance training performance. Additionally, self-play-based extended training is used to generalize offensive and defensive strategies from small-scale aerial confrontations to larger scenarios. Experimental results demonstrate that the AMAAC algorithm achieves superior training effectiveness, and the strategies it produces perform well across various confrontation scales, even beyond the training scenario's scale. Compared to other decision-making algorithms, such as Multi-agent Proximal Policy Optimization (MAPPO), Multi-agent Hierarchical Policy Gradient (MAHPG), and the State-Event-Condition-Action (SECA) algorithm, the AMAAC-trained strategies yield higher win ratios and kill-death ratios in different scenarios, validating the algorithm's effectiveness and scalability.",
keywords = "Aerial confrontation, attention mechanism, reinforcement learning, scalability, unmanned aerial vehicles",
author = "Can Chen and Tao Song and Li Mo and Maolong Lv and Yinan Yu",
note = "Publisher Copyright: {\textcopyright} 1965-2011 IEEE.",
year = "2025",
doi = "10.1109/TAES.2025.3571405",
language = "English",
journal = "IEEE Transactions on Aerospace and Electronic Systems",
issn = "0018-9251",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Scalable Cooperative Decision-Making in Multi-UAV Confrontations
T2 - An Attention-Based Multi-Agent Actor-Critic Approach
AU - Chen, Can
AU - Song, Tao
AU - Mo, Li
AU - Lv, Maolong
AU - Yu, Yinan
PY - 2025
Y1 - 2025
AB - With the increasing use of unmanned aerial vehicles (UAVs) in military operations, autonomous cooperative decision-making for multiple UAVs in aerial confrontations has become a critical research challenge. This paper presents an attention-based multi-agent actor-critic (AMAAC) algorithm for UAV aerial confrontation decision-making. The algorithm combines multi-head attention and self-play within the centralized training-distributed execution (CTDE) framework, extending the actor-critic approach based on the missile hit probability prediction model (MHPAC) to multi-UAV scenarios. A fighter observation encoder (OFE) and a centralized critic network based on the attention mechanism are introduced to adapt to varying number of UAVs (different scales) and enhance training performance. Additionally, self-play-based extended training is used to generalize offensive and defensive strategies from small-scale aerial confrontations to larger scenarios. Experimental results demonstrate that the AMAAC algorithm achieves superior training effectiveness, and the strategies it produces perform well across various confrontation scales, even beyond the training scenario's scale. Compared to other decision-making algorithms, such as Multi-agent Proximal Policy Optimization (MAPPO), Multi-agent Hierarchical Policy Gradient (MAHPG), and the State-Event-Condition-Action (SECA) algorithm, the AMAAC-trained strategies yield higher win ratios and kill-death ratios in different scenarios, validating the algorithm's effectiveness and scalability.
KW - Aerial confrontation
KW - attention mechanism
KW - reinforcement learning
KW - scalability
KW - unmanned aerial vehicles
UR - http://www.scopus.com/pages/publications/105005781612
U2 - 10.1109/TAES.2025.3571405
DO - 10.1109/TAES.2025.3571405
M3 - Article
AN - SCOPUS:105005781612
SN - 0018-9251
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
ER -
