Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

Can Chen, Tao Song, Li Mo*, Maolong Lv, Defu Lin

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

The field of autonomous air combat has seen a surge of interest, propelled by rapid progress in artificial intelligence. A persistent challenge in this domain is autonomous decision-making for dogfighting, especially under intricate, high-fidelity nonlinear aircraft dynamic models and insufficient information. To address this challenge, this paper introduces reinforcement learning (RL) to train maneuvering strategies. In RL for dogfighting, the way opponents are sampled strongly affects the efficacy of training. Consequently, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework in which proximal policy optimization (PPO) is applied. The approach comprises three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a sliding-window-based Gate Function. The training results demonstrate that this improved PPO algorithm with the AOS framework outperforms existing reinforcement learning methods such as the soft actor-critic (SAC) algorithm and PPO with prioritized fictitious self-play (PFSP). Moreover, in testing scenarios, the trained maneuvering policy shows remarkable adaptability against a diverse array of opponents. This research marks a substantial stride toward robust autonomous maneuvering decision systems for modern air combat.
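To make the sampling idea concrete, the sketch below illustrates Boltzmann (softmax) opponent selection with a simulated-annealing-style temperature schedule, as the abstract's "SA-inspired Boltzmann Meta-Solver" suggests. This is a minimal illustration, not the paper's implementation: the function names, the per-opponent `scores`, and the geometric cooling schedule are all assumptions for demonstration.

```python
import math
import random

def boltzmann_sample(scores, temperature):
    """Sample an opponent index from a policy pool via a Boltzmann
    (softmax) distribution over per-opponent scores.

    High temperature -> near-uniform exploration of the pool;
    low temperature -> greedy focus on the highest-scoring opponent.
    Returns (sampled_index, probability_list).
    """
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = random.choices(range(len(scores)), weights=probs, k=1)[0]
    return idx, probs

def annealed_temperature(t0, decay, step):
    """Hypothetical SA-style cooling schedule: the temperature decays
    geometrically with training progress, shifting opponent sampling
    from exploratory (early) to exploitative (late)."""
    return t0 * (decay ** step)
```

Early in training the schedule yields a high temperature, so every stored opponent is sampled with roughly equal probability; as the temperature cools, sampling concentrates on the opponents with the highest scores.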

Original language: English
Article number: 265
Journal: Aerospace
Volume: 12
Issue: 3
DOI
Publication status: Published - March 2025
