An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction

Jing Zhao; Chao Yang; Weida Wang; Ying Li; Tianqi Qie; Bin Xu

doi:10.1016/j.aei.2025.103316

An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction

Jing Zhao, Chao Yang^*, Weida Wang, Ying Li, Tianqi Qie, Bin Xu

^*此作品的通讯作者

机械与车辆学院

Beijing Institute of Technology

科研成果: 期刊稿件 › 文章 › 同行评审

1 引用（Scopus）

摘要

Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve the task efficiency in complex environments such as post-disaster cities and scrubland, this vehicle requires efficient and rational path planning. In above environments, it is difficult to obtain complete and accurate obstacle information. The planning process faces the technical difficulties of using the limited obstacle perception information to switch air-ground modes and fast acquire the optimal planning trajectory with the shortest distance. To address the above issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the above predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed. A new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process. The Q table is quickly updated to converge to optimal values. Finally, this proposed strategy is verified in randomly generated maps of 1000 m*1000 m. Results show that the prediction accuracy can be maintained over 93 %. Its path distance is reduced by 4.56 % and 1.75 % compared to that of traditional Q-Learning and A* with mode prediction, respectively. It has the same path distance as BAS-A*, LPA*, and D* Lite with mode prediction. Compared to traditional Q-Learning, it reduces computational time by 36.61 %. When converged, its iterative numbers are 58.9 % less than those of traditional Q-Learning.

源语言	英语
文章编号	103316
期刊	Advanced Engineering Informatics
卷	65
DOI	http://doi.org/10.1016/j.aei.2025.103316
出版状态	已出版 - 5月 2025

联合国可持续发展目标

此成果有助于实现下列可持续发展目标：

访问文件

10.1016/j.aei.2025.103316

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{44259d42c4f746ecb2a8cbfc80f093d7,

title = "An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction",

abstract = "Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve the task efficiency in complex environments such as post-disaster cities and scrubland, this vehicle requires efficient and rational path planning. In above environments, it is difficult to obtain complete and accurate obstacle information. The planning process faces the technical difficulties of using the limited obstacle perception information to switch air-ground modes and fast acquire the optimal planning trajectory with the shortest distance. To address the above issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the above predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed. A new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process. The Q table is quickly updated to converge to optimal values. Finally, this proposed strategy is verified in randomly generated maps of 1000 m*1000 m. Results show that the prediction accuracy can be maintained over 93 \%. Its path distance is reduced by 4.56 \% and 1.75 \% compared to that of traditional Q-Learning and A* with mode prediction, respectively. It has the same path distance as BAS-A*, LPA*, and D* Lite with mode prediction. Compared to traditional Q-Learning, it reduces computational time by 36.61 \%. When converged, its iterative numbers are 58.9 \% less than those of traditional Q-Learning.",

keywords = "Dynamic reward, Mode prediction, Path planning, VTOL air-ground vehicle, elitist-Q-Learning",

author = "Jing Zhao and Chao Yang and Weida Wang and Ying Li and Tianqi Qie and Bin Xu",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2025",

month = may,

doi = "10.1016/j.aei.2025.103316",

language = "English",

volume = "65",

journal = "Advanced Engineering Informatics",

issn = "1474-0346",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction

AU - Zhao, Jing

AU - Yang, Chao

AU - Wang, Weida

AU - Li, Ying

AU - Qie, Tianqi

AU - Xu, Bin

PY - 2025/5

Y1 - 2025/5

N2 - Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve the task efficiency in complex environments such as post-disaster cities and scrubland, this vehicle requires efficient and rational path planning. In above environments, it is difficult to obtain complete and accurate obstacle information. The planning process faces the technical difficulties of using the limited obstacle perception information to switch air-ground modes and fast acquire the optimal planning trajectory with the shortest distance. To address the above issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the above predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed. A new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process. The Q table is quickly updated to converge to optimal values. Finally, this proposed strategy is verified in randomly generated maps of 1000 m*1000 m. Results show that the prediction accuracy can be maintained over 93 %. Its path distance is reduced by 4.56 % and 1.75 % compared to that of traditional Q-Learning and A* with mode prediction, respectively. It has the same path distance as BAS-A*, LPA*, and D* Lite with mode prediction. Compared to traditional Q-Learning, it reduces computational time by 36.61 %. When converged, its iterative numbers are 58.9 % less than those of traditional Q-Learning.

AB - Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve the task efficiency in complex environments such as post-disaster cities and scrubland, this vehicle requires efficient and rational path planning. In above environments, it is difficult to obtain complete and accurate obstacle information. The planning process faces the technical difficulties of using the limited obstacle perception information to switch air-ground modes and fast acquire the optimal planning trajectory with the shortest distance. To address the above issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the above predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed. A new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process. The Q table is quickly updated to converge to optimal values. Finally, this proposed strategy is verified in randomly generated maps of 1000 m*1000 m. Results show that the prediction accuracy can be maintained over 93 %. Its path distance is reduced by 4.56 % and 1.75 % compared to that of traditional Q-Learning and A* with mode prediction, respectively. It has the same path distance as BAS-A*, LPA*, and D* Lite with mode prediction. Compared to traditional Q-Learning, it reduces computational time by 36.61 %. When converged, its iterative numbers are 58.9 % less than those of traditional Q-Learning.

KW - Dynamic reward

KW - Mode prediction

KW - Path planning

KW - VTOL air-ground vehicle

KW - elitist-Q-Learning

UR - http://www.scopus.com/pages/publications/105001838950

U2 - 10.1016/j.aei.2025.103316

DO - 10.1016/j.aei.2025.103316

M3 - Article

AN - SCOPUS:105001838950

SN - 1474-0346

VL - 65

JO - Advanced Engineering Informatics

JF - Advanced Engineering Informatics

M1 - 103316

ER -

An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction

摘要

联合国可持续发展目标

访问文件

其它文件与链接

指纹

引用此