Quantization-aware distributed deep reinforcement learning for dynamic multi-robot scheduling

Peng Song; Yichen Xiao; Kaixin Cui; Junzheng Wang; Dawei Shi

doi:10.1016/j.eswa.2025.129027

Corresponding author: Dawei Shi

School of Automation, Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, categorized as exact methods and approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. This framework leverages the independence between ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is especially enhanced with a teammate collaboration model and a greedy MaxNextQ policy, which enables the network to identify and approach promising actions associated with increasing Q-values. To further enhance deployment efficiency, a quantization-aware training (QAT) method is introduced by adding pseudo-quantization nodes and thus reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios via varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed approach achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously enhancing objective scores by 5.75%, 6.32%, and 7.05%.
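The pseudo-quantization nodes mentioned in the abstract can be illustrated with a generic "fake quantization" step: a value is rounded to a low-bit grid and immediately mapped back to floating point, so the network sees the rounding error during training. The sketch below is a minimal, assumed illustration of that general idea and is not taken from the paper; the function name `fake_quantize`, the 8-bit width, and the fixed range [-1, 1] are all hypothetical choices.

```python
def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize x onto a uniform num_bits grid over [x_min, x_max], then dequantize.

    Hypothetical helper for illustration only; the paper's actual QAT
    scheme (bit width, range calibration, node placement) is not given here.
    """
    levels = 2 ** num_bits - 1                # number of steps in the grid
    scale = (x_max - x_min) / levels          # width of one quantization step
    clamped = min(max(x, x_min), x_max)       # saturate values outside the range
    q = round((clamped - x_min) / scale)      # integer code in [0, levels]
    return x_min + q * scale                  # back to floating point


# Assumed example weights; compare each value with its quantize-dequantize copy.
weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
simulated = [fake_quantize(w, -1.0, 1.0) for w in weights]
errors = [abs(w - s) for w, s in zip(weights, simulated)]
```

For in-range inputs the round-trip error is at most half a quantization step. In typical QAT setups this operation is inserted in the forward pass while gradients are passed through the rounding unchanged; whether the paper does exactly this is not stated in the abstract.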

Original language	English
Article number	129027
Journal	Expert Systems with Applications
Volume	296
DOI	http://doi.org/10.1016/j.eswa.2025.129027
Publication status	Published - 15 Jan 2026
Externally published	Yes

Cite this

@article{e086735182cd4ad494cd718fed75ac22,

title = "Quantization-aware distributed deep reinforcement learning for dynamic multi-robot scheduling",

abstract = "In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, categorized as exact methods and approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. This framework leverages the independence between ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is especially enhanced with a teammate collaboration model and a greedy MaxNextQ policy, which enables the network to identify and approach promising actions associated with increasing Q-values. To further enhance deployment efficiency, a quantization-aware training (QAT) method is introduced by adding pseudo-quantization nodes and thus reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios via varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed approach achieves deployment rate improvements of 22.95\%, 15.09\%, and 23.37\%, while simultaneously enhancing objective scores by 5.75\%, 6.32\%, and 7.05\%.",

keywords = "Distributed deep reinforcement learning, Dynamic task scheduling, Multi-robot system, Quantization-aware training",

author = "Peng Song and Yichen Xiao and Kaixin Cui and Junzheng Wang and Dawei Shi",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2026",

month = jan,

day = "15",

doi = "10.1016/j.eswa.2025.129027",

language = "English",

volume = "296",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Quantization-aware distributed deep reinforcement learning for dynamic multi-robot scheduling

AU - Song, Peng

AU - Xiao, Yichen

AU - Cui, Kaixin

AU - Wang, Junzheng

AU - Shi, Dawei

PY - 2026/1/15

Y1 - 2026/1/15

N2 - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, categorized as exact methods and approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. This framework leverages the independence between ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is especially enhanced with a teammate collaboration model and a greedy MaxNextQ policy, which enables the network to identify and approach promising actions associated with increasing Q-values. To further enhance deployment efficiency, a quantization-aware training (QAT) method is introduced by adding pseudo-quantization nodes and thus reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios via varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed approach achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously enhancing objective scores by 5.75%, 6.32%, and 7.05%.

AB - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, categorized as exact methods and approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. This framework leverages the independence between ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is especially enhanced with a teammate collaboration model and a greedy MaxNextQ policy, which enables the network to identify and approach promising actions associated with increasing Q-values. To further enhance deployment efficiency, a quantization-aware training (QAT) method is introduced by adding pseudo-quantization nodes and thus reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios via varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed approach achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously enhancing objective scores by 5.75%, 6.32%, and 7.05%.

KW - Distributed deep reinforcement learning

KW - Dynamic task scheduling

KW - Multi-robot system

KW - Quantization-aware training

UR - http://www.scopus.com/pages/publications/105011194840

U2 - 10.1016/j.eswa.2025.129027

DO - 10.1016/j.eswa.2025.129027

M3 - Article

AN - SCOPUS:105011194840

SN - 0957-4174

VL - 296

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 129027

ER -
