TY - JOUR
T1 - Quantization-aware distributed deep reinforcement learning for dynamic multi-robot scheduling
AU - Song, Peng
AU - Xiao, Yichen
AU - Cui, Kaixin
AU - Wang, Junzheng
AU - Shi, Dawei
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/1/15
Y1 - 2026/1/15
N2 - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, whether exact methods or approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. The framework exploits the independence among ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is further strengthened with a teammate collaboration model and a greedy MaxNextQ policy, which together enable the network to identify and pursue promising actions associated with increasing Q-values. To improve deployment efficiency, a quantization-aware training (QAT) method is introduced that inserts pseudo-quantization nodes during training, thereby reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios obtained by varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed method achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously improving objective scores by 5.75%, 6.32%, and 7.05%.
AB - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, whether exact methods or approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. The framework exploits the independence among ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is further strengthened with a teammate collaboration model and a greedy MaxNextQ policy, which together enable the network to identify and pursue promising actions associated with increasing Q-values. To improve deployment efficiency, a quantization-aware training (QAT) method is introduced that inserts pseudo-quantization nodes during training, thereby reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios obtained by varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed method achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously improving objective scores by 5.75%, 6.32%, and 7.05%.
KW - Distributed deep reinforcement learning
KW - Dynamic task scheduling
KW - Multi-robot system
KW - Quantization-aware training
UR - http://www.scopus.com/pages/publications/105011194840
U2 - 10.1016/j.eswa.2025.129027
DO - 10.1016/j.eswa.2025.129027
M3 - Article
AN - SCOPUS:105011194840
SN - 0957-4174
VL - 296
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 129027
ER -