TY - JOUR
T1 - Quantization-aware distributed deep reinforcement learning for dynamic multi-robot scheduling
AU - Song, Peng
AU - Xiao, Yichen
AU - Cui, Kaixin
AU - Wang, Junzheng
AU - Shi, Dawei
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/1/15
Y1 - 2026/1/15
N2 - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, whether exact methods or approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. The framework exploits the independence among ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is further strengthened with a teammate collaboration model and a greedy MaxNextQ policy, which together enable the network to identify and pursue promising actions associated with increasing Q-values. To improve deployment efficiency, a quantization-aware training (QAT) method is introduced that inserts pseudo-quantization nodes during training, thereby reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios obtained by varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed method achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously improving objective scores by 5.75%, 6.32%, and 7.05%.
AB - In intelligent port logistics, container stevedoring operations confront escalating challenges in orchestrating fleets of robots, where real-time task scheduling must reconcile high-dimensional state spaces with stringent computational efficiency and dynamically evolving environments. Traditional approaches, whether exact methods or approximate metaheuristics, struggle to balance solution quality and real-time responsiveness as task complexity grows exponentially. While recent deep reinforcement learning (DRL) methods improve adaptability in dynamic settings, they suffer from high computational overhead and deployment latency, limiting their practicality in time-sensitive port operations. To address these limitations, this work proposes a distributed deep reinforcement learning (DDRL) framework. The framework exploits the independence among ports to perform action selection and decision-making in parallel, thereby alleviating computational pressure and enhancing operational efficiency. It is further strengthened with a teammate collaboration model and a greedy MaxNextQ policy, which together enable the network to identify and pursue promising actions associated with increasing Q-values. To improve deployment efficiency, a quantization-aware training (QAT) method is introduced that inserts pseudo-quantization nodes during training, thereby reducing quantization-induced errors. The effectiveness of the proposed DDRL algorithm is validated through simulations under three distinct workload scenarios obtained by varying the robot-to-port ratio. The simulation results demonstrate that, compared with centralized DRL approaches, the proposed method achieves deployment rate improvements of 22.95%, 15.09%, and 23.37%, while simultaneously improving objective scores by 5.75%, 6.32%, and 7.05%.
KW - Distributed deep reinforcement learning
KW - Dynamic task scheduling
KW - Multi-robot system
KW - Quantization-aware training
UR - http://www.scopus.com/pages/publications/105011194840
U2 - 10.1016/j.eswa.2025.129027
DO - 10.1016/j.eswa.2025.129027
M3 - Article
AN - SCOPUS:105011194840
SN - 0957-4174
VL - 296
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 129027
ER -