EURO 2024 Copenhagen
Abstract Submission

2562. Harnessing the Power of Trained Reinforcement Learning Agents in Job Shop Scheduling Problems

Invited abstract in session TB-3: Machine Learning in Applied Optimization, stream Data Science Meets Optimization.

Tuesday, 10:30-12:00
Room: 1005 (building: 202)

Authors (first author is the speaker)

1. Constantin Waubert de Puiseau
Institute for Technologies and Management of Digital Transformation, University of Wuppertal
2. Hasan Tercan
Institute for Technologies and Management of Digital Transformation, University of Wuppertal
3. Tobias Meisen
Institute for Technologies and Management of Digital Transformation, University of Wuppertal

Abstract

The Job Shop Scheduling Problem (JSSP) has been extensively studied in operations research for decades, resulting in the development of various solution methods. Recently, deep reinforcement learning (DRL) has emerged as a promising approach to automatically learn generalized construction heuristics from simulations. Construction heuristics iteratively generate solution sequences in which operations are integrated into a schedule. For each candidate operation, the neural network of a DRL agent predicts the probability that scheduling it next will lead to the shortest schedule. Often, multiple solutions are sampled stochastically from the predictions of trained agents. However, due to the symmetry of the JSSP, many of these sequences result in the same makespan or even the same schedule. This motivates the use of more sophisticated search strategies that cover a wider range of solutions and utilize trained agents effectively.
This study compares theoretical and practical aspects of integrating learned priors into depth-wise search strategies, such as stochastic sampling and Monte-Carlo tree search, aiming to minimize the makespan within a limited computational budget. While predictions for sampling are most efficiently parallelized, other methods effectively prune the search tree and therefore require fewer predictions in total. Our results with state-of-the-art DRL agents indicate that variations of stochastic sampling perform best under realistic time and hardware constraints.
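The construction heuristic and stochastic sampling described in the abstract can be sketched on a toy instance. This is a minimal illustration, not the authors' implementation: the three-job instance, the uniform stand-in policy (replacing a trained network), and all function names are assumptions. It also shows the symmetry effect: many sampled sequences collapse to far fewer distinct makespans.

```python
import random

# Toy JSSP instance: each job is a list of (machine, duration) operations
# that must be processed in order. (Hypothetical data for illustration.)
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]

def makespan(sequence, jobs):
    """Build a schedule from a job sequence and return its makespan.
    Each occurrence of a job index schedules that job's next unscheduled
    operation at the earliest time both the job and the machine are free."""
    job_ready = [0] * len(jobs)   # time each job's previous operation finishes
    mach_ready = {}               # time each machine becomes free
    next_op = [0] * len(jobs)
    for j in sequence:
        machine, dur = jobs[j][next_op[j]]
        start = max(job_ready[j], mach_ready.get(machine, 0))
        job_ready[j] = start + dur
        mach_ready[machine] = start + dur
        next_op[j] += 1
    return max(job_ready)

def sample_sequence(jobs, policy, rng):
    """Stochastically construct one solution sequence: at every step the
    policy assigns a probability weight to each job with remaining ops."""
    remaining = [len(ops) for ops in jobs]
    seq = []
    while any(remaining):
        candidates = [j for j, r in enumerate(remaining) if r > 0]
        j = rng.choices(candidates, weights=policy(candidates))[0]
        seq.append(j)
        remaining[j] -= 1
    return seq

# Stand-in for a trained DRL agent: a uniform policy over candidates.
uniform_policy = lambda candidates: [1.0] * len(candidates)

rng = random.Random(42)
sequences = [sample_sequence(JOBS, uniform_policy, rng) for _ in range(100)]
spans = [makespan(s, JOBS) for s in sequences]
# Due to JSSP symmetry, distinct makespans are far fewer than samples:
print(len(sequences), len(set(spans)), min(spans))
```

A trained agent would replace `uniform_policy` with network outputs, concentrating probability mass on promising operations, which is what makes the sampled solution pool worth searching.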

Status: accepted

