2216. Learning State-Dependent Policy Parametrizations for Dynamic Technician Routing with Rework
Invited abstract in session WC-10: Tour Planning Problems, stream Mobility, Transportation, and Traffic.
Wednesday, 13:30-15:00, Room: H16
Authors (first author is the speaker)
| 1. | Marlin Wolf Ulmer | Management Science, Otto von Guericke Universität Magdeburg |
| 2. | Jonas Stein | Otto-von-Guericke-Universität Magdeburg |
| 3. | Florentin Hildebrandt | Faculty of Economics and Management, Otto-von-Guericke-Universität Magdeburg |
| 4. | Barrett Thomas | Management Sciences, University of Iowa |
Abstract
Home repair and installation services require technicians to visit customers and resolve tasks of varying complexity. Technicians often have heterogeneous skills. The geographical spread of customers makes it impractical to achieve only "ideal" matches between technician skills and task requirements. Additionally, technicians are regularly absent, e.g., due to sickness. When assignments are non-ideal with respect to task requirements and technician skill, some tasks may remain unresolved and require a revisit and rework on a later day, leading to delayed service. For this sequential decision problem, we iteratively build tours each day by adding "important" customers. Importance is based on analytical considerations and integrates urgency of service, routing efficiency, and risk of rework. We propose a state-dependent balance of these factors via reinforcement learning. We rely on proximal policy optimization (PPO) tailored to the problem specifics and analyze the implications of specific algorithmic augmentations. A comprehensive study shows that accepting a few non-ideal assignments can be quite beneficial for overall service quality. Furthermore, in states where many technicians are sick and many customers have overdue service deadlines, prioritizing service urgency is crucial. Conversely, in states with fewer sick technicians and fewer customers with overdue deadlines, routing efficiency should take precedence. We further demonstrate the value provided by a state-dependent parametrization via PPO.
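The tour-building idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract gives no formulas, so the linear importance score, the toy rework-risk model, the per-technician capacity, and all names (`Customer`, `Technician`, `rework_risk`, `importance`, `build_tours`) are assumptions made purely for illustration. The weights are passed in as a fixed tuple here, whereas the abstract describes learning them as a function of the state via PPO.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    x: float
    y: float
    days_waiting: int      # hypothetical proxy for urgency of service
    required_skill: int    # hypothetical task complexity level


@dataclass
class Technician:
    name: str
    x: float               # start location (e.g., a depot)
    y: float
    skill: int
    capacity: int          # hypothetical cap on visits per day


def rework_risk(tech, cust):
    # Assumed toy model: risk grows with the skill shortfall, capped at 0.9.
    shortfall = max(0, cust.required_skill - tech.skill)
    return min(0.9, 0.3 * shortfall)


def importance(position, tech, cust, weights):
    # Assumed linear score: reward urgency, penalize travel and rework risk.
    w_urgency, w_routing, w_rework = weights
    distance = math.hypot(cust.x - position[0], cust.y - position[1])
    return (w_urgency * cust.days_waiting
            - w_routing * distance
            - w_rework * rework_risk(tech, cust))


def build_tours(technicians, customers, weights):
    # Greedily append the most important (technician, customer) pair
    # until every customer is served or all technicians are at capacity.
    tours = {t.name: [] for t in technicians}
    position = {t.name: (t.x, t.y) for t in technicians}
    unassigned = list(customers)
    while unassigned:
        candidates = [
            (importance(position[t.name], t, c, weights), t, c)
            for t in technicians
            if len(tours[t.name]) < t.capacity
            for c in unassigned
        ]
        if not candidates:
            break
        _, tech, cust = max(candidates, key=lambda item: item[0])
        tours[tech.name].append(cust.name)
        position[tech.name] = (cust.x, cust.y)  # technician is now at the customer
        unassigned.remove(cust)
    # Customers left over would roll into the next day's state.
    return tours, [c.name for c in unassigned]
```

Calling `build_tours` with an urgency-heavy tuple such as `(1.0, 0.1, 1.0)` favors long-waiting customers even when they are far away, while a routing-heavy tuple such as `(0.1, 1.0, 1.0)` favors nearby ones. A state-dependent parametrization would replace the fixed tuple with the output of a learned policy that observes, for example, the number of absent technicians and overdue customers.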
Keywords
- Routing
- Machine Learning
- Stochastic Models
Status: accepted