1665. Static and Dynamic Policies for Multi-item Service Level Agreements with Finite Review Horizons and Penalty Costs
Invited abstract in session MD-38: (Deep) Reinforcement Learning for Combinatorial Optimization, stream Data Science meets Optimization.
Monday, 14:30-16:00, Room: Michael Sadler LG19
Authors (first author is the speaker)
1. Tarkan Temizoz, Eindhoven University of Technology
2. Christina Imdahl, Eindhoven University of Technology
3. Remco Dijkman, School of Industrial Engineering, Eindhoven University of Technology
4. Douniel Lamghari-Idrissi, School of Industrial Engineering, Eindhoven University of Technology
5. Willem van Jaarsveld, Eindhoven University of Technology
Abstract
Service Level Agreements (SLAs) align performance expectations in operations management. In multi-item inventory systems, suppliers aim to meet an aggregate fill rate (AFR) target by using static base-stock policies (BSPs). To set the order-up-to levels, they typically rely on a greedy heuristic (GH) that assumes an infinite review horizon. However, incorporating finite review horizons, penalty costs for underperformance, and real-time performance feedback can offer cost savings. We propose a two-tier solution framework. The static approach identifies which infinite-horizon AFR target, when fed into the GH, produces the cost-minimizing BSP for a given SLA; to find this target, we present a simulation-based algorithm that generates candidate BSPs and efficiently prunes the search space. The dynamic approach reduces the combinatorial action space to a smaller set of replenishment rules by introducing composite actions, each specifying an order-up-to level for every item in the system. To implement this approach, we train Deep Reinforcement Learning policies that learn to choose among the composite actions in real time. Numerical experiments show that the dynamic policies reduce costs by 4% on average relative to the best BSP benchmark while maintaining a similar AFR and, in most instances, incurring fewer penalties. We also observe that a longer horizon with a higher penalty can lead to lower costs than a shorter horizon with a lower penalty, underscoring the importance of negotiating longer review horizons.
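To illustrate the kind of greedy heuristic the abstract refers to, below is a minimal Python sketch of greedy marginal allocation for setting order-up-to levels against an AFR target. This is not the authors' implementation: the Poisson lead-time-demand model, the fill-rate approximation P(D <= S-1), and all instance parameters are illustrative assumptions.

```python
"""Hedged sketch (not the paper's algorithm) of greedy marginal
allocation for multi-item base-stock levels under an aggregate
fill rate (AFR) target.  Assumes Poisson lead-time demand and
approximates an item's fill rate by P(lead-time demand <= S - 1)."""

from scipy.stats import poisson

def greedy_base_stock(lt_means, demand_rates, holding_costs, afr_target):
    n = len(lt_means)
    S = [0] * n  # order-up-to level for each item
    total_rate = sum(demand_rates)

    def fill_rate(i, s):
        # P(lead-time demand <= s - 1) under the assumed Poisson model
        return poisson.cdf(s - 1, lt_means[i]) if s > 0 else 0.0

    def afr(levels):
        # demand-weighted aggregate fill rate across all items
        return sum(demand_rates[i] * fill_rate(i, levels[i])
                   for i in range(n)) / total_rate

    while afr(S) < afr_target:
        # raise the level of the item with the largest AFR gain
        # per unit of holding cost (marginal-allocation step)
        best = max(range(n),
                   key=lambda i: demand_rates[i]
                   * (fill_rate(i, S[i] + 1) - fill_rate(i, S[i]))
                   / holding_costs[i])
        S[best] += 1
    return S

# Illustrative 3-item instance: mean lead-time demands, demand
# rates, holding costs, and a 95% AFR target (all made up).
print(greedy_base_stock([2.0, 5.0, 1.0], [10.0, 4.0, 1.0],
                        [1.0, 2.0, 0.5], 0.95))
```

Each iteration increments the level of the item offering the largest fill-rate gain per unit of holding cost, which is the standard marginal-allocation idea behind such heuristics; the paper's static approach then searches over which AFR target to feed such a heuristic.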
Keywords
- Inventory
- Machine Learning
- Supply Chain Management
Status: accepted