EURO 2025 Leeds
Abstract Submission

1665. Static and Dynamic Policies for Multi-item Service Level Agreements with Finite Review Horizons and Penalty Costs

Invited abstract in session MD-38: (Deep) Reinforcement Learning for Combinatorial Optimization, stream Data Science meets Optimization.

Monday, 14:30-16:00
Room: Michael Sadler LG19

Authors (first author is the speaker)

1. Tarkan Temizoz
Eindhoven University of Technology
2. Christina Imdahl
Eindhoven University of Technology
3. Remco Dijkman
School of Industrial Engineering, Eindhoven University of Technology
4. Douniel Lamghari-Idrissi
School of Industrial Engineering, Eindhoven University of Technology
5. Willem van Jaarsveld
Eindhoven University of Technology

Abstract

Service Level Agreements (SLAs) align performance expectations in operations management. In multi-item inventory systems, suppliers aim to meet an aggregate fill rate (AFR) target by using static base-stock policies (BSPs). To set the order-up-to levels, they typically rely on a greedy heuristic (GH) that assumes an infinite review horizon. However, accounting for finite review horizons, penalty costs for underperformance, and real-time performance feedback can yield cost savings. We propose a two-tier solution framework. The static approach identifies which infinite-horizon AFR target, when fed into GH, produces the cost-minimizing BSP for a given SLA. To find this target, we present a simulation-based algorithm that generates candidate BSPs and efficiently prunes the search space. The dynamic approach reduces the combinatorial action space to a smaller set of replenishment rules by introducing composite actions, each specifying an order-up-to level for every item in the system. To implement this approach, we train Deep Reinforcement Learning policies that learn to choose among the composite actions in real time. In numerical experiments, dynamic policies reduce costs by 4% on average relative to the best BSP benchmark while maintaining a similar AFR and, in most cases, incurring fewer penalties. We also observe that a longer horizon with a higher penalty can lead to lower costs than a shorter horizon with a lower penalty, which shows the importance of negotiating for longer review horizons.
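
The abstract does not spell out the greedy heuristic, so the following is only a minimal sketch of one common marginal-analysis form, not the authors' implementation. It assumes Poisson lead-time demand, a per-unit cost for each item, and the steady-state (infinite-horizon) item fill rate P(X <= S - 1) for base-stock level S. The function names, parameters, and example numbers are all hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for n in range(1, k + 1):
        term *= mu / n
        total += term
    return min(total, 1.0)


def item_fill_rate(level, mu):
    """Assumed steady-state fill rate of one item with base-stock level `level`."""
    return poisson_cdf(level - 1, mu)


def aggregate_fill_rate(levels, rates, lead_time):
    """Demand-weighted average of the item fill rates."""
    total_rate = sum(rates)
    return sum(
        r / total_rate * item_fill_rate(s, r * lead_time)
        for s, r in zip(levels, rates)
    )


def greedy_base_stock(rates, unit_costs, lead_time, target):
    """Raise one order-up-to level at a time, always picking the item with
    the largest AFR gain per unit of cost, until the AFR target is met."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    levels = [0] * len(rates)
    total_rate = sum(rates)
    afr = 0.0
    while afr < target:
        best_i, best_ratio, best_gain = -1, -1.0, 0.0
        for i, (r, c) in enumerate(zip(rates, unit_costs)):
            mu = r * lead_time
            gain = r / total_rate * (
                item_fill_rate(levels[i] + 1, mu) - item_fill_rate(levels[i], mu)
            )
            if gain / c > best_ratio:
                best_i, best_ratio, best_gain = i, gain / c, gain
        if best_gain <= 0.0:
            break  # numerical floor reached; no further improvement possible
        levels[best_i] += 1
        afr += best_gain
    return levels, afr


# Hypothetical three-item instance: demand rates, unit costs, lead time, target.
levels, afr = greedy_base_stock([2.0, 0.5, 0.1], [1.0, 5.0, 20.0], 1.0, 0.90)
```

In the static approach described above, a routine of this kind would be called with several candidate infinite-horizon targets, and each resulting BSP would then be evaluated by simulation under the finite-horizon SLA.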

Status: accepted
