EURO 2025 Leeds
Abstract Submission

2418. Deep Reinforcement Learning for Multi-Echelon Inventory Management with Markov-Modulated Demand, Lead Times, and Transshipment Crossing

Invited abstract in session TA-34: Advancements of OR-analytics in statistics, machine learning and data science 2, stream Advancements of OR-analytics in statistics, machine learning and data science.

Tuesday, 8:30-10:00
Room: Michael Sadler LG10

Authors (first author is the speaker)

1. Fatemeh Fakhredin
Kühne Logistics University
2. Joern Meissner
Kühne Logistics University

Abstract

Effective inventory management in multi-echelon supply chains requires dynamic ordering decisions under stochastic demand and lead times. This paper formulates the problem as a Markov Decision Process (MDP) and applies deep reinforcement learning (DRL), specifically Proximal Policy Optimization (PPO), to minimize total inventory cost. Unlike traditional inventory control methods, our approach learns an adaptive ordering policy through interaction with a simulated environment.
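For concreteness, the sketch below shows what such a simulated environment could look like: a toy two-stage serial chain whose step function charges holding and backorder costs and returns the negative period cost as the reward. Everything here (Poisson demand, an effective one-period upstream lead time, the cost coefficients) is an illustrative assumption, not the paper's calibration.

import numpy as np

class TwoStageInventoryEnv:
    """Toy serial chain: retailer (stage 0) <- warehouse (stage 1) <- supplier."""

    def __init__(self, holding=(1.0, 0.5), backorder=10.0, max_order=20, seed=0):
        self.h = np.array(holding)   # per-unit holding cost at each stage
        self.b = backorder           # per-unit backorder penalty at the retailer
        self.max_order = max_order
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.inv = np.array([10.0, 10.0])  # on-hand inventory per stage
        self.backlog = 0.0                 # unmet retailer demand
        return self._obs()

    def _obs(self):
        return np.concatenate([self.inv, [self.backlog]])

    def step(self, orders):
        orders = np.clip(np.asarray(orders, dtype=float), 0, self.max_order)
        # The warehouse ships to the retailer, limited by warehouse stock.
        shipped = min(orders[0], self.inv[1])
        self.inv[1] -= shipped
        self.inv[0] += shipped
        # The supplier is uncapacitated; the warehouse order arrives at the end
        # of the period, so it can only be shipped onward from the next period
        # (the paper's stochastic lead times would replace this fixed delay).
        self.inv[1] += orders[1]
        # Stochastic demand plus carried-over backlog hits the retailer.
        demand = self.rng.poisson(5) + self.backlog
        sold = min(demand, self.inv[0])
        self.inv[0] -= sold
        self.backlog = demand - sold
        cost = self.h @ self.inv + self.b * self.backlog
        return self._obs(), -cost  # reward = negative period cost

A DRL agent then maps _obs() to an order vector and is trained to maximize cumulative reward, i.e. to minimize the total discounted cost.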
We model a serial multi-echelon supply chain in which Markov-modulated demand and lead times introduce temporal dependencies, capturing seasonality, market fluctuations, and disruptions. A further challenge is transshipment crossing: because lead times vary, shipments placed at different times may arrive out of sequence or simultaneously. This breaks the sequential-fulfillment assumption underlying many traditional optimization methods and makes them less effective.
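The following snippet illustrates both mechanisms under invented parameters: a two-regime Markov chain ("stable" vs. "disrupted") modulates the demand rate and the lead-time distribution, and because each order draws its own lead time, a later order can arrive before, or together with, an earlier one.

import numpy as np

rng = np.random.default_rng(42)

# Two-regime Markov chain: state 0 = "stable", state 1 = "disrupted".
# The transition matrix and regime parameters are hypothetical placeholders.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
demand_rate = [5.0, 12.0]      # Poisson demand mean per regime
lead_times = [(1, 2), (2, 4)]  # lead-time support per regime

state = 0
pipeline = []  # (arrival_period, quantity) of in-transit orders

for t in range(12):
    state = int(rng.choice(2, p=P[state]))
    demand = rng.poisson(demand_rate[state])
    # Each order draws its own lead time, so a later order can overtake
    # an earlier one -- the crossing effect described above.
    lt = rng.choice(lead_times[state])
    pipeline.append((t + lt, 10))
    arrivals = sum(q for due, q in pipeline if due == t)
    pipeline = [(due, q) for due, q in pipeline if due > t]
    print(f"t={t:2d} regime={state} demand={demand:2d} arrivals={arrivals}")

Typical runs show periods in which two shipments land at once while others receive nothing, which is precisely the pattern that invalidates sequential-fulfillment bookkeeping.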
To address this, we employ PPO, a policy-based DRL method that iteratively refines a stochastic ordering policy from observations of inventory levels, backorders, in-transit shipments, demand history, and the Markov state. This study advances data-driven supply chain optimization by demonstrating how reinforcement learning adapts to dynamic uncertainty and offers a scalable alternative to rule-based inventory policies.
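As a hedged end-to-end sketch, the code below wraps a single-stage slice of the problem in a Gymnasium environment whose observation concatenates exactly the quantities listed above (inventory, backlog, the shipment pipeline, recent demand, and the Markov state) and trains the PPO implementation from Stable-Baselines3 on it. The class name InventoryGym and all dimensions, bounds, and cost coefficients are invented for this demo; the paper's actual architecture and hyperparameters are not specified here.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO  # assumes gymnasium and stable-baselines3 are installed

class InventoryGym(gym.Env):
    """Single-stage slice with a Markov demand regime. The observation mirrors
    the abstract's list: inventory, backorders, in-transit shipments, demand
    history, and the regime. All numbers are placeholders."""

    P = np.array([[0.9, 0.1], [0.3, 0.7]])  # regime transition matrix (invented)
    RATES = (5.0, 12.0)                     # Poisson demand mean per regime

    def __init__(self, horizon=200):
        super().__init__()
        self.horizon = horizon
        # obs = [inventory, backlog, 3 pipeline slots, 3 past demands, regime]
        self.observation_space = spaces.Box(0.0, np.inf, shape=(9,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 20.0, shape=(1,), dtype=np.float32)

    def _obs(self):
        return np.concatenate(
            [[self.inv, self.backlog], self.pipe, self.hist, [self.regime]]
        ).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.regime = 0, 0
        self.inv, self.backlog = 10.0, 0.0
        self.pipe = np.zeros(3)  # units arriving in 1, 2, 3 periods
        self.hist = np.zeros(3)  # last three demand realizations
        return self._obs(), {}

    def step(self, action):
        order = float(np.clip(action, 0.0, 20.0)[0])
        self.regime = int(self.np_random.choice(2, p=self.P[self.regime]))
        # Receive the shipment due now, shift the pipeline, append the new order.
        # A fixed three-period lead time keeps the demo small; the paper's
        # Markov-modulated lead times would randomize the insertion slot.
        self.inv += self.pipe[0]
        self.pipe = np.roll(self.pipe, -1)
        self.pipe[-1] = order
        demand = float(self.np_random.poisson(self.RATES[self.regime])) + self.backlog
        sold = min(demand, self.inv)
        self.inv -= sold
        self.backlog = demand - sold
        self.hist = np.roll(self.hist, -1)
        self.hist[-1] = demand
        cost = 1.0 * self.inv + 10.0 * self.backlog  # holding + backorder penalty
        self.t += 1
        return self._obs(), -cost, False, self.t >= self.horizon, {}

model = PPO("MlpPolicy", InventoryGym(), verbose=0)
model.learn(total_timesteps=20_000)  # a short demo run, not a tuned training budget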

Status: accepted

