EURO 2025 Leeds
Abstract Submission

2155. Efficient Q-Learning for Constrained Decision-Making

Invited abstract in session WA-54: Applications in Queueing Theory, stream Stochastic modelling.

Wednesday, 8:30-10:00
Room: Liberty 1.08

Authors (first author is the speaker)

1. Khushboo Agarwal
2. Konstantin Avrachenkov
INRIA

Abstract

Markov decision processes (MDPs) provide a framework for sequential decision-making in stochastically evolving systems. When the transition dynamics are unknown, reinforcement learning (RL) algorithms, such as Q-learning and actor-critic (AC) methods, are commonly used to obtain optimal policies.
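As background, a minimal tabular Q-learning sketch for an unconstrained, cost-minimizing MDP is given below. It is only an illustration: the environment interface (env.reset, env.step), the discounted formulation, and all step sizes are assumptions, not part of this abstract.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, eps=0.1, seed=0):
        # Tabular Q-learning for cost minimization; the env interface
        # (reset() -> state, step(a) -> (next_state, cost, done)) is assumed.
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy exploration; greedy = lowest estimated cost
                a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmin())
                s_next, cost, done = env.step(a)
                # one-step temporal-difference update toward
                # cost + discounted continuation
                Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
                s = s_next
        return Q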

The situation is more involved when there are additional constraints. Most RL literature focuses on inequality constraints and employs Lagrangian-based methods for the analysis. However, equality constraints arise naturally in several real-world applications. For example, in queueing systems, one may seek to minimize the delay while ensuring a desired throughput (equality constraint) and keeping the energy cost below a given value (inequality constraint). Similarly, in mmWave wireless communication, a controller might optimize beam alignment to minimize power consumption while maintaining a fixed transmission rate (equality constraint) and keeping the expected number of realignments within a permissible limit (inequality constraint).

Motivated by these practical examples, we study average-cost MDPs with both equality and inequality constraints. To address this setting, we propose and analyze a more efficient two-timescale Q-learning algorithm, in contrast to the slower three-timescale AC algorithm studied previously, which was designed only for inequality-constrained MDPs. We also illustrate the performance of our algorithm on the instances mentioned above.
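The abstract gives no pseudocode, so the following is only a plausible sketch of how a two-timescale scheme of this kind might be organized: a relative (average-cost) Q-update on the fast timescale and Lagrange-multiplier updates on the slow one. The environment interface, step-size exponents, exploration rate, and the reference entry Q[0, 0] are all illustrative assumptions.

    import numpy as np

    def constrained_q_learning(env, n_states, n_actions,
                               eq_target, ineq_budget,
                               n_steps=100_000, seed=0):
        # Hedged sketch: average-cost Q-learning with one equality and one
        # inequality constraint, folded into the cost via Lagrange multipliers.
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        lam_eq, lam_ineq = 0.0, 0.0          # Lagrange multipliers
        s = env.reset()
        for n in range(1, n_steps + 1):
            alpha = (n + 1) ** -0.6          # fast timescale: Q-values
            beta = (n + 1) ** -0.9           # slow timescale: multipliers
            a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmin())
            # assumed interface: step(a) -> (next state, objective cost,
            # equality-constraint sample, inequality-constraint sample)
            s_next, cost, eq_val, ineq_val = env.step(a)
            # Lagrangian one-step cost combining objective and both constraints
            c_lag = (cost + lam_eq * (eq_val - eq_target)
                          + lam_ineq * (ineq_val - ineq_budget))
            # relative (RVI-style) Q-update; Q[0, 0] serves as the reference
            Q[s, a] += alpha * (c_lag + Q[s_next].min() - Q[0, 0] - Q[s, a])
            # slower multiplier updates: the equality multiplier is free in
            # sign, the inequality multiplier is projected onto [0, inf)
            lam_eq += beta * (eq_val - eq_target)
            lam_ineq = max(0.0, lam_ineq + beta * (ineq_val - ineq_budget))
            s = s_next
        return Q, lam_eq, lam_ineq

Treating the two multipliers differently (free sign versus nonnegative projection) mirrors the standard Lagrangian handling of equality versus inequality constraints; whether the proposed algorithm takes exactly this form is not stated in the abstract.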

Keywords

Status: accepted

