2155. Efficient Q-Learning for Constrained Decision-Making
Invited abstract in session WA-54: Applications in Queueing Theory, stream Stochastic modelling.
Wednesday, 8:30-10:00, Room: Liberty 1.08
Authors (first author is the speaker)
| 1. | Khushboo Agarwal | |
| 2. | Konstantin Avrachenkov | INRIA |
Abstract
Markov decision processes (MDPs) provide a framework for sequential decision-making in stochastically evolving systems. When the transition dynamics are unknown, reinforcement learning (RL) algorithms, such as Q-learning and actor-critic (AC) methods, are commonly used to obtain optimal policies.
The situation is more involved when there are additional constraints. Most RL literature focuses on inequality constraints and employs Lagrangian-based methods for the analysis. However, equality constraints naturally arise in several real-world applications. For example, in queueing systems, one may seek to minimize the delay while ensuring a desired throughput (equality constraint) and keeping the energy cost below a given value (inequality constraint). Similarly, in wireless communication with mmWave systems, a controller might aim to optimize beam alignment to minimize power consumption while maintaining a fixed transmission rate (equality constraint) and keeping the expected number of realignments within a permissible limit (inequality constraint).
Driven by these practical examples, we study average-cost MDPs with 'equality and inequality' constraints. To address this, we propose and analyze a more efficient 'two-timescale' Q-learning algorithm, in contrast to the previously studied slower 'three-timescale' AC algorithm designed only for inequality-constrained MDPs. We also illustrate the performance of our algorithm for the instances mentioned above.
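To fix ideas, the sketch below illustrates the general flavour of a two-timescale, Lagrangian-style Q-learning scheme for an average-cost MDP with a single inequality constraint: Q-values are updated on a fast timescale while a multiplier is adjusted on a slower one. It is not the authors' algorithm; the environment, step sizes, and handling of constraints are hypothetical illustrations only, and the equality-constraint case is omitted.

```python
import numpy as np

# Illustrative sketch (hypothetical): tabular average-cost Q-learning with a
# Lagrange-style multiplier for one inequality constraint, updated on two
# timescales (fast for Q-values, slow for the multiplier).

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
lam = 0.0            # multiplier for the inequality constraint
avg_cost = 0.0       # running estimate of the long-run average Lagrangian cost
budget = 0.5         # permissible level of the constrained cost (assumed)


def step(s, a):
    """Hypothetical environment: next state, objective cost, constraint cost."""
    s_next = int(rng.integers(n_states))
    cost = rng.random() + 0.1 * a
    constraint_cost = rng.random()
    return s_next, cost, constraint_cost


s = 0
for t in range(1, 50_001):
    alpha = 1.0 / (1 + t) ** 0.6     # fast step size (Q-value updates)
    beta = 1.0 / (1 + t) ** 0.9      # slow step size (multiplier updates)

    # epsilon-greedy action selection (costs are minimized)
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmin(Q[s]))
    s_next, cost, c_cost = step(s, a)

    # Lagrangian-style immediate cost combining objective and constraint
    lagr_cost = cost + lam * (c_cost - budget)

    # relative-value (average-cost) Q-learning update on the fast timescale
    td = lagr_cost - avg_cost + Q[s_next].min() - Q[s, a]
    Q[s, a] += alpha * td
    avg_cost += alpha * (lagr_cost - avg_cost)

    # multiplier ascent on the slow timescale, projected to remain nonnegative
    lam = max(0.0, lam + beta * (c_cost - budget))

    s = s_next
```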
Keywords
- Control Theory
- Multi-Objective Decision Making
- Queuing Systems
Status: accepted