2160. Learning Whittle and LP indices in Average-Reward Restless Multi-Armed Bandits
Invited abstract in session TB-40: Reinforcement Learning: Methods and Applications, stream Advances in Stochastic Modelling and Learning Methods.
Tuesday, 10:30-12:00
Room: 96 (building: 306)
Authors (first author is the speaker)
1. Konstantin Avrachenkov, INRIA
Abstract
Restless Multi-Armed Bandits (RMABs) are extensively used in scheduling, resource allocation,
marketing and clinical trials, just to name a few application areas. RMABs are Markov Decision Processes
with two actions (active and passive modes) for each arm and with a constraint on the number of active arms
per time slot. Since in general RMABs are PSPACE-complete, several heuristics such as Whittle index and LP
index have been proposed. In this talk, I present reinforcement learning schemes for both Whittle and
LP indices with almost sure convergence guarantees in the tabular setting, as well as empirically efficient
Deep Q-learning variants. Several examples, including scheduling in queueing systems, will be presented.
This talk is based on joint works with V.S. Borkar and P. Shah from IIT Bombay.
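To make the index heuristic mentioned above concrete, here is a minimal sketch of how an index policy picks arms once per-state indices are available. It is an illustration only, not the learning scheme from the talk: the function name `index_policy`, the `budget` parameter, and the numbers in the index table are all hypothetical.

```python
import numpy as np

def index_policy(indices, states, budget):
    """Activate the `budget` arms whose current-state index is largest.

    indices: array of shape (n_arms, n_states), one index per arm and state
             (hypothetical values, e.g. learned Whittle or LP indices).
    states:  array of shape (n_arms,), current state of each arm.
    budget:  number of arms that may be active in one time slot.
    """
    # Look up the index of each arm in its current state.
    current = indices[np.arange(len(states)), states]
    # Stable sort on negated values: ties go to the lower arm number.
    chosen = np.argsort(-current, kind="stable")[:budget]
    actions = np.zeros(len(states), dtype=int)  # 0 = passive
    actions[chosen] = 1                         # 1 = active
    return actions

# Hypothetical index table for 3 arms with 2 states each.
indices = np.array([[0.1, 0.9],
                    [0.5, 0.2],
                    [0.3, 0.7]])
states = np.array([1, 0, 0])
actions = index_policy(indices, states, budget=2)
print(actions)
```

The constraint on the number of active arms per time slot is enforced by construction, since exactly `budget` entries of the action vector are set to 1.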
Keywords
- Machine Learning
- Optimal Control
- Stochastic Models
Status: accepted