EURO 2024 Copenhagen
Abstract Submission

EURO-Online login

2972. Index policy for multiarmed bandit problem with dynamic risk measures

Invited abstract in session TB-35: Risk Averse and Contextual Stochastic Optimization, stream Stochastic, Robust and Distributionally Robust Optimization.

Tuesday, 10:30-12:00
Room: 44 (building: 303A)

Authors (first author is the speaker)

1. Ozlem Cavus
Industrial Engineering, Bilkent University
2. Milad Malekipirbazari
Computer Science and Engineering, Chalmers University of Technology

Abstract

The multiarmed bandit problem (MAB) is a classic problem in which a finite amount of resources must be allocated among competing choices with the aim of identifying a policy that maximizes the expected total reward. The classical MAB makes the strong assumption that the decision maker is risk-neutral and indifferent to the variability of the outcome. However, in many real life applications, these assumptions are not met and decision makers are risk-averse. Motivated to resolve this, we study risk-averse control of the multiarmed bandit problem in regard to the concept of dynamic coherent risk measures to determine a policy with the best risk-adjusted total discounted return. In respect of this specific setting, we present a theoretical analysis based on Whittle’s retirement problem and propose a priority-index policy that reduces to the Gittins index when the level of risk-aversion converges to zero. We generalize the restart formulation of the Gittins index to effectively compute these risk-averse allocation indices. Numerical results exhibit the excellent performance of this heuristic approach. Our experimental studies suggest that there is no guarantee that an index-based optimal policy exists for the risk-averse problem. Nonetheless, our risk-averse allocation indices can achieve optimal or near-optimal policies which in some instances are easier to interpret compared to the exact optimal policy.

Keywords

Status: accepted


Back to the list of papers