3003. Algorithms to solve Risk Sensitive MDPs, Games and Applications

Invited abstract in session TB-40: Reinforcement Learning: Methods and Applications, stream Advances in Stochastic Modelling and Learning Methods.

Tuesday, 10:30-12:00
Room: 96 (building: 306)

Authors (first author is the speaker)

1.	Veeraruna Kavitha
	IEOR, Indian Institute of Technology Bombay
2.	Vartika Singh

3.	Tushar Walunj
	Industrial Engineering and Operations Research, IIT Bombay

Abstract

There are no computationally feasible algorithms that solve the finite-horizon risk-sensitive constrained Markov decision process (CMDP) problem, even for problems with a moderate horizon. To design such an algorithm, we derive a fixed-point equation whose solutions include the optimal policy of the CMDP. We further provide two optimization problems equivalent to the CMDP. These formulations are instrumental in designing a global algorithm that converges to the optimal policy. The proposed algorithm is based on random restarts and a local improvement step. The local improvement step involves solving a linear program and utilizes the solution of the derived fixed-point equation, while the random restarts ensure global optimization. Such MDPs are used to model and derive robust inventory control.
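
As a rough illustration of the restart-plus-local-improvement pattern described above, the following Python sketch searches over deterministic Markov policies of a small finite-horizon MDP. It is not the authors' algorithm. It assumes an exponential-utility criterion, (1/gamma) log E[exp(gamma x total cost)], as the risk-sensitive objective, it ignores the constraints, and it replaces the linear-program-based local step with a simple single-entry swap. All function names and the toy instance are hypothetical.

```python
import numpy as np

def risk_sensitive_cost(P, c, policy, gamma, s0=0):
    """Exponential-utility cost of a deterministic Markov policy.

    P: transitions, shape (S, A, S); c: stage costs, shape (S, A);
    policy: integer array of shape (T, S). Assumes gamma > 0.
    """
    T, S = policy.shape
    idx = np.arange(S)
    V = np.ones(S)  # terminal condition: no terminal cost
    for t in reversed(range(T)):
        a = policy[t]  # action taken in each state at time t
        # Multiplicative backward recursion for E[exp(gamma * cost-to-go)].
        V = np.exp(gamma * c[idx, a]) * (P[idx, a] @ V)
    return np.log(V[s0]) / gamma

def local_improve(P, c, policy, gamma):
    """Accept single (t, s) action changes while they strictly lower the cost.

    Stand-in for the LP-based local improvement step (an assumption).
    """
    policy = policy.copy()
    best = risk_sensitive_cost(P, c, policy, gamma)
    T, S = policy.shape
    A = c.shape[1]
    improved = True
    while improved:
        improved = False
        for t in range(T):
            for s in range(S):
                for a in range(A):
                    old = policy[t, s]
                    if a == old:
                        continue
                    policy[t, s] = a
                    cost = risk_sensitive_cost(P, c, policy, gamma)
                    if cost < best - 1e-12:
                        best, improved = cost, True
                    else:
                        policy[t, s] = old  # revert a non-improving change
    return policy, best

def random_restart_search(P, c, T, gamma, restarts=20, seed=0):
    """Run local improvement from random initial policies; keep the best."""
    rng = np.random.default_rng(seed)
    S, A = c.shape
    best_policy, best_cost = None, np.inf
    for _ in range(restarts):
        start = rng.integers(A, size=(T, S))
        policy, cost = local_improve(P, c, start, gamma)
        if cost < best_cost:
            best_policy, best_cost = policy, cost
    return best_policy, best_cost

# Toy instance with random data (purely illustrative).
rng = np.random.default_rng(1)
S, A, T, gamma = 3, 2, 3, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))  # each P[s, a] sums to 1
c = rng.random((S, A))
policy, cost = random_restart_search(P, c, T, gamma)
print(policy, cost)
```

In this unconstrained toy setting, dynamic programming would give the optimum directly. The sketch only shows how random restarts wrap a local step that can otherwise stall at a policy that no single change improves.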

Further, one may consider constrained stochastic games whose objectives and constraints combine linear and risk-sensitive utilities.
In such games, the best response becomes a constrained MDP with a combination of linear and risk-sensitive utilities. We build upon the theory developed for constrained risk-sensitive MDPs to develop a technique that numerically solves such combined constrained MDPs and, eventually, the game.

Finally, we suggest an online learning algorithm that can potentially learn the optimal policy when the data defining the MDP is unknown.

Keywords

Decision Theory
Game Theory
Stochastic Models

Status: accepted
