1984. Does Adding a Novel Policy Replication Loop Enhance Approximate Policy Iteration Performance?
Invited abstract in session WB-54: Stochastic Models and Optimization I, stream Stochastic modelling.
Wednesday, 10:30-12:00, Room: Liberty 1.08
Authors (first author is the speaker)
1. Amirreza Pashapour, IEOM, Koc University
2. Dilek Gunnec, Industrial Engineering, Ozyegin University
3. Sibel Salman, Industrial Engineering, Koc University
4. Eda Yücel, Industrial Engineering, TOBB University of Economics and Technology
Abstract
Approximate Dynamic Programming (ADP) has proven effective for solving large-scale stochastic optimization problems by approximating value functions to overcome the curse of dimensionality inherent in Dynamic Programming (DP). An ADP algorithm may estimate value functions through Approximate Value Iteration (AVI), Approximate Policy Iteration (API), or Approximate Linear Programming (ALP). API typically employs two nested loops: an outer loop for policy improvement and an inner loop for policy evaluation, and state-of-the-art API implementations typically evaluate each sampled post-decision state only once. In this study, we restructure the API algorithm into three nested loops (3API) by incorporating an additional innermost loop, policy replication, within the policy evaluation process. This third loop repeatedly simulates transitions from every sampled post-decision state, allowing a more accurate estimation of expected value functions in large outcome spaces while maintaining computational efficiency and avoiding overfitting. The proposed 3API framework integrates Least Squares Temporal Difference (LSTD) learning to calibrate the policy vector from the collected data, ensuring efficient policy updates. We demonstrate the effectiveness of the 3API approach across several problem contexts, highlighting its potential to enhance policy learning and decision quality in stochastic environments.
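The sketch below illustrates the three-loop structure described in the abstract: an outer policy-improvement loop, a middle loop over sampled post-decision states, and an inner policy-replication loop that re-simulates transitions from each post-state before an LSTD-style least-squares fit of a linear value-function approximation. The one-dimensional state, feature map, transition simulator, greedy one-step lookahead, and loop sizes are all illustrative assumptions, not the authors' model or implementation.

```python
# Minimal 3API sketch under the assumptions stated above: linear approximation
# V(s) ~ phi(s) @ theta, a toy stochastic simulator, and illustrative loop sizes.
import numpy as np

rng = np.random.default_rng(0)

def features(state):
    # Hypothetical feature map phi(s) for a 1-D state.
    return np.array([1.0, state, state ** 2])

def simulate_transition(post_state, rng):
    # Hypothetical stochastic transition: next pre-decision state and reward.
    next_state = 0.9 * post_state + rng.normal(scale=0.1)
    reward = -abs(next_state)
    return next_state, reward

def greedy_action(state, theta, actions):
    # Policy improvement: pick the action whose post-decision state looks best.
    post_states = [state + a for a in actions]      # illustrative post-state model
    values = [features(p) @ theta for p in post_states]
    return actions[int(np.argmax(values))]

def api_3loops(n_policy_iters=10, n_post_states=50, n_replications=20,
               gamma=0.95, actions=(-1.0, 0.0, 1.0)):
    theta = np.zeros(3)
    for _ in range(n_policy_iters):                  # outer loop: policy improvement
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for _ in range(n_post_states):               # middle loop: policy evaluation
            post_state = rng.uniform(-2.0, 2.0)      # sampled post-decision state
            avg_reward, avg_next_phi = 0.0, np.zeros(3)
            for _ in range(n_replications):          # inner loop: policy replication
                next_state, reward = simulate_transition(post_state, rng)
                next_action = greedy_action(next_state, theta, actions)
                next_post = next_state + next_action
                avg_reward += reward / n_replications
                avg_next_phi += features(next_post) / n_replications
            phi = features(post_state)
            # LSTD-style accumulation A theta = b, using replication-averaged outcomes.
            A += np.outer(phi, phi - gamma * avg_next_phi)
            b += phi * avg_reward
        theta = np.linalg.solve(A + 1e-6 * np.eye(3), b)   # policy update
    return theta

if __name__ == "__main__":
    print("fitted value-function weights:", api_3loops())
```

Averaging the rewards and next-state features over the replication loop before the LSTD accumulation is one way to reduce the variance of the value estimates at each sampled post-state, which is the role the policy replication loop plays in the 3API description above.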
Keywords
- Stochastic Optimization
- Programming, Dynamic
- Programming, Stochastic
Status: accepted