1984. Does Adding a Novel Policy Replication Loop Enhance Approximate Policy Iteration Performance?
Invited abstract in session WB-54: Stochastic Models and Optimization I, stream Stochastic modelling.
Wednesday, 10:30-12:00, Room: Liberty 1.08
Authors (first author is the speaker)
1. Amirreza Pashapour, IEOM, Koc University
2. Dilek Gunnec, Industrial Engineering, Ozyegin University
3. Sibel Salman, Industrial Engineering, Koc University
4. Eda Yücel, Industrial Engineering, TOBB University of Economics and Technology
Abstract
Approximate Dynamic Programming (ADP) has proven effective for solving large-scale stochastic optimization problems by approximating value functions to overcome the curse of dimensionality inherent in Dynamic Programming (DP). An ADP algorithm may estimate value functions through Approximate Value Iteration (AVI), Approximate Policy Iteration (API), or Approximate Linear Programming (ALP). API typically employs two nested loops: an outer loop for policy improvement and an inner loop for policy evaluation, and state-of-the-art API implementations typically evaluate each sampled post-decision state only once. In this study, we restructure the API algorithm into three nested loops (3API) by incorporating an additional innermost loop, policy replication, within the policy evaluation process. This third loop repeatedly simulates transitions from every sampled post-decision state, allowing a more accurate estimation of expected value functions in large outcome spaces while maintaining computational efficiency and avoiding overfitting. The proposed 3API framework integrates Least Squares Temporal Difference (LSTD) learning to calibrate the policy vector from the collected data, ensuring efficient policy updates. We demonstrate the effectiveness of the 3API approach across several problem contexts, highlighting its potential to enhance policy learning and decision quality in stochastic environments.
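The sketch below illustrates the three-loop structure described in the abstract: an outer policy-improvement loop, a middle loop over sampled post-decision states, and an inner policy-replication loop that re-simulates transitions from each post-state before an LSTD-style least-squares fit of a linear value-function approximation. The one-dimensional state, feature map, transition simulator, greedy one-step lookahead, and loop sizes are all illustrative assumptions, not the authors' model or implementation.

```python
# Minimal 3API sketch under the assumptions stated above: linear approximation
# V(s) ~ phi(s) @ theta, a toy stochastic simulator, and illustrative loop sizes.
import numpy as np

rng = np.random.default_rng(0)

def features(state):
    # Hypothetical feature map phi(s) for a 1-D state.
    return np.array([1.0, state, state ** 2])

def simulate_transition(post_state, rng):
    # Hypothetical stochastic transition: next pre-decision state and reward.
    next_state = 0.9 * post_state + rng.normal(scale=0.1)
    reward = -abs(next_state)
    return next_state, reward

def greedy_action(state, theta, actions):
    # Policy improvement: pick the action whose post-decision state looks best.
    post_states = [state + a for a in actions]      # illustrative post-state model
    values = [features(p) @ theta for p in post_states]
    return actions[int(np.argmax(values))]

def api_3loops(n_policy_iters=10, n_post_states=50, n_replications=20,
               gamma=0.95, actions=(-1.0, 0.0, 1.0)):
    theta = np.zeros(3)
    for _ in range(n_policy_iters):                  # outer loop: policy improvement
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for _ in range(n_post_states):               # middle loop: policy evaluation
            post_state = rng.uniform(-2.0, 2.0)      # sampled post-decision state
            avg_reward, avg_next_phi = 0.0, np.zeros(3)
            for _ in range(n_replications):          # inner loop: policy replication
                next_state, reward = simulate_transition(post_state, rng)
                next_action = greedy_action(next_state, theta, actions)
                next_post = next_state + next_action
                avg_reward += reward / n_replications
                avg_next_phi += features(next_post) / n_replications
            phi = features(post_state)
            # LSTD-style accumulation A theta = b, using replication-averaged outcomes.
            A += np.outer(phi, phi - gamma * avg_next_phi)
            b += phi * avg_reward
        theta = np.linalg.solve(A + 1e-6 * np.eye(3), b)   # policy update
    return theta

if __name__ == "__main__":
    print("fitted value-function weights:", api_3loops())
```

Averaging the rewards and next-state features over the replication loop before the LSTD accumulation is one way to reduce the variance of the value estimates at each sampled post-state, which is the role the policy replication loop plays in the 3API description above.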
Keywords
- Stochastic Optimization
- Programming, Dynamic
- Programming, Stochastic
Status: accepted