2014. Markov decision processes: discrete stochastic dynamic programming
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- Robustness Analysis of POMDP Policies to Observation Perturbations
  POMDP policies can be checked for robustness to observation model changes by solving a bi-level optimization via root-finding with the Robust Interval Search algorithm, which runs in polynomial time for non-sticky history-independent deviations when using finite-state controllers.
- Generative Auto-Bidding with Unified Modeling and Exploration
  GUIDE integrates a Decision Transformer for joint modeling of bidding actions and states with Q-value regularization for exploration and an IDM for safe policy fallback, outperforming baselines in simulations and real Taobao deployment with gains in GMV, clicks, cost, and ROI.
- Co-Investment in Mobile Edge Computing with Infrastructure Update and Dynamic Participation
  A coalitional game model for MEC co-investment that incorporates resource updates and dynamic player participation to increase total payoffs and strengthen investment incentives.