Addressing Function Approximation Error in Actor-Critic Methods. International Conference on Machine Learning, pages 1587–1596

Scott Fujimoto, Herke van Hoof, David Meger · 2018

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it


citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Reflective Prompted Policy Optimization: Trajectory-Grounded Revision and Salience Bias

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Reflective Prompted Policy Optimization uses a Critic-LLM to inspect full trajectories and propose grounded revisions, yielding higher mean best rewards, faster near-optimal performance, and greater stability than scalar-reward baselines across ten environments.

citing papers explorer

Showing 1 of 1 citing paper.

Reflective Prompted Policy Optimization: Trajectory-Grounded Revision and Salience Bias

cs.LG · 2026-05-08 · unverdicted · none · ref 2

Reflective Prompted Policy Optimization uses a Critic-LLM to inspect full trajectories and propose grounded revisions, yielding higher mean best rewards, faster near-optimal performance, and greater stability than scalar-reward baselines across ten environments.
