Constrained policy optimization

2017

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Learning Reachability of Energy Storage Arbitrage

eess.SY · 2025-12-06 · unverdicted · novelty 7.0

A stopping-time reward and chance-constrained SoC penalty embedded in an end-to-end learning framework improves battery reachability of target ranges, raises arbitrage profit, and lowers profit variance under volatile prices.

Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

cs.RO · 2025-10-13 · unverdicted · novelty 6.0

A separate regulator module adaptively scales actions in RL to reduce constraint violations while preserving exploration, yielding up to 126x fewer violations and over 10x higher returns on Safety Gym tasks.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Learning Reachability of Energy Storage Arbitrage

eess.SY · 2025-12-06 · unverdicted · none · ref 29

A stopping-time reward and chance-constrained SoC penalty embedded in an end-to-end learning framework improves battery reachability of target ranges, raises arbitrage profit, and lowers profit variance under volatile prices.
