Journal of Statistical Mechanics: Theory and Experiment
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Imperfect World Models are Exploitable
  A formal theory proves model exploitation is essentially unavoidable on large policy sets in RL, generalizes reward hacking results, and derives a safe horizon for a relaxed version of exploitation.
- Onsager-Machlup Posterior Transport for Deep Gaussian Processes
  OM-Path uses Onsager-Machlup-regularized posterior transport on Doob-bridged paths for DGP inference and reports statistical wins over DBVI on the two largest UCI regression benchmarks.
- QuantFPFlow: Quantum Amplitude Estimation for Fokker–Planck Policy Optimisation in Continuous Reinforcement Learning
  QuantFPFlow uses quantum amplitude estimation in a Fokker-Planck RL framework to achieve O(1/ε) partition function estimation and reports improved global optimum discovery plus better scaling in continuous control tasks.