The paper introduces Frictive Policy Optimization, a risk-sensitive epistemic control framework for LLM alignment that treats interventions such as clarification, verification, and refusal as explicit actions chosen to improve downstream belief quality rather than immediate reward.
In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15503–15514
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers:
Proposes OPAC for trajectory-level offline RL, achieving $\tilde{O}\big(H^{2}\sqrt{C_{sa}(\pi^*)/n}\big)$ bounds with a matching lower bound, plus conditions for tractability in generalized nonlinear outcome settings.
State augmentation allows dynamic programming and sample complexity bounds for MDPs and optimal control under static risk measures including CVaR.
citing papers explorer
- When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?
Proposes OPAC for trajectory-level offline RL, achieving $\tilde{O}\big(H^{2}\sqrt{C_{sa}(\pi^*)/n}\big)$ bounds with a matching lower bound, plus conditions for tractability in generalized nonlinear outcome settings.