Safe reinforcement learning via probabilistic shields
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: unverdicted (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- Certified Speculative Execution for Untrusted AI Agents
  CGPA enables certified speculative execution of untrusted AI proposals in constrained sequential decisions via verifier rejection, conformal boundary gating, and solver deferral, yielding zero violations and regret within noise of the oracle.
- Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies
  A neuro-symbolic framework compiles LTLf formulas to DFAs, derives differentiable satisfaction signals from DFA progression, and uses them as a logic-based regularization loss to enforce temporal constraints in autoregressive transformer RL policies while preserving competitive returns.
- What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline
  The thesis presents Pino, an end-to-end pipeline that supervises reinforcement learning agents with argumentation-based normative advisors, introduces an algorithm for automatic argument extraction, and defines a mitigation strategy for norm avoidance.