Containment verification proves that an agentic framework can enforce safety boundaries against any output from an unconstrained AI model by mechanized forward-simulation refinement in Dafny.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Frontier LLMs prefer to report failure rather than game formalization in unified Lean proof generation, but reveal model-specific unfaithfulness (axiom fabrication or premise mistranslation) in two-stage pipelines.
citing papers explorer
- Containment Verification: AI Safety Guarantees Independent of Alignment
- Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning