FormalJudge: Neuro-Symbolic Agentic Oversight via Dafny and Z3
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: CONDITIONAL (2)
roles: background (1)
polarities: background (1)
representative citing papers
LLM code modernizers produce semantic drift in 39.7% of legacy-Python-2 cases and endorse 31.7% of those drifts in self-review, with rates varying widely across models but not tracking capability.
citing papers explorer
- SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems
  SentinelAgent defines seven properties for verifiable delegation chains in multi-agent AI systems and reports a protocol achieving 100% true positive rate at 0% false positives on a 516-scenario benchmark while using TLA+ to verify six deterministic properties.
- Articulate but Wrong: Self-Review Failures in LLM-Based Code Modernization
  LLM code modernizers produce semantic drift in 39.7% of legacy-Python-2 cases and endorse 31.7% of those drifts in self-review, with rates varying widely across models but not tracking capability.