BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
Re- Act: Synergizing reasoning and acting in language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
An agentic framework generates executable physics simulation code from text prompts via coordinated planning, coding, visual, and physics agents that iterate to satisfy both prompt fidelity and physical constraints.
citing papers explorer
-
Behavioral Integrity Verification for AI Agent Skills
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.