VeriPlan: Integrating formal verification and LLMs into end-user planning
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (3)
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- VeruSAGE: A Study of Agent-Based Verification for Rust Systems
  LLM agents complete over 80% of tasks on a new 849-task Rust verification benchmark and over 90% on unfinished human proofs.
- ACE: A Security Architecture for LLM-Integrated App Systems
  ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.
- U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
  U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.