Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
hub Mixed citations
Pro2Guard: Proactive runtime enforcement of LLM agent safety via probabilistic model checking
Mixed citation behavior. Most common role is background (50%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
RecRM-Bench is a new large-scale benchmark dataset and framework for multi-dimensional reward modeling in agentic recommender systems, spanning instruction following, factual consistency, query-item relevance, and user behavior prediction.
TraceFix repairs LLM-generated multi-agent protocols via TLA+ counterexamples to achieve full verification on all tested tasks and higher completion rates than prompt-only baselines.
TraceToChain models LLM agent traces as absorbing DTMCs using automatic clustering and smoothed MLE, with KS and AIC validation, to reconcile pass@k, pass^k, and RDC as projections of a single first-passage success-time distribution.
PowerDAG achieves 94-100% success on unseen distribution grid analysis queries by combining adaptive retrieval with similarity-decay cutoff and just-in-time supervision, outperforming ReAct, LangChain, and CrewAI baselines.
Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the StateGuard defense.
Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.
ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.
ATLAS-RTC raises first-attempt success on structured LLM generation and tool calling by 20-37.8 points through closed-loop token-level interventions.
A three-layer probabilistic assume-guarantee architecture is structurally required for safe LLM agent deployment.
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
Sovereign Agentic Loops decouple LLM reasoning from execution by emitting validated intents through a control plane with obfuscation and evidence chains, blocking 93% of unsafe actions in a cloud prototype while adding 12.4 ms latency.
GAAT is a proposed architecture extending OpenTelemetry with governance schemas, OPA-based detection, graduated enforcement, and trusted provenance to close the observe-but-do-not-act gap in multi-agent systems.
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.