Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

2025 · arXiv 2508.00500

Mixed citation behavior. Most common role is background (50%).

16 Pith papers citing it

citation-role summary

background: 4 · baseline: 1 · dataset: 1

citation-polarity summary

background: 3 · baseline: 1 · support: 1 · use dataset: 1

representative citing papers

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions: they omit needed accesses while granting unused or sensitive ones. Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

RecRM-Bench is a new large-scale benchmark dataset and framework for multi-dimensional reward modeling in agentic recommender systems, spanning instruction following, factual consistency, query-item relevance, and user behavior prediction.

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

cs.AI · 2026-05-08 · conditional · novelty 7.0

TraceFix repairs LLM-generated multi-agent protocols via TLA+ counterexamples to achieve full verification on all tested tasks and higher completion rates than prompt-only baselines.

Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

cs.SE · 2026-04-27 · unverdicted · novelty 7.0

TraceToChain models LLM agent traces as absorbing DTMCs using automatic clustering and smoothed MLE, with KS and AIC validation, to reconcile pass@k, pass^k, and RDC as projections of a single first-passage success-time distribution.

PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis

eess.SY · 2026-03-18 · unverdicted · novelty 7.0

PowerDAG achieves 94-100% success on unseen distribution grid analysis queries by combining adaptive retrieval with similarity-decay cutoff and just-in-time supervision, outperforming ReAct, LangChain, and CrewAI baselines.

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the StateGuard defense.

Owner-Harm: A Missing Threat Model for AI Agent Safety

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic criminal harm.

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

cs.AI · 2026-04-02 · unverdicted · novelty 6.0 · 2 refs

ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.

ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control

cs.LG · 2026-03-29 · unverdicted · novelty 6.0

ATLAS-RTC raises first-attempt success on structured LLM generation and tool calling by 20-37.8 points through closed-loop token-level interventions.

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

A three-layer probabilistic assume-guarantee architecture is structurally required for safe LLM agent deployment.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

cs.CR · 2026-04-24 · unverdicted · novelty 5.0

Sovereign Agentic Loops decouple LLM reasoning from execution by emitting validated intents through a control plane with obfuscation and evidence chains, blocking 93% of unsafe actions in a cloud prototype while adding 12.4 ms latency.

Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution

cs.RO · 2026-04-09 · unverdicted · novelty 5.0 · 2 refs

A runtime governance framework for embodied agents intercepts 96.2% of unauthorized actions and achieves 91.4% recovery success in 1,000 simulation trials while outperforming baselines.

Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems

cs.MA · 2026-04-06 · unverdicted · novelty 4.0

GAAT is a proposed architecture extending OpenTelemetry with governance schemas, OPA-based detection, graduated enforcement, and trusted provenance to close the observe-but-do-not-act gap in multi-agent systems.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

cs.CR · 2026-05-15 · unverdicted · novelty 3.0

The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.

citing papers explorer

Do Coding Agents Understand Least-Privilege Authorization? cs.CR · 2026-05-14 · unverdicted · none · ref 40
RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems cs.IR · 2026-05-12 · unverdicted · none · ref 37
TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples cs.AI · 2026-05-08 · conditional · none · ref 37
Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents cs.SE · 2026-04-27 · unverdicted · none · ref 21
PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis eess.SY · 2026-03-18 · unverdicted · none · ref 41
When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents cs.CR · 2026-05-07 · unverdicted · none · ref 24
Owner-Harm: A Missing Threat Model for AI Agent Safety cs.CR · 2026-04-20 · unverdicted · none · ref 7
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis cs.AI · 2026-04-02 · unverdicted · none · ref 32 · 2 links
ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control cs.LG · 2026-03-29 · unverdicted · none · ref 11
Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment cs.AI · 2026-05-18 · unverdicted · none · ref 61
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation cs.CR · 2026-05-07 · unverdicted · none · ref 41
Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems cs.CR · 2026-04-24 · unverdicted · none · ref 14
Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution cs.RO · 2026-04-09 · unverdicted · none · ref 52 · 2 links
Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems cs.MA · 2026-04-06 · unverdicted · none · ref 8
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 223
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI cs.CR · 2026-05-15 · unverdicted · none · ref 135
