{"total":12,"items":[{"citing_arxiv_id":"2606.20510","ref_index":40,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Efficient and Sound Probabilistic Verification for AI Agents","primary_cat":"cs.CR","submitted_at":"2026-06-18T17:27:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Presents a distributionally robust optimization method for sound probabilistic verification of Datalog policies in AI agents that bounds violation risk regardless of predicate correlations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17573","ref_index":37,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Cordon: Semantic Transactions for Tool-Using LLM Agents","primary_cat":"cs.OS","submitted_at":"2026-06-16T06:21:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cordon is a transactional runtime system that binds tool intents to reversible state, staged effects, and audit metadata to validate composed agent workflows before commit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09500","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture","primary_cat":"cs.AI","submitted_at":"2026-06-08T13:51:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Presents MedSci Skills, an open-source toolkit with deterministic integrity gates for verifying LLM-assisted clinical manuscripts against reporting guidelines like STARD, PRISMA, and 
STROBE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07131","ref_index":47,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills","primary_cat":"cs.CR","submitted_at":"2026-06-05T10:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04990","ref_index":64,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-06-03T15:12:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"This survey defines execution provenance as a typed graph of agent execution and evidence tracing as its projection onto evidence-support relations, then reviews methods, taxonomy, benchmarks, and challenges for auditable LLM agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30693","ref_index":41,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Triaging Threats to Specialized Guardrails","primary_cat":"cs.CR","submitted_at":"2026-05-29T00:36:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces GuardZoo benchmark and RouteGuard router-expert system showing monolithic guardrails suffer task interference while 
specialized routing improves threat detection and generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26942","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)","primary_cat":"cs.AI","submitted_at":"2026-05-26T12:32:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Neuro-symbolic pipeline using formal logic and semantic embeddings detects hallucinations in LLM medical reports at 83%+ for entities and 72% for fabrications while cutting creation time 30%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17380","ref_index":44,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ADR: An Agentic Detection System for Enterprise Agentic AI Security","primary_cat":"cs.AI","submitted_at":"2026-05-17T10:49:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ADR is a three-component detection system for AI agents that combines telemetry sensors, red teaming, and two-tier detection, achieving 97.2% precision in a ten-month Uber deployment and outperforming baselines on the new ADR-Bench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10614","ref_index":26,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines","primary_cat":"cs.AI","submitted_at":"2026-05-11T14:11:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PRISM detects and stops credential leakage during LLM 
generation in multi-agent pipelines using per-token risk scores from lexical, structural, and behavioral signals, achieving zero observed leaks and F1 of 0.832 on a 2000-task benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27819","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents","primary_cat":"cs.AI","submitted_at":"2026-04-30T13:01:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"MCPHunt benchmark finds 11.5-41.3% policy-violating credential propagation in multi-server MCP agents across five models, reducible up to 97% by prompt mitigations while retaining most utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16542","ref_index":2,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts","primary_cat":"cs.CR","submitted_at":"2026-04-17T01:55:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"TWGuard achieves +0.289 F1 improvement and 94.9% false-positive reduction for LLM safety guardrails in the Taiwan linguistic context compared to foundation models and baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15579","ref_index":57,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing 
Utility","primary_cat":"cs.SE","submitted_at":"2026-04-16T23:18:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sensitive data leakage [18], and access control to restrict authorized access [54]. A few recent studies have begun to explore these symbolic enforcement mechanisms as symbolic guardrails for AI agent safety and security, in contrast to neural guardrails. For example, AgentSpec [65], Agent-C [31], and Maris [15] use temporal logic to specify and enforce agent constraints; Progent [57] defines privilege control policies using domain-specific languages; Doshi et al. [19] explores temporal logic and information-flow control with formal models; PFI [32] validates unsafe data flows to prevent privilege escalation in agents; Fides [13] uses information-flow Conference'17, July 2017, Washington, DC, USA Yining Hong, Yining She, Eunsuk Kang, Christopher S."}],"limit":50,"offset":0}