{"total":12,"items":[{"citing_arxiv_id":"2606.13474","ref_index":10,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Exploring Systems-Thinking Approaches to Loss of Control Risk","primary_cat":"cs.CY","submitted_at":"2026-06-11T15:27:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Systems analyses of a frontier-lab AI coding agent scenario using STECA, STPA, and FRAM reveal unverifiable governance loops, ineffective control delays, and gradual safeguard erosion, supporting the addition of systems-level methods to model-focused AI evaluations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30406","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI Loss of Control Incident Management: Response & Resilience","primary_cat":"cs.CY","submitted_at":"2026-05-28T17:47:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Presents a taxonomy for AI loss of control incident management that distinguishes extremely costly versus impossible regaining of control and accidental versus adversarial scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18549","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics","primary_cat":"cs.CL","submitted_at":"2026-05-18T15:29:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Probe trajectories across token positions in LRMs, combined with signal-processing features, improve prediction of future model outputs over static probes on safety and math 
tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16471","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI","primary_cat":"cs.CR","submitted_at":"2026-05-15T13:53:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Security and Safety Threats in Generative AI study (𝑁=27,000) confirmed higher perceived veracity and sharing intent for AI-generated fake news. The 2024 super election year saw documented AI interference across multiple nations [106], and AI-generated academic fraud has also increased. The 2026 AI Safety Report estimates that up to 8% of peer-reviewed submissions may contain substantial AI-generated content [12]. 4. Technical Countermeasures To counter the threats identified in Section 3, a multi-layered technical defense ecosystem has emerged spanning detection, watermarking, alignment, and agentic security. No single technique is sufficient; resilient AI requires composing all layers into an integrated security posture. 4.1. AIGC Detection 4.1.1. 
Text Detection"},{"citing_arxiv_id":"2605.13329","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tracing Persona Vectors Through LLM Pretraining","primary_cat":"cs.CL","submitted_at":"2026-05-13T10:44:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Persona vectors form within the first 0.22% of LLM pretraining and remain effective for steering post-trained models, with continued refinement and transfer to other models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12746","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CoT-Guard: Small Models for Strong Monitoring","primary_cat":"cs.CR","submitted_at":"2026-05-12T20:49:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[25] Fazl Barez, Tung-Yu Wu, Iván Arcuschin, Michael Lan, Vincent Wang, Noah Siegel, Nicolas Collignon, Clement Neo, Isabelle Lee, Alasdair Paren, et al. Chain-of-thought is not explainability.Preprint, alphaXiv, page v1, 2025. [26] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card.arXiv preprint arXiv:2601.03267, 2025. [27] Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Malcolm Murray, Rishi Bommasani, Stephen Casper, Tom Davidson, Raymond Douglas, et al. International ai safety report 2026.arXiv preprint arXiv:2602.21012, 2026. 
[28] Alexander Meinke, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn."},{"citing_arxiv_id":"2605.01130","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Iterative Finetuning is Mostly Idempotent","primary_cat":"cs.AI","submitted_at":"2026-05-01T22:01:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Iterative self-finetuning of LLMs mostly fails to amplify seeded behavioral traits, with amplification limited to specific DPO setups and often harming coherence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23425","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape","primary_cat":"cs.CR","submitted_at":"2026-04-25T19:41:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14228","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems","primary_cat":"cs.SE","submitted_at":"2026-04-14T17:59:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session 
storage; the same design questions yield different answers in OpenClaw's gateway context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05969","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms","primary_cat":"cs.CR","submitted_at":"2026-04-07T15:02:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MCPSHIELD offers a threat taxonomy of 23 attack vectors, a labeled transition system verification model, and a defense-in-depth architecture claiming 91% coverage for MCP-based AI agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04443","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DeonticBench: A Benchmark for Reasoning over Rules","primary_cat":"cs.CL","submitted_at":"2026-04-06T05:41:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DEONTICBENCH is a new benchmark of 6,232 deontic reasoning tasks from U.S. legal domains where frontier LLMs reach only ~45% accuracy and symbolic Prolog assistance plus RL training still fail to solve tasks reliably.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"U.S. Federal Tax §151(b) A taxpayer may claim a spouse exemption if: - the spouse has zero income - the spouse is not another taxpayer's dependent - they are not filing jointly Married filing separately. U.S. Federal Tax §63(C) Basic standard deduction: -Joint return or surviving spouse -Head of household → $4,400 -Any other case → $3,000 (1) Problem Contexts (3) Prolog Execution In 2017, Alice was paid $36266. 
Alice and Bob have been married since Feb 3rd, 2017. Bob had no income in 2017 ... [Detailed Facts of Alice and Bob omitted] Question: How much tax does Alice have to pay in 2017? spouse_exemption_allowed(Taxpayer,Year) :- spouse ( Taxpayer, Sp), income (Sp, Year, 0),"},{"citing_arxiv_id":"2603.06847","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes","primary_cat":"cs.SE","submitted_at":"2026-03-06T20:12:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An empirical study of real-world issues yields a taxonomy of 34 fault types, symptoms, and root causes in agentic AI systems, validated by 145 practitioners.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}