
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan · 2024 · cs.CR · arXiv 2410.02644

Mixed citation behavior. The most common role is background (62% of classified citations).

55 Pith papers citing it


abstract

Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, unveiling important works to be done in terms of agent security for the community. We also introduce a new metric to evaluate the agents' capability to balance utility and security. Our code can be found at https://github.com/agiresearch/ASB.


citation-role summary

background 9 · baseline 2 · dataset 2

citation-polarity summary

background 8 · baseline 2 · support 1 · unclear 1 · use dataset 1

representative citing papers

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

cs.CL · 2026-05-11 · unverdicted · novelty 8.0

A new native-runtime benchmark reveals that current frontier AI agents succeed on at most 62 percent of realistic long-horizon CLI tasks.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

cs.CR · 2026-05-08 · conditional · novelty 8.0 · 2 refs

Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,392 workflows.

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

cs.CR · 2026-04-07 · unverdicted · novelty 8.0

No continuous utility-preserving input wrapper can eliminate all prompt injection risks in connected prompt spaces for language models.

AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents

cs.CR · 2026-06-13 · unverdicted · novelty 7.0

AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

cs.CR · 2026-06-12 · unverdicted · novelty 7.0

Attackers can force LLM guardrails into extended reasoning loops via optimized payloads, causing 13-63x token amplification and up to 148x latency in agent systems.

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

cs.CR · 2026-06-11 · unverdicted · novelty 7.0

Introduces a stakeholder-centric benchmark showing current web agents fail all tested prompt injection objectives, with failures falling into stealthy parasitism, misaligned disruption, or compounded failure modes.

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

cs.CR · 2026-06-10 · unverdicted · novelty 7.0

SMSR is the first defense with a certified robustness bound against multi-session memory poisoning in persistent LLM agents, combining HMAC provenance signing with randomized ablation and verdict-based voting.

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

cs.CR · 2026-06-08 · unverdicted · novelty 7.0

Introduces brain-prompt injection attacks on BCI-to-LLM agent pipelines and defines a Route-Safety Audit Contract with separation theorem, C3 decomposition, and split-conformal calibration to bound false-accept rates on EEG data.

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

cs.SE · 2026-05-27 · unverdicted · novelty 7.0

Authors create LLMCVE dataset of LLM-in-the-loop vulnerabilities and demonstrate that agent-based repair methods achieve low success rates on them, particularly prompt injections at 28.57% Pass@1.

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

cs.CR · 2026-05-17 · unverdicted · novelty 7.0

Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

cs.CR · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

MEMSAD links anomaly detection gradients to retrieval objectives under encoder regularity to certify detection of continuous memory poisons, achieving perfect TPR/FPR in experiments while exposing a synonym-invariance gap.

Toward a Principled Framework for Agent Safety Measurement

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

BOA uses budgeted search over agent trajectories to report the probability an LLM agent stays safe, finding unsafe paths that sampling misses.

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

cs.CR · 2026-04-29 · unverdicted · novelty 7.0

A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

cs.CR · 2026-04-20 · unverdicted · novelty 7.0 · 2 refs

The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue tracking.

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

cs.AI · 2026-04-03 · unverdicted · novelty 7.0

AgentHazard benchmark shows computer-use agents remain highly vulnerable, with attack success rates reaching 73.63% on models like Qwen3-Coder powering Claude Code.

Formal Policy Enforcement for Real-World Agentic Systems

cs.CR · 2026-02-18 · unverdicted · novelty 7.0

FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract holds.

Understanding and Evaluating Claw-like Agent Security Through a Computer-Systems Lens

cs.CR · 2026-06-29 · unverdicted · novelty 6.0

The paper introduces SafeClawArena, a 406-task benchmark evaluating security failures in three Claw-like agent platforms across skill supply-chain, state exploitation, data flow, and prompt injection surfaces.

RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems

cs.AI · 2026-06-22 · unverdicted · novelty 6.0

RIFT-Bench is a graph representation-driven methodology for dynamic red-teaming that enables unified evaluations across diverse agentic AI architectures, demonstrated on 45 systems.

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.

citing papers explorer

Showing 48 of 48 citing papers after filters.

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation cs.CL · 2026-05-11 · unverdicted · none · ref 54 · internal anchor
A new native-runtime benchmark reveals that current frontier AI agents succeed on at most 62 percent of realistic long-horizon CLI tasks.
The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail? cs.CR · 2026-04-07 · unverdicted · full · ref 29 · internal anchor
No continuous utility-preserving input wrapper can eliminate all prompt injection risks in connected prompt spaces for language models.
AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents cs.CR · 2026-06-13 · unverdicted · none · ref 21 · internal anchor
AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.
From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails cs.CR · 2026-06-12 · unverdicted · none · ref 24 · internal anchor
Attackers can force LLM guardrails into extended reasoning loops via optimized payloads, causing 13-63x token amplification and up to 148x latency in agent systems.
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents cs.CR · 2026-06-11 · unverdicted · none · ref 16 · internal anchor
Introduces a stakeholder-centric benchmark showing current web agents fail all tested prompt injection objectives, with failures falling into stealthy parasitism, misaligned disruption, or compounded failure modes.
SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems cs.CR · 2026-06-10 · unverdicted · none · ref 12 · internal anchor
SMSR is the first defense with a certified robustness bound against multi-session memory poisoning in persistent LLM agents, combining HMAC provenance signing with randomized ablation and verdict-based voting.
Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents cs.CR · 2026-06-08 · unverdicted · none · ref 24 · internal anchor
Introduces brain-prompt injection attacks on BCI-to-LLM agent pipelines and defines a Route-Safety Audit Contract with separation theorem, C3 decomposition, and split-conformal calibration to bound false-accept rates on EEG data.
Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities cs.SE · 2026-05-27 · unverdicted · none · ref 64 · internal anchor
Authors create LLMCVE dataset of LLM-in-the-loop vulnerabilities and demonstrate that agent-based repair methods achieve low success rates on them, particularly prompt injections at 28.57% Pass@1.
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback cs.CR · 2026-05-17 · unverdicted · none · ref 9 · internal anchor
Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.
Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 45 · internal anchor
HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 92 · internal anchor
SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium cs.AI · 2026-05-10 · unverdicted · none · ref 90 · internal anchor
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents cs.CR · 2026-05-05 · unverdicted · none · ref 15 · 2 links · internal anchor
MEMSAD links anomaly detection gradients to retrieval objectives under encoder regularity to certify detection of continuous memory poisons, achieving perfect TPR/FPR in experiments while exposing a synonym-invariance gap.
Toward a Principled Framework for Agent Safety Measurement cs.CR · 2026-05-02 · unverdicted · none · ref 22 · internal anchor
BOA uses budgeted search over agent trajectories to report the probability an LLM agent stays safe, finding unsafe paths that sampling misses.
Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents cs.CR · 2026-04-29 · unverdicted · none · ref 21 · internal anchor
A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 94 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection cs.CR · 2026-04-20 · unverdicted · none · ref 25 · 2 links · internal anchor
The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue tracking.
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents cs.AI · 2026-04-03 · unverdicted · none · ref 32 · internal anchor
AgentHazard benchmark shows computer-use agents remain highly vulnerable, with attack success rates reaching 73.63% on models like Qwen3-Coder powering Claude Code.
Formal Policy Enforcement for Real-World Agentic Systems cs.CR · 2026-02-18 · unverdicted · none · ref 68 · internal anchor
FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract holds.
Understanding and Evaluating Claw-like Agent Security Through a Computer-Systems Lens cs.CR · 2026-06-29 · unverdicted · none · ref 39 · internal anchor
The paper introduces SafeClawArena, a 406-task benchmark evaluating security failures in three Claw-like agent platforms across skill supply-chain, state exploitation, data flow, and prompt injection surfaces.
RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems cs.AI · 2026-06-22 · unverdicted · none · ref 8 · internal anchor
RIFT-Bench is a graph representation-driven methodology for dynamic red-teaming that enables unified evaluations across diverse agentic AI architectures, demonstrated on 45 systems.
Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents cs.CR · 2026-05-26 · unverdicted · none · ref 27 · internal anchor
AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.
Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks cs.CR · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
OBBR projects poisoned samples into benign space via rewriting with open-book examples, raising safety performance by 51% on average versus prior defenses across five attacks and four LLMs.
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection cs.CR · 2026-05-18 · unverdicted · none · ref 5 · 2 links · internal anchor
LivePI benchmark reports indirect prompt injection success rates of 10.7-29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.
Web Agents Should Adopt the Plan-Then-Execute Paradigm cs.CR · 2026-05-14 · unverdicted · none · ref 37 · internal anchor
Web agents should default to planning a complete task program before observing live web content to reduce prompt injection exposure, since WebArena tasks are compatible and 80% need no runtime LLM calls.
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw cs.CR · 2026-05-11 · unverdicted · none · ref 17 · internal anchor
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents cs.AI · 2026-05-09 · unverdicted · none · ref 28 · 2 links · internal anchor
EnvTrustBench is a new agentic benchmark that measures evidence-grounding defects where LLM agents overtrust faulty environmental observations and take incorrect actions.
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks cs.CR · 2026-05-08 · unverdicted · none · ref 57 · internal anchor
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem cs.SE · 2026-05-08 · unverdicted · none · ref 65 · internal anchor
MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills cs.CR · 2026-05-07 · unverdicted · none · ref 61 · internal anchor
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection cs.CR · 2026-05-05 · unverdicted · none · ref 10 · internal anchor
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration cs.CR · 2026-05-03 · unverdicted · none · ref 92 · 2 links · internal anchor
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis cs.CR · 2026-05-01 · unverdicted · none · ref 49 · internal anchor
Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.
SUDP: Secret-Use Delegation Protocol for Agentic Systems cs.CR · 2026-04-27 · unverdicted · none · ref 4 · 2 links · internal anchor
SUDP is a three-party protocol in which an agent proposes an operation, the user issues a fresh grant, and a custodian executes it, satisfying seven security properties for bounded secret use without reusable authority transfer.
Security Considerations for Multi-agent Systems cs.CR · 2026-03-09 · unverdicted · none · ref 160 · internal anchor
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
Tracking Capabilities for Safer Agents cs.AI · 2026-03-01 · unverdicted · none · ref 86 · internal anchor
AI agents can generate code in a capability-safe Scala dialect that statically prevents information leakage and malicious side effects while preserving task performance.
Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents cs.CR · 2026-06-25 · unverdicted · none · ref 129 · internal anchor
A data-centric survey finds that only information-flow control covers compositional and cross-session leakage in LLM agents and that no single benchmark tests an agent across all its data surfaces under one policy.
Think Twice Before You Act: Protecting LLM Agents Against Tool Description Poisoning via Isolated Planning cs.CR · 2026-06-18 · unverdicted · none · ref 25 · internal anchor
Tool-Guard uses isolated planning to quarantine suspicious tools, reducing tool description poisoning attacks on LLM agents while preserving task utility on AgentDojo and ASB benchmarks.
Order Is Not Control cs.LG · 2026-06-11 · unverdicted · none · ref 40 · internal anchor
Order is distinct from control, where control is defined as a local receiver-gated response law demonstrated across biological circuits and LLM response panels with reported prediction accuracies of 72-84%.
Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs cs.AI · 2026-06-08 · unverdicted · none · ref 246 · internal anchor
A survey that maps safety risks in personalized LLMs, introduces a unified taxonomy, and highlights three structural inadequacies in existing research on user-invariant safety, isolated techniques, and short-term evaluations.
ADR: An Agentic Detection System for Enterprise Agentic AI Security cs.AI · 2026-05-17 · unverdicted · none · ref 5 · internal anchor
ADR is a three-component detection system for AI agents that combines telemetry sensors, red teaming, and two-tier detection, achieving 97.2% precision in a ten-month Uber deployment and outperforming baselines on the new ADR-Bench.
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation cs.CR · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
SafeAgent: A Runtime Protection Architecture for Agentic Systems cs.AI · 2026-04-19 · unverdicted · none · ref 19 · internal anchor
SafeAgent is a stateful runtime protection system that improves LLM agent robustness to prompt injections over baselines while preserving task performance.
SoK: Security of Autonomous LLM Agents in Agentic Commerce cs.CR · 2026-04-15 · unverdicted · none · ref 136 · internal anchor
The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.
DRAFT: Task Decoupled Latent Reasoning for Agent Safety cs.LG · 2026-02-11 · unverdicted · none · ref 24 · internal anchor
DRAFT decouples agent safety judgment into latent extraction and reasoning stages, raising average benchmark accuracy from 63.27% to 91.18%.
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability cs.CL · 2026-05-08 · unverdicted · none · ref 154 · internal anchor
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 241 · internal anchor
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey cs.CR · 2026-05-08 · unverdicted · none · ref 167 · internal anchor
A literature survey synthesizes 119 studies on AI-driven alert screening into a four-stage taxonomy of filtering, triage, correlation, and generative augmentation while identifying gaps in deployment realism and robustness.
