Meta SecAlign: A secure foundation LLM against prompt injection attacks

Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo · 2025 · arXiv 2507.02735

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

cs.CR · 2026-04-08 · unverdicted · novelty 8.0

TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.

No More, No Less: Task Alignment in Terminal Agents

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

The TAB benchmark reveals that frontier terminal agents achieve high task completion but low selective alignment with relevant environmental cues over distractors, and prompt-injection defenses block both.

AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?

cs.CR · 2026-02-03 · accept · novelty 7.0

The AgentDyn benchmark demonstrates that current AI agent defenses against prompt injection fail to handle dynamic real-world conditions.

LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training

cs.CR · 2026-05-02 · unverdicted · novelty 6.0

LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

FlashRT delivers 2x-7x speedup and 2x-4x GPU memory reduction for prompt injection and knowledge corruption attacks on long-context LLMs versus nanoGCG.

Progent: Securing AI Agents with Privilege Control

cs.CR · 2025-04-16 · unverdicted · novelty 6.0

Progent introduces a privilege-control framework for AI agents that uses LLM-generated symbolic rules over tools, SMT-solver-enforced monotonic updates, and deterministic checks to reduce attack success rates on AgentDojo and ASB benchmarks.

PIArena: A Platform for Prompt Injection Evaluation

cs.CR · 2026-04-09 · unverdicted · novelty 5.0

PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.

Agent Security is a Systems Problem

cs.CR · 2026-05-18 · unverdicted · novelty 4.0 · 2 refs

The paper argues that agent security is best addressed as a systems problem by applying principles from operating systems, networks, and formal methods rather than relying solely on model robustness improvements.

citing papers explorer

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
cs.CR · 2026-05-09 · unverdicted · none · ref 58 · 3 links

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
cs.CR · 2026-04-08 · unverdicted · none · ref 28

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains
cs.AI · 2026-05-19 · unverdicted · none · ref 9

No More, No Less: Task Alignment in Terminal Agents
cs.LG · 2026-05-12 · unverdicted · none · ref 34

AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?
cs.CR · 2026-02-03 · accept · none · ref 1

LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training
cs.CR · 2026-05-02 · unverdicted · none · ref 5

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases
cs.CR · 2026-05-01 · unverdicted · none · ref 63

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
cs.CR · 2026-04-30 · unverdicted · none · ref 25

Progent: Securing AI Agents with Privilege Control
cs.CR · 2025-04-16 · unverdicted · none · ref 10

PIArena: A Platform for Prompt Injection Evaluation
cs.CR · 2026-04-09 · unverdicted · none · ref 3

Agent Security is a Systems Problem
cs.CR · 2026-05-18 · unverdicted · none · ref 11 · 2 links
