ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.
Llm as os, agents as apps: Envisioning aios, agents and the aios-agent ecosystem, 2023 b
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
Deterministic multi-agent intent routing can reduce hallucinations in generative engines to near zero by limiting LLMs to intent routers and handing off tasks to specialized agents.
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
citing papers explorer
-
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.
-
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
-
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
Deterministic multi-agent intent routing can reduce hallucinations in generative engines to near zero by limiting LLMs to intent routers and handing off tasks to specialized agents.
-
A Survey on the Memory Mechanism of Large Language Model based Agents
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.