mSystems, 9(12):e00568–24

Qisen Yang, Zekun Wang, Honghui Chen, Shenzhi Wang, Yifan Pu, Xin Gao, Wenhao Huang, Shiji Song, Gao Huang · 2024 · arXiv 2402.12326

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

cs.LG · 2024-06-13 · unverdicted · novelty 6.0

GuardAgent safeguards LLM agents by generating task plans from safety requests and mapping them to executable guardrail code, achieving over 98% accuracy on a healthcare access-control benchmark and 83% on a web safety benchmark.

Inertia in Moral and Value Judgments of Large Language Models

cs.CL · 2024-08-16 · unverdicted · novelty 4.0

LLMs exhibit persistent inertia in value orientations, with harm avoidance and fairness remaining skewed across persona prompts.

citing papers explorer

Showing 3 of 3 citing papers.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
cs.CR · 2024-10-03 · unverdicted · none · ref 153
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
cs.LG · 2024-06-13 · unverdicted · none · ref 6
Inertia in Moral and Value Judgments of Large Language Models
cs.CL · 2024-08-16 · unverdicted · none · ref 52
