Title resolution pending

At each step, you need to analyze the current status, determine the next course of action, whether to execute a function call

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

cs.CL · 2026-04-17 · unverdicted · novelty 8.0 · 2 refs

MemEvoBench is presented as the first standardized benchmark for long-horizon memory safety in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback across QA and workflow tasks.

Agent-SafetyBench: Evaluating the Safety of LLM Agents

cs.CL · 2024-12-19 · conditional · novelty 7.0

Agent-SafetyBench shows no tested LLM agent exceeds 60% safety score, attributing failures to lack of robustness and risk awareness.

citing papers explorer

Showing 2 of 2 citing papers.

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

cs.CL · 2026-04-17 · unverdicted · none · ref 60 · 2 links

Agent-SafetyBench: Evaluating the Safety of LLM Agents

cs.CL · 2024-12-19 · conditional · none · ref 19
