SEATauBench is the first agent benchmark for SEA languages, finding that performance holds for language-only changes but degrades sharply with full domain localization.
CRMA rena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Formalizes stored prompt injection in agentic systems, develops a taxonomy and benchmark to show how adversarial prompts can persist across sessions via persistent state artifacts.
citing papers explorer
-
SEATauBench: Adapting Tool-Agent-User Evaluation Into Low-Resource Southeast Asian Languages
SEATauBench is the first agent benchmark for SEA languages, finding that performance holds for language-only changes but degrades sharply with full domain localization.
-
What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems
Formalizes stored prompt injection in agentic systems, develops a taxonomy and benchmark to show how adversarial prompts can persist across sessions via persistent state artifacts.