AgentBoard: An analytical evaluation board of multi-turn LLM agents

Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He · 2024

1 Pith paper cites this work.

representative citing papers

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents

cs.AI · 2026-04-27 · unverdicted · novelty 5.0

PSA-Eval reframes evaluation of trilingual public-space agents around traceable failures and regression testing, revealing cross-language score drift in a pilot despite high average performance.

In the citing paper above, this work appears as ref 8.
