AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al · 2023

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

OpenSkillEval automatically builds realistic tasks from evolving artifacts to audit skill effectiveness in LLM agents, finding that skill use depends on model and framework and that many popular skills do not outperform base agents.

citing papers explorer

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents · cs.CL · 2026-05-22 · unverdicted · none · ref 30
