Based on the information of the truth and the user’s past questions, you answer the user’s questions
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.AI (1)
years: 2023 (1)
verdicts: UNVERDICTED (1)
representative citing papers
AgentBench: Evaluating LLMs as Agents
AgentBench is a new multi-environment benchmark showing commercial LLMs outperform open-source models up to 70B parameters in agent tasks mainly due to better long-term reasoning and instruction following.