PEAR: Planner-executor agent robustness benchmark

Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing · 2026 · DOI 10.18653/v1/2026.findings-eacl.237

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

Specialize Roles, Mix Deployments: Pushing the Cost-Accuracy Frontier of LLM Agent Teams

cs.MA · 2026-05-28 · unverdicted · novelty 7.0

AgentCARD benchmark shows heterogeneous LLM agent teams with mixed deployments reach the cost-accuracy frontier, delivering up to 44% higher accuracy or 12x lower cost than uniform teams, with domain-specific role bottlenecks.

citing papers explorer


Specialize Roles, Mix Deployments: Pushing the Cost-Accuracy Frontier of LLM Agent Teams

cs.MA · 2026-05-28 · unverdicted · none · ref 25
