CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

Aimin Zhou; Hanjie Ge; Haotian Shi; Hong Qian; Jingwen Yang; Liang Dou; Xiangfeng Wang; Yuanhao Liu; Zihan Zhou; Zongbao Zhang

arxiv: 2606.05793 · v1 · pith:UKR4GJ33new · submitted 2026-06-04 · 💻 cs.CL · cs.AI · cs.CY · cs.LG

keywords: collaborative, affective, collabbench, diverse, efficiency, models, training, agentic

While LLM-based agents excel at individual tasks, effective collaboration with realistic human partners remains challenging. Most existing conversation-level collaborative studies lack grounded interaction and behavioral execution, which motivates cooperative game environments that enable contextualized and immersive collaboration. To this end, this paper proposes CollabBench, a benchmark for evaluating and training collaborative agents in cooperative games. CollabBench features a Diverse Player Profile Simulation pipeline that models varied player behaviors, and a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action via agentic rollouts, optimized with a hybrid reward that balances task efficiency and affective adaptation. We further extend classic environments to CWAH-MultiPlayer and Cook-MultiPlayer for systematic evaluation under diverse personalities. Experiments with efficiency and affective metrics show that our trained models outperform base models, achieving 19.5% higher efficiency and 24.4% better affective performance. Further analysis reveals key collaborative limitations of existing models and offers insights for future collaborative training.