ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

physics.soc-ph · 2026-05-17 · accept · novelty 6.0

Minor perturbations in persona format, instruction framing, and network structure shift cooperation by up to 76 percentage points and polarization metrics consistently, showing that LLM social simulations require per-claim robustness audits via the new TRAILS taxonomy.

citing papers explorer


Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

physics.soc-ph · 2026-05-17 · accept · none · ref 10
