pith. sign in

CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

E-commerce platforms have begun recruiting crowdsourced jurors to adjudicate massive volumes of transaction disputes. Unlike formal legal judgment, E-commerce dispute verdicts require grounding pivotal clues from redundant, multi-round, multimodal evidence and making decisions under flexible platform-specific conventions. These characteristics render existing methods insufficient for this scenario. To bridge this gap, we introduce a pioneering task, E-commerce Dispute Verdicts (EDV), and present VerdictBench, a multimodal benchmark comprising 6,000 real-world cases designed to reflect crowdsourced jury decisions. Building upon this, we propose CyberJurors, a multi-agent framework to clarify the dispute logic and regulate the verdict process. At the individual level, Individual Verdict Chain-of-Thought decomposes the EDV task into four structured reasoning stages, enabling fine-grained clue perception and clarifying causal logic between pivotal clues and the dispute focus. At the collective level, Jury Consensus Verdict simulates multi-round discussion and voting among jurors, while incorporating verdict precedents to mitigate cognitive biases toward either disputant. Experiments on VerdictBench show that CyberJurors outperforms state-of-the-art LLMs, MLLMs, and court simulators, while achieving stronger alignment with real-world jury voting patterns. Code and dataset are available at https://github.com/YanhuiS/CyberJurors and https://huggingface.co/datasets/piggi/VerdictBench.

fields

cs.MA 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.

  • Modeling U.S. Attitudes Toward China via an Event-Steered Multi-Agent Simulator cs.MA · 2026-06-05 · unverdicted · none · ref 28 · internal anchor

    ES-MAS combines a new CURE dataset of 258 events and 14,000 news items with dual-stream integration and localized interaction modules to simulate opinion dynamics and claims better reproduction of historical U.S.-China attitude trends than prior simulators.