Stay Focused: Problem Drift in Multi-Agent Debate

2025 · cs.CL · arXiv 2502.19559

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Multi-agent debate, in which multiple instances of large language models discuss problems in turn-based interaction, has shown promise for solving knowledge and reasoning tasks. However, these methods show limitations when solving complex problems that require longer reasoning chains. We analyze how multi-agent debate drifts away from the initial problem over multiple turns, thus harming task performance. We define this phenomenon as problem drift and quantify its presence across ten tasks (three generative, three knowledge, three reasoning, and one instruction-following task). We find that generative tasks drift often (76-89%) because of the subjectivity of the answer space, whereas high-complexity tasks drift less often (7-21%). To identify the reasons, eight human experts analyze 170 multi-agent debates suffering from problem drift. The most common issues related to this drift are a lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases). We propose DRIFTJudge, an LLM-as-a-judge method, as a first baseline to detect problem drift. We also propose DRIFTPolicy, which mitigates 31% of problem drift cases. Our study is a step toward understanding a key limitation of multi-agent debate, highlighting why longer debates can harm task performance and how problem drift could be addressed.

representative citing papers

Stay Focused: Problem Drift in Multi-Agent Debate

cs.CL · 2025-02-26 · unverdicted · novelty 7.0

The paper defines and measures 'problem drift' in multi-agent LLM debates across tasks and proposes DRIFTJudge and DRIFTPolicy as baselines to detect and reduce it.

The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

cs.CL · 2026-05-03 · unverdicted · novelty 6.0

Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact and FEVER.

Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while self-consistency saturates.

The Prompt Engineering Report Distilled: Quick Start Guide for Life Sciences

cs.CL · 2025-09-14 · unverdicted · novelty 3.0 · 2 refs

The paper reduces a broad set of prompt engineering techniques to six core approaches and applies them to life sciences use cases while addressing common LLM pitfalls.

citing papers explorer

Showing 4 of 4 citing papers.

Stay Focused: Problem Drift in Multi-Agent Debate

cs.CL · 2025-02-26 · unverdicted · none · ref 11 · internal anchor

The paper defines and measures 'problem drift' in multi-agent LLM debates across tasks and proposes DRIFTJudge and DRIFTPolicy as baselines to detect and reduce it.

The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

cs.CL · 2026-05-03 · unverdicted · none · ref 17 · internal anchor

Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact and FEVER.

Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

cs.AI · 2026-05-02 · unverdicted · none · ref 1 · internal anchor

Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while self-consistency saturates.

The Prompt Engineering Report Distilled: Quick Start Guide for Life Sciences

cs.CL · 2025-09-14 · unverdicted · none · ref 38 · 2 links · internal anchor

The paper reduces a broad set of prompt engineering techniques to six core approaches and applies them to life sciences use cases while addressing common LLM pitfalls.
