Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity

Jiaming Qu; Lucheng Fu; Yibo Hu

arxiv: 2606.01637 · v2 · pith:ZZVYG2ULnew · submitted 2026-06-01 · 💻 cs.CL · cs.AI

Pith reviewed 2026-06-28 15:20 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords LLM conformity · multi-agent systems · answer revision · peer agreement · harmful revision · authority labels · reasoning interventions · question answering

The pith

Peer agreement misleads initially correct LLMs more easily than it corrects wrong ones

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs controlled experiments in which an LLM first gives its own answer to a question and then sees simulated responses from other models before deciding on a final answer. It varies the structure of peer agreement and whether peers carry authority labels, then measures how often the model revises from correct to incorrect versus incorrect to correct. The central finding is an asymmetry: agreement among peers shifts models away from right answers more readily than it shifts them toward right answers. Authority labels increase the pull toward the labeled answer even when that answer is wrong. Standard reasoning methods such as chain-of-thought prompting do not reliably block the harmful shifts while keeping the helpful ones.

Core claim

Across four open-weight LLMs and seven QA datasets, peer consensus produces more revisions that change a correct initial answer into an incorrect final answer than revisions that change an incorrect initial answer into a correct final answer. Authority labels raise the probability that the model selects the endorsed answer irrespective of its correctness. Interventions such as chain-of-thought and reflection do not consistently lower the rate of harmful revisions while preserving beneficial ones.

What carries the argument

Controlled presentation of simulated peer responses that vary in consensus structure and authority labels, with measurement of revision direction from an LLM's initial answer to its final answer on QA tasks.
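
The measurement of revision direction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Trial` record, its field names, the `wrong_to_wrong` label, and the toy data are all hypothetical and introduced here only to make the two revision types concrete.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One question under one peer condition (hypothetical record layout)."""
    gold: str      # reference answer
    initial: str   # model's answer before seeing peers
    final: str     # model's answer after seeing peers


def revision_type(t: Trial) -> str:
    """Label the direction of change between the initial and final answer."""
    was_right = t.initial == t.gold
    is_right = t.final == t.gold
    if was_right and not is_right:
        return "harmful"       # correct -> wrong
    if not was_right and is_right:
        return "beneficial"    # wrong -> correct
    if t.initial == t.final:
        return "unchanged"
    return "wrong_to_wrong"    # answer changed but is still incorrect


def revision_rates(trials):
    """Return (harmful rate among initially correct, beneficial rate among initially wrong)."""
    counts = Counter(revision_type(t) for t in trials)
    n_correct = sum(t.initial == t.gold for t in trials)
    n_wrong = len(trials) - n_correct
    harmful = counts["harmful"] / n_correct if n_correct else float("nan")
    beneficial = counts["beneficial"] / n_wrong if n_wrong else float("nan")
    return harmful, beneficial


# Toy data, invented for illustration only.
toy = [
    Trial("A", "A", "B"),  # harmful
    Trial("A", "A", "A"),  # unchanged
    Trial("A", "B", "A"),  # beneficial
    Trial("A", "B", "B"),  # unchanged
    Trial("A", "B", "C"),  # wrong_to_wrong
    Trial("A", "C", "C"),  # unchanged
]
harmful_rate, beneficial_rate = revision_rates(toy)
print(harmful_rate, beneficial_rate)
```

Each rate is conditioned on the initial answer's correctness, so the two numbers stay comparable even when the model starts out right far more often than wrong.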

If this is right

Multi-agent LLM systems risk a net increase in errors when they aggregate peer answers without additional verification steps.
Authority signals attached to peers increase the chance that models follow the labeled answer regardless of accuracy.
Common reasoning interventions fail to reduce harmful conformity more than they reduce beneficial conformity.
Multi-agent designs should incorporate explicit verification of peer answers instead of relying on simple aggregation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Live interactions among actual models could produce conformity patterns that differ from those observed with pre-scripted peer answers.
Error rates in collaborative LLM setups may increase with larger group sizes under majority-based aggregation.
The asymmetry could extend to tasks other than question answering, such as planning or decision-making under group input.

Load-bearing premise

The simulated peer responses and the seven QA datasets used accurately capture the conformity dynamics that would occur in real multi-agent LLM deployments with live models.

What would settle it

Repeating the experiments using live peer models that generate responses in real time rather than pre-simulated answers and checking whether the ratio of harmful to beneficial revisions remains greater than one.
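
One way to write this criterion down, using symbols introduced here for illustration (H, B, and ρ are not the paper's notation, and the sketch assumes the ratio is taken over conditional rates, not raw counts):

```latex
\[
  H = P(\text{final wrong} \mid \text{initial correct}), \qquad
  B = P(\text{final correct} \mid \text{initial wrong}), \qquad
  \rho = \frac{H}{B}.
\]
```

Under this reading, the asymmetry survives the move to live peers if the estimated ρ stays above 1 with an interval that excludes 1.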

Figures

Figures reproduced from arXiv: 2606.01637 by Jiaming Qu, Lucheng Fu, Yibo Hu.

**Figure 1.** LLMs are easier to mislead than to correct. Relative to a shared mixed-peer baseline, all-wrong peers increase harmful revision (correct → wrong) much more sharply than all-correct peers increase beneficial revision (wrong → correct). Adjacent text: … expected to help models correct their mistakes, it can also spread errors when peers provide wrong answers. This raises a core question: is peer influence equally effective …

**Figure 2.** Experimental design. The model first answers independently, then sees six simulated peer responses and answers again. RQ1 manipulates consensus structure and authority-label presence; RQ2 varies social-cue strength by changing the number of committed peers and authority-labeled peers; RQ3 tests whether chain-of-thought and reflect-then-revise prompting can reduce harmful revision while preserving beneficia…

**Figure 3.** RQ2b: authority-aligned revision by authority-label count. As more peers carry authority labels, models more often revise toward the authority-endorsed answer, whether it is correct or wrong. Adjacent text: … authority-aligned revision rises by about +30 pp from n_auth=1 to n_auth=5. When authority-labeled peers endorse a wrong answer, authority-aligned revision rises by a comparable +28 pp. Thus, increasing the number of a…

**Figure 4.** RQ3: CoT prompting and reflect-then-revise do not reliably reduce harmful revision while preserving beneficial revision. Error bars are 95% CIs clustered at the original per-instance level. Adjacent text: … revision from 63.9% to 36.5% (Figure 4a; consensus-by-CoT interaction p < .001), suggesting that explicit reasoning can help models resist the strongest misleading peer signal. However, this benefit does not generalize a…

Original abstract

Large language models are increasingly used in multi-agent systems, where they see and respond to other agents' answers. A key risk is conformity: a model may abandon its own answer simply because others agree on a different one. Prior studies show that LLMs often revise toward a majority answer, but it remains unclear whether these revisions help correct mistakes as often as they introduce new errors. In this paper, we conduct a controlled study in which an LLM first answers a question, then sees simulated peer responses before making a final decision. We manipulate two social cues: consensus structure and authority labels assigned to peers, and measure how they influence beneficial and harmful revisions. Across four open-weight LLMs and seven QA datasets, we find that peer agreement makes it much easier to mislead initially correct models than to correct initially wrong ones. Authority labels make models more likely to choose the endorsed answer, regardless of whether it is correct. More concerningly, generic reasoning interventions such as chain-of-thought and reflection do not reliably reduce harmful revision while preserving beneficial revision. These findings suggest that multi-agent LLM systems should verify peer answers rather than simply aggregate them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Peer agreement misleads correct LLM answers more than it fixes wrong ones in this setup, but the simulated peers leave the multi-agent generalization open.

The main point is that peer agreement sways models from correct to incorrect answers more readily than the reverse. The authors set up experiments where an LLM answers a question first, then sees manipulated peer responses with different consensus patterns and authority labels, and revises its answer. They track how often this produces harmful versus beneficial changes across four open-weight models and seven QA datasets.

They separate the two kinds of revision cleanly and test whether authority cues or generic reasoning steps like chain-of-thought change the rates. The asymmetry shows up consistently, and the interventions do not reliably protect against harmful shifts while keeping the beneficial ones. That separation and the intervention results are the concrete addition here.

The clearest limitation is the reliance on simulated peer answers. The design lets them control the inputs tightly, but it does not test whether the same pattern appears when multiple models generate responses live and influence each other. The motivation in the abstract is multi-agent systems, so how well the controlled results carry over is the open question.

This is useful for people building or evaluating multi-agent LLM setups who need data on conformity risks. The experiments are direct enough that the numbers can be checked and extended. It should go to peer review so reviewers can examine the exact prompt templates, statistical reporting, and whether the simulation choices affect the asymmetry.

Referee Report

3 major / 2 minor

Summary. The paper conducts a controlled empirical study on LLM conformity in multi-agent settings. An LLM first answers questions from seven QA datasets, then revises after viewing simulated peer responses whose consensus structure and authority labels are manipulated. Across four open-weight models, the authors report that peer agreement induces harmful revisions (correct to incorrect) more readily than beneficial revisions (incorrect to correct). Authority labels increase selection of the endorsed answer irrespective of correctness. Standard reasoning interventions (chain-of-thought, reflection) do not reliably reduce harmful revisions while preserving beneficial ones. The authors conclude that multi-agent systems should verify rather than aggregate peer answers.

Significance. If the reported asymmetry holds under live multi-agent conditions, the result is significant for AI system design: it indicates that conformity can systematically amplify rather than mitigate errors, providing a concrete reason to prefer verification mechanisms. The controlled manipulation of social cues and the use of multiple models and datasets are strengths that allow direct measurement of revision outcomes rather than inference from fitted parameters.

major comments (3)

[§3] (Experimental Setup): The central claim of asymmetric revision rates is measured using pre-generated or manipulated peer responses rather than responses produced by independent live models. Without an explicit ablation or statistical comparison showing that the simulated peer error distributions and consensus structures match those arising in dynamic, interdependent generation, the observed harmful-to-beneficial ratio could be an artifact of the simulation method and would not necessarily generalize to the multi-agent deployments emphasized in the abstract and conclusion.
[§4.3] (Intervention results): The claim that chain-of-thought and reflection 'do not reliably reduce harmful revision' is load-bearing for the practical recommendation. The manuscript reports aggregate trends but does not provide per-dataset statistical tests, confidence intervals on the revision rates, or power analysis; it is therefore unclear whether the interventions have null effects or whether the design lacks sensitivity to detect moderate reductions.
[Table 2] (Revision counts by condition): The reported asymmetry is quantified via raw counts of harmful vs. beneficial revisions. If the initial model accuracy varies substantially across datasets, the base rates of correct and incorrect answers differ; without normalization or reporting of the conditional probabilities P(harmful|initially correct) and P(beneficial|initially incorrect) with standard errors, the claim that misleading is 'much easier' than correcting cannot be directly compared across conditions.
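
The normalization requested in the Table 2 comment can be sketched as below. This is an illustration only: the function name `rate_with_se`, the dataset labels, and every count are hypothetical and do not come from the paper. The standard error is the usual binomial one, sqrt(p(1 - p)/n).

```python
import math


def rate_with_se(events: int, at_risk: int):
    """Proportion and its binomial standard error, sqrt(p * (1 - p) / n)."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    p = events / at_risk
    return p, math.sqrt(p * (1.0 - p) / at_risk)


# Hypothetical counts for two datasets with different initial accuracy.
# The raw revision counts are identical; only the at-risk pools differ.
datasets = {
    "easy": {"n_correct": 900, "n_wrong": 100, "harmful": 90, "beneficial": 30},
    "hard": {"n_correct": 300, "n_wrong": 700, "harmful": 90, "beneficial": 30},
}

for name, d in datasets.items():
    h, h_se = rate_with_se(d["harmful"], d["n_correct"])
    b, b_se = rate_with_se(d["beneficial"], d["n_wrong"])
    print(f"{name}: P(harmful|correct)={h:.3f}±{h_se:.3f}  "
          f"P(beneficial|wrong)={b:.3f}±{b_se:.3f}")
```

With these invented counts, both datasets show 90 harmful and 30 beneficial revisions, yet the conditional rates point in opposite directions: 0.100 versus 0.300 in "easy", and 0.300 versus about 0.043 in "hard". That is the ambiguity the comment says raw counts cannot resolve.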

minor comments (2)

[Abstract]: The phrase 'generic reasoning interventions' is used without examples; adding 'e.g., chain-of-thought and self-reflection' would improve immediate clarity.
[§2] (Related Work): The citation to prior conformity studies could explicitly note whether those works also used simulated vs. live peers, to better situate the novelty of the current simulation design.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. Below we address each major comment point-by-point, indicating where we will revise the manuscript.

Referee: [§3] (Experimental Setup): The central claim of asymmetric revision rates is measured using pre-generated or manipulated peer responses rather than responses produced by independent live models. Without an explicit ablation or statistical comparison showing that the simulated peer error distributions and consensus structures match those arising in dynamic, interdependent generation, the observed harmful-to-beneficial ratio could be an artifact of the simulation method and would not necessarily generalize to the multi-agent deployments emphasized in the abstract and conclusion.

Authors: The use of simulated peer responses was deliberate to achieve precise, orthogonal control over consensus structure and authority labels, factors that are confounded in live multi-agent generation. This controlled design directly supports the causal claims about social cues. We acknowledge the referee's point on generalizability and will add an expanded limitations paragraph in the revised manuscript that (a) explicitly states the simulation does not capture inter-agent dependencies and (b) outlines how future live multi-agent experiments could test the same asymmetry. No new ablation experiments are feasible within the current scope, but the controlled results still provide a lower bound on conformity risks.

revision: partial

Referee: [§4.3] (Intervention results): The claim that chain-of-thought and reflection 'do not reliably reduce harmful revision' is load-bearing for the practical recommendation. The manuscript reports aggregate trends but does not provide per-dataset statistical tests, confidence intervals on the revision rates, or power analysis; it is therefore unclear whether the interventions have null effects or whether the design lacks sensitivity to detect moderate reductions.

Authors: We agree that aggregate trends alone are insufficient for the strong claim. In the revision we will add (i) per-dataset revision rates with McNemar tests or paired proportion tests, (ii) 95% bootstrap confidence intervals on harmful and beneficial revision probabilities, and (iii) a post-hoc power analysis (using the observed effect sizes) to quantify the design's sensitivity. These additions will clarify whether the interventions truly lack effect or whether power is limited.

revision: yes

Referee: [Table 2] (Revision counts by condition): The reported asymmetry is quantified via raw counts of harmful vs. beneficial revisions. If the initial model accuracy varies substantially across datasets, the base rates of correct and incorrect answers differ; without normalization or reporting of the conditional probabilities P(harmful|initially correct) and P(beneficial|initially incorrect) with standard errors, the claim that misleading is 'much easier' than correcting cannot be directly compared across conditions.

Authors: We will revise Table 2 and the associated text to report both the raw counts and the normalized conditional probabilities P(harmful revision | initially correct) and P(beneficial revision | initially incorrect), each accompanied by standard errors (via bootstrap or binomial SE). This will allow readers to assess the asymmetry on a per-initial-answer basis and to compare effect sizes across datasets with differing base accuracies.

revision: yes
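
The bootstrap intervals and McNemar tests named in the responses above could look roughly like the following. This is a standard-library sketch under stated assumptions, not the authors' analysis: the function names, the resampling scheme, and the example numbers are hypothetical. It resamples items independently, which is simpler than the per-instance clustering mentioned in the Figure 4 caption.

```python
import math
import random


def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample n items with replacement and record the mean each time.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: items with a harmful revision only without the intervention
    c: items with a harmful revision only with the intervention
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented per-item indicators: 1 = harmful revision on an initially correct item.
baseline = [1] * 30 + [0] * 70
print(bootstrap_ci(baseline))
print(mcnemar_exact(b=18, c=7))
```

The exact test uses only the discordant pairs, which suits a paired design in which the same questions are answered with and without an intervention such as chain-of-thought.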

Circularity Check

0 steps flagged

No circularity: purely empirical measurements with no derivations or self-referential definitions

The paper is a controlled empirical study that directly measures revision rates from LLM outputs on seven QA datasets under manipulated peer consensus and authority conditions. No equations, fitted parameters, ansatzes, or load-bearing self-citations appear in the derivation of the central claim; the asymmetry between harmful and beneficial revisions is observed rather than defined or predicted from prior quantities within the work itself. The simulation setup is an explicit methodological choice whose validity is external to any internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical study; no free parameters, new entities, or non-standard axioms are introduced beyond standard assumptions of controlled experimentation.

axioms (1)

Domain assumption: Simulated peer responses produce conformity behavior comparable to live multi-agent interaction.
Central to interpreting the revision rates as relevant to real systems.


2001

[24] [24]

International Conference on Learning Representations , volume=

Can llms express their uncertainty? an empirical evaluation of confidence elicitation in llms , author=. International Conference on Learning Representations , volume=

[25] [25]

International conference on learning representations , volume=

Large language models cannot self-correct reasoning yet , author=. International conference on learning representations , volume=

[26] [26]

International Conference on Learning Representations , volume=

Critic: Large language models can self-correct with tool-interactive critiquing , author=. International Conference on Learning Representations , volume=

[27] [27]

ICLR 2024 Workshop on Large Language Model (LLM) Agents , year=

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation , author=. ICLR 2024 Workshop on Large Language Model (LLM) Agents , year=

2024

[28] [28]

Forty-first international conference on machine learning , year=

Improving factuality and reasoning in language models through multiagent debate , author=. Forty-first international conference on machine learning , year=

[29] [29]

International conference on learning representations , volume=

Chateval: Towards better llm-based evaluators through multi-agent debate , author=. International conference on learning representations , volume=

[30] [30]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

Unleashing the emergent cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2024

[31] [31]

Proceedings of the 2024 conference on empirical methods in natural language processing , pages=

Encouraging divergent thinking in large language models through multi-agent debate , author=. Proceedings of the 2024 conference on empirical methods in natural language processing , pages=

2024

[32] [32]

Proceedings of the 29th symposium on operating systems principles , pages=

Efficient memory management for large language model serving with pagedattention , author=. Proceedings of the 29th symposium on operating systems principles , pages=

[33] [33]

Findings of the Association for Computational Linguistics: ACL 2023 , pages=

Challenging big-bench tasks and whether chain-of-thought can solve them , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

2023

[34] [34]

Advances in Neural Information Processing Systems , volume=

Mmlu-pro: A more robust and challenging multi-task language understanding benchmark , author=. Advances in Neural Information Processing Systems , volume=

[35] [35]

arXiv preprint arXiv:1803.05457 , year=

Think you have solved question answering? try arc, the ai2 reasoning challenge , author=. arXiv preprint arXiv:1803.05457 , year=

Pith/arXiv arXiv

[36] [36]

Proceedings of the 60th annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

Truthfulqa: Measuring how models mimic human falsehoods , author=. Proceedings of the 60th annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

[37] [37]

arXiv preprint arXiv:2412.15115 , year=

Qwen2.5 Technical Report , author=. arXiv preprint arXiv:2412.15115 , year=

Pith/arXiv arXiv

[38] [38]

arXiv preprint arXiv:2310.06825 , year=

Mistral 7B , author=. arXiv preprint arXiv:2310.06825 , year=

Pith/arXiv arXiv

[39] [39]

arXiv preprint arXiv:2408.00118 , year=

Gemma 2: Improving open language models at a practical size , author=. arXiv preprint arXiv:2408.00118 , year=

Pith/arXiv arXiv

[40] [40]

arXiv preprint arXiv:2407.21783 , year=

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

Pith/arXiv arXiv

[41] [41]

International Conference on Learning Representations , volume=

Darkbench: Benchmarking dark patterns in large language models , author=. International Conference on Learning Representations , volume=

[42] [42]

arXiv preprint arXiv:2303.13988 , year=

Machine psychology , author=. arXiv preprint arXiv:2303.13988 , year=

arXiv

[43] [43]

Advances in neural information processing systems , volume=

Self-refine: Iterative refinement with self-feedback , author=. Advances in neural information processing systems , volume=

[44] [44]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

[45] [45]

arXiv preprint arXiv:2509.24130 , year=

Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE , author=. arXiv preprint arXiv:2509.24130 , year=

arXiv

[46] [46]

arXiv preprint arXiv:2605.21318 , year=

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization , author=. arXiv preprint arXiv:2605.21318 , year=

Pith/arXiv arXiv

[47] [47]

Advances in neural information processing systems , volume=

Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=

[48] [48]

, author=

Metacognition and cognitive monitoring: A new area of cognitive--developmental inquiry. , author=. American psychologist , volume=. 1979 , publisher=

1979

[49] [49]

2011 , publisher=

Thinking, fast and slow , author=. 2011 , publisher=

2011

[50] [50]

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages=

Uncertainty-aware reliable text classification , author=. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages=

[51] [51]

Proceedings of the ACM Web Conference 2024 , pages=

Better to ask in english: Cross-lingual evaluation of large language models for healthcare queries , author=. Proceedings of the ACM Web Conference 2024 , pages=

2024

[52] [52]

Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

Controllable fake document infilling for cyber deception , author=. Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

2022

[53] [53]

Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency , pages=

Understanding the Effects of Explaining Predictive but Unintuitive Features in Human-XAI Interaction , author=. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency , pages=

2025

[54] [54]

Proceedings of the 2023 Conference on Human Information Interaction and Retrieval , pages=

Understanding the cognitive influences of interpretability features on how users scrutinize machine-predicted categories , author=. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval , pages=

2023

[55] [55]

Proceedings of the 30th ACM International Conference on Information & Knowledge Management , pages=

A study of explainability features to scrutinize faceted filtering results , author=. Proceedings of the 30th ACM International Conference on Information & Knowledge Management , pages=

[56] [56]

arXiv:2601.14230 , year=

MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems , author=. arXiv:2601.14230 , year=

Pith/arXiv arXiv

[57] [57]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Agentreview: Exploring peer review dynamics with llm agents , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024