pith. machine review for the scientific record.

arxiv: 2605.10528 · v1 · submitted 2026-05-11 · ❄️ cond-mat.stat-mech · cs.CL · cs.MA · physics.soc-ph

Recognition: 2 theorem links


Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:52 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech · cs.CL · cs.MA · physics.soc-ph
keywords LLM multi-agent systems · collective alignment · statistical physics · Ising model · intrinsic bias · finite-size scaling · phase transitions · effective parameters

The pith

Collective alignment in LLM multi-agent systems is driven by intrinsic bias far more than by neighbor cooperation, yielding crossovers rather than true phase transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models groups of identical LLM agents as spins on a 2D lattice, where each agent updates its yes/no answer by querying the model with the states of its four neighbors. A statistical-physics procedure extracts effective coupling strengths and external fields from the observed magnetization and susceptibility curves. Across three different open models the extracted bias term greatly exceeds the coupling term at all temperatures, so the collective behavior resembles an external-field crossover instead of the critical point expected from strong neighbor interactions. This distinction matters because it shows that apparent group consensus in these systems largely reflects each model's built-in preferences rather than emergent cooperation.
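The setup described above can be sketched in a few lines. This is a hedged illustration, not the paper's code: `query_llm` is a hypothetical stub standing in for a real neighbor-conditioned model call, and the 4×4 lattice and sweep count are chosen only to keep the demo fast.

```python
import random

L = 4  # illustrative lattice size; the paper uses larger even-L lattices

def query_llm(neighbor_states, temperature):
    """Hypothetical stand-in for the real LLM query: a production run
    would prompt the model with the four neighbor answers and sample a
    yes/no reply at the given sampler temperature.  Here a random stub
    keeps the sketch runnable."""
    return random.choice(["yes", "no"])

def step(lattice, temperature):
    """One sweep: every agent re-answers conditioned on its four
    nearest neighbors (periodic boundaries), mapped to +1/-1."""
    for i in range(L):
        for j in range(L):
            nn = [lattice[(i - 1) % L][j], lattice[(i + 1) % L][j],
                  lattice[i][(j - 1) % L], lattice[i][(j + 1) % L]]
            answer = query_llm(nn, temperature)
            lattice[i][j] = +1 if answer == "yes" else -1

def magnetization(lattice):
    """Mean spin over the lattice, the paper's order parameter m."""
    return sum(sum(row) for row in lattice) / (L * L)

lattice = [[1] * L for _ in range(L)]
for _ in range(10):
    step(lattice, temperature=0.7)
print("final magnetization:", magnetization(lattice))
```

The only moving part a real experiment changes is `query_llm`; everything downstream (magnetization, susceptibility, scaling) operates on the resulting ±1 trajectories.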

Core claim

In the models we analyzed, we found that collective alignment is dominated by an intrinsic bias (h̃ ≫ J̃) rather than by cooperative neighbor coupling, producing field-driven crossovers instead of genuine phase transitions. Finite-size scaling on even-sized lattices yields effective exponents γ/ν that are model-dependent and incompatible with the 2D Ising value of 7/4. The extracted temperature-dependent parameters J̃(T) and h̃(T) provide compact fingerprints that differ qualitatively across LLMs.

What carries the argument

Effective β-weighted coupling J̃(T) and field h̃(T) obtained by treating LLM neighbor-conditioned updates as probabilistic binary spins and performing finite-size scaling on magnetization and susceptibility.
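One way such an extraction could work, assuming the updates follow a heat-bath (logistic) form as the paper's premise requires: the log-odds of answering yes is then linear in the neighbor sum, with slope J̃ and intercept h̃. The sketch below uses synthetic probabilities generated from assumed illustrative values, not the paper's data.

```python
import math

# Assumed heat-bath form: P(yes | S_nn) = sigmoid(2*(J_eff*S_nn + h_eff)),
# so 0.5*log(p/(1-p)) = J_eff*S_nn + h_eff is a straight line in the
# neighbor sum.  J_true and h_true are illustrative, not measured values.
J_true, h_true = 0.1, 0.8

def p_yes(s_nn):
    return 1.0 / (1.0 + math.exp(-2.0 * (J_true * s_nn + h_true)))

neighbor_sums = [-4, -2, 0, 2, 4]  # possible sums of four +/-1 neighbors
logits = [0.5 * math.log(p_yes(s) / (1 - p_yes(s))) for s in neighbor_sums]

# Ordinary least squares for the line logit = J_eff*s + h_eff.
n = len(neighbor_sums)
sx, sy = sum(neighbor_sums), sum(logits)
sxx = sum(s * s for s in neighbor_sums)
sxy = sum(s * y for s, y in zip(neighbor_sums, logits))
J_eff = (n * sxy - sx * sy) / (n * sxx - sx * sx)
h_eff = (sy - J_eff * sx) / n
print(round(J_eff, 3), round(h_eff, 3))  # recovers 0.1 and 0.8
```

With these illustrative values the intercept dominates the slope, mirroring the h̃ ≫ J̃ pattern the paper reports: neighbor configurations barely move the agent's log-odds compared with its built-in preference.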

If this is right

  • Multi-agent consensus in these LLM systems exhibits smooth, field-driven crossovers rather than sharp phase transitions.
  • Each LLM carries a distinct collective-behavior fingerprint given by its pair of effective parameters J̃(T) and h̃(T).
  • The method supplies a quantitative diagnostic for how reliably a given model will produce aligned group outputs.
  • Exponents extracted from finite-size scaling vary by model and remain incompatible with Ising universality.
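The finite-size-scaling step behind the last bullet can be illustrated on synthetic data: the susceptibility peak is assumed to grow as χ_max(L) ∝ L^(γ/ν), so γ/ν is the slope of log χ_max against log L. The 7/4 used to generate the data below is the 2D Ising value, inserted only to validate the fit; the paper's point is that the fitted slopes for real LLM agents come out model-dependent and away from 7/4.

```python
import math

# Synthetic susceptibility peaks generated with the 2D Ising exponent
# gamma/nu = 7/4 and an arbitrary amplitude, purely to exercise the fit.
sizes = [8, 16, 32, 64]
chi_max = [0.3 * L ** (7 / 4) for L in sizes]

# Slope of the log-log relation gives the effective gamma/nu.
xs = [math.log(L) for L in sizes]
ys = [math.log(c) for c in chi_max]
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
print(round(slope, 3))  # 1.75
```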

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Reducing intrinsic bias in base LLMs may be a more direct route to cooperative multi-agent behavior than adding explicit neighbor prompts.
  • The lattice-plus-Ising approach could be applied to other topologies or to agents with more than two states to test whether bias dominance persists.
  • If bias dominates in larger or more heterogeneous agent groups, collective decisions will largely mirror the strongest individual model tendencies rather than produce genuinely new group-level intelligence.

Load-bearing premise

LLM answers to prompts that include neighbor states can be treated as probabilistic binary updates whose effective couplings and fields can be extracted exactly as in an Ising model.
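A reference simulation of the premise's target model, a heat-bath Ising lattice with an explicit uniform field, shows what h̃ ≫ J̃ implies: the field pins the sign of the magnetization at every temperature, so the system disorders smoothly instead of breaking symmetry. All parameter values below are illustrative, not fitted.

```python
import math, random

random.seed(0)
L, J, h = 8, 0.1, 1.0   # illustrative field-dominated regime, h >> J

def sweep(spins, beta):
    """One heat-bath sweep of the 2D Ising model in a field."""
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        s_nn = (spins[(i - 1) % L][j] + spins[(i + 1) % L][j]
                + spins[i][(j - 1) % L] + spins[i][(j + 1) % L])
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * (J * s_nn + h)))
        spins[i][j] = 1 if random.random() < p_up else -1

def mean_m(beta, sweeps=200):
    """Magnetization of the final configuration after equilibration."""
    spins = [[random.choice((1, -1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps):
        sweep(spins, beta)
    return sum(sum(r) for r in spins) / (L * L)

m_cold, m_hot = mean_m(beta=2.0), mean_m(beta=0.3)
print("cold:", m_cold, "hot:", m_hot)
```

Because the field term dominates, m stays field-aligned and merely shrinks as temperature rises: a crossover. A genuine Z2-symmetric transition would instead let m pick either sign below the critical point and vanish above it.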

What would settle it

If repeated measurements on the same models showed effective fields h̃ comparable in magnitude to couplings J̃, or if the measured susceptibility scaling exponents matched the 2D Ising value of 7/4 within error, the claim of bias dominance would be falsified.

Figures

Figures reproduced from arXiv: 2605.10528 by Cristiano De Nobili.

Figure 1. Schematic of the LLM-on-lattice setup.
Figure 2. Late-time magnetization distribution.
Figure 3. Magnetization time series m(t) for all three models across sampler temperatures. The qualitative behavior is shared: at low T, the system rapidly reaches high alignment; as T increases, trajectories become noisier and the steady-state magnetization drops. However, the rate and extent of this disordering differ markedly across models.
Figure 4. Absolute susceptibility.
Figure 5. Absolute susceptibility.
Figure 7. Effective coupling.
read the original abstract

We investigate the emergent collective dynamics of LLM-based multi-agent systems on a 2D square lattice and present a model-agnostic statistical-physics method to disentangle social conformity from intrinsic bias, compute critical exponents, and probe the collective behavior and possible phase transitions of multi-agent systems. In our framework, each node of an $L\!\times\!L$ lattice hosts an identical LLM agent holding a binary state ($+1$/$-1$, mapped to yes/no) and updating it by querying the model conditioned on the four nearest-neighbor states. The sampler temperature $T$ serves as the sole control parameter. Across three open-weight models (llama3.1:8b, phi4-mini:3.8b, mistral:7b), we measure magnetization and susceptibility under a global-flip protocol designed to probe $\mathbb{Z}_2$ symmetry. All models display temperature-driven order-disorder crossovers and susceptibility peaks; finite-size scaling on even-$L$ lattices yields effective exponents $\gamma/\nu$ whose values are model-dependent, close to but incompatible with the 2D Ising universality class ($\gamma/\nu=7/4$). Our method enables the extraction of effective $\beta$-weighted couplings $\tilde{J}(T)$ and fields $\tilde{h}(T)$, which serve as a measure of social conformity and intrinsic bias. In the models we analyzed, we found that collective alignment is dominated by an intrinsic bias ($\tilde{h}\gg\tilde{J}$) rather than by cooperative neighbor coupling, producing field-driven crossovers instead of genuine phase transitions. These effective parameters vary qualitatively across models, providing compact collective-behavior fingerprints for LLM agents and a quantitative diagnostic for the reliability of multi-agent consensus and collective alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops a statistical-physics framework for analyzing collective dynamics in LLM multi-agent systems on a 2D square lattice. Each agent maintains a binary state and updates it via LLM queries conditioned on the four nearest-neighbor states, with sampler temperature T as the control parameter. For three open-weight models, magnetization and susceptibility are measured under a global-flip protocol; finite-size scaling on even-L lattices yields model-dependent effective exponents γ/ν close to but incompatible with the 2D Ising value. Effective temperature-dependent couplings J̃(T) and fields h̃(T) are extracted, leading to the claim that intrinsic bias dominates (h̃ ≫ J̃), producing field-driven crossovers rather than genuine phase transitions. The method is presented as model-agnostic and provides collective-behavior fingerprints.

Significance. If the mapping of LLM updates to an effective Ising Hamiltonian holds and the parameter extraction is robust, the work supplies a quantitative, physics-based diagnostic for distinguishing bias-driven from cooperation-driven alignment in multi-agent LLM systems. The model-specific fingerprints and the ability to probe crossovers versus transitions could inform design of reliable consensus mechanisms and diagnostics for collective reliability.

major comments (3)
  1. [Abstract and Results (finite-size scaling and effective-parameter extraction)] The procedure for extracting the effective parameters J̃(T) and h̃(T) from the measured magnetization and susceptibility is not specified (no formulas, fitting protocol, or data-processing steps are given), nor are error bars or robustness tests against post-hoc choices reported. This extraction is load-bearing for the central claim that h̃ ≫ J̃ implies field-driven crossovers rather than phase transitions.
  2. [Finite-size scaling analysis] The reported γ/ν values are model-dependent and stated to be incompatible with the 2D Ising value 7/4, yet no scaling plots, checks on other exponents (e.g., β/ν), or analysis of possible deviations from Ising universality (higher-order correlations, non-equilibrium sampling, or prompt-induced multi-spin effects) are provided. This leaves open whether the assumed two-body Hamiltonian form is valid.
  3. [Model and update rule (methods)] The mapping of neighbor-conditioned LLM queries to Glauber dynamics of an Ising model with only nearest-neighbor J and uniform h is assumed without independent validation. If prompt structure introduces asymmetric responses, higher-order correlations, or breaks the Z2 symmetry in ways not captured by the global-flip protocol, the separation into bias versus cooperation and the conclusion of no genuine phase transitions would not follow.
minor comments (2)
  1. [Notation and definitions] The notation 'β-weighted couplings' for J̃ should be defined explicitly at first use, including the precise relation to the underlying Hamiltonian.
  2. [Figures and results presentation] Figure captions and text should clarify whether susceptibility peaks are raw or rescaled, and whether finite-size scaling collapses are shown for all three models.
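To make major comment 1 concrete, here is one plausible shape such an extraction could take, sketched under mean-field stand-ins (the 2D Ising model in a field has no closed-form solution, so the self-consistency m = tanh(qJ̃m + h̃) substitutes for it here). This is an editorial illustration, not the paper's actual procedure; every value is assumed.

```python
import math

# Mean-field forward model at one temperature: two observables (m, chi),
# two unknowns (J_eff, h_eff), hence an exactly invertible system.
q = 4                          # coordination number of the square lattice
J_eff_true, h_eff_true = 0.1, 0.8   # illustrative beta-weighted values

# Forward pass: solve m = tanh(q*J*m + h) by fixed-point iteration, then
# the reduced susceptibility chi = (1 - m^2) / (1 - q*J*(1 - m^2)).
m = 0.5
for _ in range(200):
    m = math.tanh(q * J_eff_true * m + h_eff_true)
chi = (1 - m * m) / (1 - q * J_eff_true * (1 - m * m))

# Inversion: recover J_eff from the chi relation, then h_eff from m.
J_fit = (1 - (1 - m * m) / chi) / (q * (1 - m * m))
h_fit = math.atanh(m) - q * J_fit * m
print(round(J_fit, 6), round(h_fit, 6))  # recovers 0.1 and 0.8
```

Repeating this inversion at each temperature would trace out the fingerprint curves J̃(T) and h̃(T); the referee's point is that the paper should state its own version of these equations explicitly.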

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments identify important gaps in methodological transparency and supporting analyses. We have revised the manuscript to address these by adding explicit formulas, fitting details, scaling plots, additional exponent checks, and a limitations discussion. Point-by-point responses follow.

read point-by-point responses
  1. Referee: The procedure for extracting the effective parameters J̃(T) and h̃(T) from the measured magnetization and susceptibility is not specified (no formulas, fitting protocol, or data-processing steps are given), nor are error bars or robustness tests against post-hoc choices reported. This extraction is load-bearing for the central claim that h̃ ≫ J̃ implies field-driven crossovers rather than phase transitions.

    Authors: We agree the extraction procedure was under-specified. The method matches measured m(T) and χ(T) to 2D Ising predictions in a field, evaluated numerically since no closed-form solution exists at nonzero field, via least-squares minimization at each T, solving simultaneously for J̃ and h̃. In the revision we add a dedicated Methods subsection with the explicit fitting equations, the optimizer used, bootstrap-derived error bars on the parameters, and robustness tests across fitting windows and initial guesses. These changes make the h̃ ≫ J̃ claim fully reproducible. revision: yes

  2. Referee: The reported γ/ν values are model-dependent and stated to be incompatible with the 2D Ising value 7/4, yet no scaling plots, checks on other exponents (e.g., β/ν), or analysis of possible deviations from Ising universality (higher-order correlations, non-equilibrium sampling, or prompt-induced multi-spin effects) are provided. This leaves open whether the assumed two-body Hamiltonian form is valid.

    Authors: We have added finite-size scaling plots (data collapse for χ L^{-γ/ν} and m L^{β/ν}) for all three models together with the extracted β/ν values. The new figures confirm the reported γ/ν while showing consistent deviations from Ising exponents. We interpret these as signatures of effective rather than universal behavior and include a short analysis of possible sources (non-equilibrium LLM sampling and prompt-induced correlations) in the revised Discussion. revision: yes

  3. Referee: The mapping of neighbor-conditioned LLM queries to Glauber dynamics of an Ising model with only nearest-neighbor J and uniform h is assumed without independent validation. If prompt structure introduces asymmetric responses, higher-order correlations, or breaks the Z2 symmetry in ways not captured by the global-flip protocol, the separation into bias versus cooperation and the conclusion of no genuine phase transitions would not follow.

    Authors: The global-flip protocol is our primary symmetry check: all agents are initialized in the fully flipped configuration and the dynamics are required to remain statistically equivalent under global inversion. This directly tests Z2 preservation in the LLM update rule. While higher-order prompt effects cannot be excluded a priori, they are absorbed into the extracted effective parameters; the observed h̃ ≫ J̃ dominance is robust across models. We have added a paragraph in the Discussion acknowledging possible multi-spin contributions and outlining correlation-function diagnostics for future validation. revision: partial
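The global-flip diagnostic invoked in the last response reduces to a one-line symmetry condition: a Z2-preserving update rule must satisfy P(up | neighbors) = P(down | flipped neighbors). Under the heat-bath form assumed throughout this review (an editorial stand-in for the actual LLM update), the condition holds exactly at h = 0 and fails as soon as a bias is present, which is precisely what the protocol is designed to expose.

```python
import math

def p_up(s_nn, J, h):
    """Heat-bath probability of answering up given the neighbor sum."""
    return 1.0 / (1.0 + math.exp(-2.0 * (J * s_nn + h)))

def z2_violation(J, h):
    """Max asymmetry |P(up | S) - P(down | -S)| over neighbor sums."""
    return max(abs(p_up(s, J, h) - (1.0 - p_up(-s, J, h)))
               for s in (-4, -2, 0, 2, 4))

print(z2_violation(J=0.1, h=0.0))   # ~0: symmetric rule passes
print(z2_violation(J=0.1, h=0.8))   # large: intrinsic bias breaks Z2
```

An empirical version of the same check, flipping every agent's state mid-run and comparing trajectory statistics, needs no assumed functional form, which is why the authors lean on it.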

Circularity Check

0 steps flagged

No significant circularity: effective parameters are fitted interpretations of measured data, not self-referential predictions

full rationale

The paper measures magnetization and susceptibility directly from LLM agent updates on the lattice, applies finite-size scaling to extract exponents (which deviate from Ising values), and then computes effective J̃(T) and h̃(T) via the assumed mapping to an Ising-like Hamiltonian. The claim h̃ ≫ J̃ is an interpretation of the relative magnitudes of these fitted quantities, not a derivation that reduces to its own inputs by construction. No self-citations, ansatz smuggling, or uniqueness theorems appear in the provided text. The chain is an empirical application of statistical-physics tools to observed dynamics, anchored to the external benchmark of the LLM query responses.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on treating LLM queries as effective Ising-like updates and on the validity of finite-size scaling for extracting separate coupling and field terms; these modeling choices are not independently verified in the provided abstract.

free parameters (2)
  • effective coupling J̃(T)
    Extracted from susceptibility and magnetization data to quantify neighbor cooperation; its value is determined by the measurement protocol rather than predicted a priori.
  • effective field h̃(T)
    Extracted to quantify intrinsic bias; again determined post-measurement.
axioms (2)
  • domain assumption LLM responses conditioned on neighbor states behave as probabilistic binary flips whose statistics can be captured by temperature-dependent effective J and h parameters.
    This is the core modeling step that allows the statistical-physics analysis.
  • domain assumption Finite-size scaling on even-L lattices yields meaningful effective exponents even when the underlying process is not a true equilibrium statistical mechanics system.
    Invoked when comparing γ/ν to the 2D Ising value.

pith-pipeline@v0.9.0 · 5627 in / 1542 out tokens · 63143 ms · 2026-05-12T03:52:12.188147+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 9 internal anchors

  1. [1]

    T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, Large language model based multi-agents: A survey of progress and challenges (2024), arXiv:2402.01680 [cs.CL]

  2. [2]

    S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, C. Zhang, J. Wang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, MetaGPT: Meta programming for a multi-agent collaborative framework (2024), arXiv:2308.00352 [cs.AI]

  3. [3]

    Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, Improving factuality and reasoning in language models through multiagent debate (2023), arXiv:2305.14325 [cs.CL]

  4. [4]

    A. Smit, P. Duckworth, N. Grinsztajn, T. D. Barrett, and A. Pretorius, Should we be going MAD? A look at multi-agent debate strategies for LLMs (2024), arXiv:2311.17371 [cs.CL]

  5. [5]

    X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, Self-consistency improves chain of thought reasoning in language models (2022), published at ICLR 2023, arXiv:2203.11171 [cs.CL]

  6. [6]

    C. Ruan, Y. Wang, Z. Shi, A. Panisson, and J. Li, Reaching agreement among reasoning LLM agents (2025), arXiv:2512.2018 [cs.AI]

  7. [7]

    Y.-S. Chuang, A. Goyal, N. Harlalka, S. Suresh, R. Hawkins, S. Yang, D. Shah, J. Hu, and T. T. Rogers, Simulating opinion dynamics with networks of LLM-based agents (2024), arXiv:2311.09618 [cs.CL]

  8. [8]

    G. Piatti, Z. Hu, and K. Cho, Cooperate or collapse: Emergence of sustainable cooperation in a society of LLM agents (2024), arXiv:2404.16698 [cs.AI]

  9. [9]

    N. Tomasev, M. Franklin, J. Z. Leibo, J. Jacobs, W. A. Cunningham, I. Gabriel, and S. Osindero, Virtual agent economies (2025), arXiv:2509.10147 [cs.AI]

  10. [10]

    S. Wang, M. Politi, S. Marro, and D. Crapis, Profit is the red team: Stress-testing agents in strategic economic interactions (2026), arXiv:2603.20925 [cs.AI]

  11. [11]

    X. Sun, D. Crapis, M. Stephenson, B. Monnot, T. Thiery, and J. Passerat-Palmbach, Cooperative AI via decen- tralized commitment devices (2023), arXiv:2311.07815 [cs.AI]

  12. [12]

    Y. Ding, A. Twabi, J. Yu, L. Zhang, T. Kondo, and H. Sato, in 2025 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA) (IEEE, 2025) pp. 1439–1445

  13. [13]

    K. Riehl, J. Schlapbach, A. Kouvelas, and M. A. Makridis, Karma mechanisms for decentralised, cooperative multi agent path finding (2026), arXiv:2604.07970 [eess.SY]

  14. [14]

    L. Santagata and C. De Nobili, More is more: Addition bias in large language models (2024), arXiv:2409.02569 [cs.CL]

  15. [15]

    H. Zheng, X. Liu, L. Wang, X. Yang, Y. Liu, and X. Wang, CalibraEval: Calibrating prediction distribution to mitigate selection bias in LLMs-as-judges (2024), arXiv:2410.15393 [cs.CL]

  16. [16]

    P. F. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei, in Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (2017), arXiv:1706.03741 [stat.ML]

  17. [17]

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, Training language models to follow instructions with human feedback, in Advances in Neural Information Processing Systems (NeurIPS), Vol. 35 (2022), arXiv:2203.02155 [cs.CL]

  18. [18]

    M. Sharma, M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. R. Bowman, N. Cheng, E. Durmus, Z. Hatfield-Dodds, S. R. Johnston, S. Kravec, T. Maxwell, S. McCandlish, K. Ndousse, O. Rausch, N. Schiefer, D. Yan, M. Zhang, and E. Perez, Towards understanding sycophancy in language models (2023), arXiv:2310.13548 [cs.CL]

  19. [19]

    L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, Judging LLM-as-a-judge with MT-Bench and Chatbot Arena, in Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track (2023), arXiv:2306.05685 [cs.CL]

  20. [20]

    H. Li, Q. Dong, J. Chen, H. Su, Y. Zhou, Q. Ai, Z. Ye, and Y. Liu, A survey on LLM-as-a-Judge (2024), arXiv:2411.15594 [cs.CL]

  21. [21]

    C. Castellano, S. Fortunato, and V. Loreto, Rev. Mod. Phys. 81, 591 (2009)

  22. [22]

    P. Mullick and P. Sen, Eur. Phys. J. B 98, 206 (2025), arXiv:2506.23837 [physics.soc-ph]

  23. [23]

    M. Starnini, F. Baumann, T. Galla, D. Garcia, G. Iñiguez, M. Karsai, J. Lorenz, and K. Sznajd-Weron, Opinion dynamics: Statistical physics and beyond (2025), arXiv:2507.11521 [physics.soc-ph]

  24. [24]

    F. Sastre and M. Henkel, Physica A 444, 897 (2016), arXiv:1509.04598 [cond-mat.stat-mech]

  25. [25]

    I. Dornic, H. Chaté, J. Chave, and H. Hinrichsen, Phys. Rev. Lett. 87, 045701 (2001), arXiv:cond-mat/0101202

  26. [26]

    F. Germani and G. Spitale, Source framing triggers systematic evaluation bias in large language models (2025), arXiv:2505.13488 [cs.CL]

  27. [27]

    A. Nadeem, U. Naseem, Y.-F. Ge, and I. Razzak, Bias beyond borders: Political ideology evaluation and steering in multilingual LLMs (2026), arXiv:2601.23001 [cs.CL]

  28. [28]

    R. A. Knipper, C. S. Knipper, K. Zhang, V. Sims, C. Bowers, and S. Karmaker, The bias is in the details: An assessment of cognitive bias in LLMs (2025), arXiv:2509.22856 [cs.CL]

  29. [29]

    J. Perez, G. Kovač, C. Léger, C. Colas, G. Molinaro, M. Derex, P.-Y. Oudeyer, and C. Moulin-Frier, When LLMs play the telephone game: Cultural attractors as conceptual tools to evaluate LLMs in multi-turn settings (2025), Proc. ICLR 2025, arXiv:2407.04503 [cs.CL]

  30. [30]

    N. F. Johnson, Increasing intelligence in AI agents can worsen collective outcomes (2026), arXiv:2603.12129 [cs.AI]

  31. [31]

    Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, and J. Wang, in Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, Vol. 80 (2018) pp. 5571–5580

  32. [32]

    F. Cozzi, M. Pangallo, A. Perotti, A. Panisson, and C. Monti, Learning individual behavior in agent-based models with graph diffusion networks (2025), Adv. Neural Inf. Process. Syst. 38, arXiv:2505.21426 [cs.LG]

  33. [33]

    Z.-Y. Song, Q.-H. Cao, M.-X. Luo, and H. X. Zhu, Detailed balance in large language model-driven agents (2025), arXiv:2512.10047 [cs.AI]

  34. [34]

    Ollama, Ollama: Run large language models locally, https://ollama.com (2024)