arxiv: 2606.07487 · v1 · pith:QXPDOOE6new · submitted 2026-06-05 · 💻 cs.MA · cs.GT · cs.SI

Modelling Opinion Dynamics at Scale with Deep MARL

Pith reviewed 2026-06-27 20:06 UTC · model grok-4.3

classification 💻 cs.MA · cs.GT · cs.SI
keywords opinion dynamics · multi-agent reinforcement learning · consensus game · social networks · conformity · misinformation · Bluesky

The pith

High conformity in large social media networks reduces collective accuracy and promotes agents that lie to fit in.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a scalable multi-agent reinforcement learning setup for modeling how opinions form and spread, allowing populations of up to 1000 agents to learn interaction rules through reward optimization rather than fixed equations. It extends an existing technique called other-play to keep the learned behaviors realistic in social settings and tests the model by recovering which agents matter most from the structure of a real Bluesky network. The key result is that populations with strong conformity pressures match observed human patterns, yet in large networks this conformity lowers the group's overall accuracy and rewards dishonesty, while the same pressure can help agreement in small, changing groups. This points to a possible mismatch between human social instincts shaped by small-scale life and the scale of modern online platforms.

Core claim

A GPU-accelerated consensus and truth-finding game trained with deep MARL and extended other-play produces agent behaviors that, when an attention layer is trained on Bluesky graph topology alone, recover realistic importance rankings; the same model shows that high conformity in large populations lowers collective accuracy and favors dishonest agents, whereas in small dynamic populations conformity can raise agreement.

What carries the argument

The GPU-accelerated consensus and truth-finding game with extended other-play, whose learned attention layer recovers agent importance from graph topology.
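
One way to picture how an attention layer could yield importance scores from a graph is sketched below. This is a minimal illustration, not the paper's architecture: the `scores` table stands in for learned compatibilities, and defining an agent's importance as the total attention it receives from its observers is an assumption made here for clarity.

```python
import math


def neighbour_attention(scores, in_neighbours):
    """Softmax of raw scores over each agent's observed neighbours only.

    scores[i][j] is a hypothetical learned compatibility of agent i with j;
    in_neighbours[i] lists the agents that agent i can observe.
    """
    weights = {}
    for i, nbrs in in_neighbours.items():
        if not nbrs:
            weights[i] = {}
            continue
        top = max(scores[i][j] for j in nbrs)  # subtract max for stability
        exps = {j: math.exp(scores[i][j] - top) for j in nbrs}
        total = sum(exps.values())
        weights[i] = {j: e / total for j, e in exps.items()}
    return weights


def perceived_importance(weights, agents):
    """Total attention each agent receives (assumed definition)."""
    importance = {a: 0.0 for a in agents}
    for row in weights.values():
        for j, w in row.items():
            importance[j] += w
    return importance


# Toy 3-agent graph with made-up scores: agents 0 and 1 both favour agent 2.
agents = [0, 1, 2]
in_neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
scores = {0: {1: 0.0, 2: 2.0}, 1: {0: 0.0, 2: 2.0}, 2: {0: 1.0, 1: 1.0}}
weights = neighbour_attention(scores, in_neighbours)
importance = perceived_importance(weights, agents)
```

Because each agent's weights sum to one, the importance scores sum to the number of agents, so they can be compared across nodes of different in-degree.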

If this is right

  • High conformity reduces collective accuracy in large networks.
  • High conformity promotes dishonest agents that lie to fit in large networks.
  • Small dynamic networks are less harmed by high conformity.
  • Conformity can improve collective agreement in small dynamic networks.
  • A mismatch between evolved conformity and large online environments may contribute to misinformation.
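
The first two bullets can be illustrated with a toy reward. The paper's exact reward is not reproduced on this page, so the sketch below assumes a weight `alpha` on accuracy and `1 - alpha` on agreement with neighbours; both the functional form and the direction of the weighting are assumptions and may not match the paper's convention.

```python
def reward(output, truth, neighbour_outputs, alpha):
    """Toy reward: alpha weights accuracy, (1 - alpha) weights agreement.

    The form and the direction of the weighting are assumptions of this sketch.
    """
    accuracy = 1.0 if output == truth else 0.0
    if neighbour_outputs:
        matches = sum(1 for o in neighbour_outputs if o == output)
        agreement = matches / len(neighbour_outputs)
    else:
        agreement = 0.0
    return alpha * accuracy + (1.0 - alpha) * agreement


def best_output(truth, neighbour_outputs, alpha, options=(0, 1)):
    """Return the output that maximises the toy reward for one agent."""
    return max(options, key=lambda o: reward(o, truth, neighbour_outputs, alpha))


# An agent that believes the truth is 1, with three of four neighbours saying 0.
neighbours = [0, 0, 0, 1]
honest_choice = best_output(1, neighbours, alpha=0.8)      # accuracy dominates
conforming_choice = best_output(1, neighbours, alpha=0.2)  # agreement dominates
```

Under these assumptions the agent reports its belief when accuracy dominates and switches to the majority answer when agreement dominates, which is the "lie to fit in" behaviour in miniature.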

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platform designs that reduce conformity pressure could raise the accuracy of shared information.
  • Simulations of this kind could be used to test network interventions before deployment.
  • The approach opens the possibility of studying how network size and change rate interact with learned social rules.

Load-bearing premise

The behaviors that emerge from the MARL consensus game with extended other-play correspond to real human opinion dynamics.

What would settle it

A direct measurement on the Bluesky network or similar data showing that higher conformity does not correlate with lower collective accuracy or increased dishonesty would falsify the main claim.

Figures

Figures reproduced from arXiv: 2606.07487 by Brandon Kaplowitz, Jakob Foerster, Lukas Seier, Richard Bailey, Sebastian Towers.

Figure 1: Opinion update loop. Agents receive self and neighbouring guesses, which are subsequently passed through an other-play symmetry operator and learned attention layer. The output is concatenated with agents’ private signals and passed into the main body of the architecture, after which the symmetry operation is reversed, yielding the updated guesses of the agents.
Figure 2: Episode dynamics. Example episodes for reward weighting α = 0.2, selected to demonstrate interesting outcomes. Top row: Bluesky network with many dishonest agents despite the population guessing correctly. Middle row: Congress network with a polarised state and a majority of lying agents in the left cluster, despite signalling incorrectly. Bottom row: Hadza network (r = 5) with no lying agents despite th…
Figure 3: Left: Wall clock times for 10^6 training steps with 20 parallel environments. The JAX implementations trained on a single GPU scale orders of magnitude better than the baseline model trained on a CPU. The improved parameter efficiency of our model leads to additional performance gains. Right: Fraction of lying agents that output the opposite non-null guess to the output of their belief head. Social media pr…
Figure 4: Validation of Bluesky network. Left: Sum of Wasserstein distances for each node in-degree. The dotted red line indicates the minimum of the MARL curve. Right grid: Hexagonal histograms showing PI distributions as a function of node in-degree, comparing MARL prediction at α = 0, 0.2, and 1 with the real data. α = 0.2 accurately captures the shape of the real data.
Figure 5: Fraction of agents whose output matches the ground-truth at various time steps for all …
Figure 6: 117th U.S. Congress X/Twitter network classified by party association. Username to party …
Figure 7: Perceived importance for an SBM and BA graph with 100 agents. Truth-seeking agents …
Figure 8: Ablation examples for the Congress network.
Figure 9: Effect of ablating GRU on the fraction of agents whose output matches the ground-truth …
Figure 10: Effect of ablating agent IDs on the fraction of agents whose output matches the ground …
Figure 11: Relative reward difference for three randomly selected agents retrained in separate tests …
Figure 12: Histograms showing the distribution of initial outputs for agents receiving a private signal …
Figure 13: Area charts showing fractional distribution of initial outputs for all values of …
Figure 14: Left: TV distance for increasing number of MC samples against a proxy average with 10^4 samples. Right: TV distance for different α between MC sampling vs same action method for extracting attention weights from the model.
Figure 15: Three validation metrics, from left to right: edge-level MSE, node-level perceived im…
Figure 16: Hexagonal histograms for the Bluesky network showing PI distributions as a function of …
Figure 17: Validation of Congress network. Left: Sum of Wasserstein distances for each node in-degree. Right grid: Hexagonal histograms showing PI distributions as a function of node in-degree, comparing MARL prediction at α = 0, 0.2, and 1 with the real data. The MARL simulations struggle to capture the full shape of the real data.
Figure 18: Hexagonal histograms for the Congress X/Twitter network showing PI distributions as a …
Figure 19: Fraction of agents among the non-null outputting subpopulation whose output matches …
Figure 20: Average private signal disagreement scores for lying agents receiving private signals for …
Figure 21: Reward training curves for the Bluesky network. Note the decrease in reward with time …
Figure 22: Reward training curves for the Congress network. Note the decrease in reward with time …
Figure 23: Reward training curves for the Hadza network with …
Figure 24: Reward training curves for the Hadza network with …
Figure 25: Belief loss training curves for the Bluesky network. Note the increase in loss with time is …
Figure 26: Belief loss training curves for the Congress network. Note the increase in loss with …
Figure 27: Belief loss training curves for the Hadza network with …
Figure 28: Belief loss training curves for the Hadza network with …

Original abstract

Modelling opinion dynamics typically relies on hand-crafted local interaction rules to study emergent macroscopic phenomena such as consensus and polarisation. In contrast, multi-agent reinforcement learning (MARL) enables agents to learn such behaviours directly by optimising simple rewards. To explore the potential of MARL for opinion dynamics, we introduce a GPU-accelerated consensus and truth-finding game that scales to populations of up to 1000 agents, comparable to many real-world social sub-networks. To prevent unrealistic conventions, we extend other-play to general-sum social interactions. We next validate our model on a subset of the Bluesky network by recovering agent importance structures from graph topology alone via a learned attention layer, finding that highly conforming populations most closely match human data. In large social media networks such high levels of conformity significantly reduce collective accuracy and promote dishonest agents that lie to fit in. By contrast, small, dynamic hunter-gatherer networks are less affected; here, conformity can even improve collective agreement. This suggests a mismatch between evolved human conformity heuristics and modern social media environments as a potential contributor to misinformation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a GPU-accelerated MARL consensus and truth-finding game that scales to 1000 agents. Agents learn opinion-update behaviors by optimizing rewards, with an extension of other-play to general-sum interactions to avoid unrealistic conventions. Validation on a Bluesky subgraph recovers agent importance structures via a learned attention layer, with highly conforming populations matching human data on this metric. Results indicate that high conformity in large networks reduces collective accuracy and promotes dishonest agents, while small dynamic networks are less affected and may benefit from conformity; this is interpreted as an evolutionary mismatch contributing to misinformation.

Significance. If the model produces behaviors that correspond to human opinion dynamics, the work offers a scalable alternative to hand-crafted rules for studying emergent phenomena like consensus and polarization, with potential implications for understanding misinformation in social media. The GPU scaling and attention-based recovery of network structures are technical strengths. However, the significance is constrained because the reported validation is limited to topological recovery rather than behavioral or outcome-level correspondence with human data.

major comments (2)
  1. [Validation on Bluesky network] Validation section (Bluesky experiment): recovery of agent importance structures from graph topology alone is shown to match human data most closely under high conformity. This metric does not test whether the learned conformity levels, truth-telling vs. lying equilibria, or collective accuracy outcomes reproduce patterns from human opinion-dynamics experiments. The central claims about reduced accuracy and promoted dishonesty in large networks therefore rest on an unanchored simulation-to-human mapping.
  2. [Results on network size and conformity] Results on network-size effects: the reported contrast between large social-media networks and small hunter-gatherer networks (conformity harming vs. helping collective accuracy) is load-bearing for the evolutionary-mismatch interpretation, yet no direct comparison to empirical human data on these outcomes is provided.
minor comments (1)
  1. [Abstract and validation] The abstract and validation description would benefit from explicit statement of the quantitative match metric (e.g., correlation or rank agreement) between the attention-derived importance and human data.
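
A rank-agreement metric of the kind the minor comment asks for could be computed as in the sketch below. It is a generic Spearman correlation with average ranks for ties, not a metric taken from the paper, and the two score lists are made-up illustrative values rather than results.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical scores for four agents (illustrative values only).
model_importance = [0.62, 0.62, 1.76, 0.10]
reference_importance = [0.5, 0.7, 2.0, 0.2]
rho = spearman(model_importance, reference_importance)
```

Reporting a single number like `rho` alongside the distributional comparison would make the strength of the match easier to judge.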

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the technical contributions of the GPU scaling and attention mechanism. We address the two major comments point by point below, clarifying the scope of our validation and the basis for our interpretations while agreeing to strengthen the manuscript's caveats.

Point-by-point responses
  1. Referee: Validation section (Bluesky experiment): recovery of agent importance structures from graph topology alone is shown to match human data most closely under high conformity. This metric does not test whether the learned conformity levels, truth-telling vs. lying equilibria, or collective accuracy outcomes reproduce patterns from human opinion-dynamics experiments. The central claims about reduced accuracy and promoted dishonesty in large networks therefore rest on an unanchored simulation-to-human mapping.

    Authors: We agree that the Bluesky validation is limited to recovering agent importance rankings from topology via the learned attention weights, and that this does not directly test behavioral equilibria or collective accuracy against human experimental data. The match to human importance ratings under high conformity provides evidence that the policies produce plausible interaction structures, but we accept that this leaves the accuracy and honesty results as model-derived predictions rather than validated correspondences. We will revise the validation and discussion sections to explicitly distinguish the structural anchoring from the outcome-level claims and to qualify the latter as exploratory.

    Revision: partial

  2. Referee: Results on network-size effects: the reported contrast between large social-media networks and small hunter-gatherer networks (conformity harming vs. helping collective accuracy) is load-bearing for the evolutionary-mismatch interpretation, yet no direct comparison to empirical human data on these outcomes is provided.

    Authors: The size-dependent effects are obtained by running the trained policies on static large graphs versus small dynamic graphs that approximate the cited anthropological examples. We do not possess or cite direct empirical measurements of collective accuracy under controlled conformity levels across these regimes, so the evolutionary-mismatch reading is presented as an interpretation suggested by the model's behavior rather than an empirically confirmed claim. We will add explicit language in the results and conclusion sections stating that the interpretation is model-generated and requires future empirical testing, and we will reference existing studies on conformity in small-scale societies to better contextualize the comparison.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: MARL optimization and external validation are independent of target claims

The paper trains agents via MARL on a consensus/truth-finding reward in a GPU-scaled game, extends other-play for general-sum settings, then validates by recovering attention-based importance from Bluesky graph topology and comparing conformity levels to human data. No equations, fitted parameters, or self-citations are shown to reduce the central claims (conformity effects in large vs. small networks) to inputs by construction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5727 in / 1131 out tokens · 20166 ms · 2026-06-27T20:06:35.286713+00:00 · methodology
