arxiv: 2604.09570 · v2 · submitted 2026-02-20 · 💻 cs.HC

Conversational Forecasting Across Large Human Groups Using A Swarm of Surrogate AI Agents

Pith reviewed 2026-05-15 20:51 UTC · model grok-4.3

classification 💻 cs.HC
keywords conversational forecasting · collective intelligence · AI agents · group deliberation · NBA predictions · surrogate agents · prediction accuracy

The pith

Groups of 25-30 fans using AI-swarm conversations forecast NBA games at 62 percent accuracy against the spread.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether real-time group conversations mediated by surrogate AI agents can improve collective forecasting on uncertain outcomes such as NBA games. Teams of 25 to 30 basketball fans discussed each of 50 games for five minutes on the Thinkscape platform and reached 62 percent accuracy against the spread, above the 50 percent Vegas baseline (p=0.059). Accuracy rose to 68 percent (p=0.017) for the 38 forecasts that remained after excluding the 12 with the lowest average conversation rates, and the approach would have produced an 18.4 percent return if bets had been placed. The filtered result also outperformed the Polymarket prediction market on the same games (p=0.062). The authors take these findings to suggest that AI-facilitated deliberation can amplify collective intelligence in forecasting tasks.

Core claim

When human teams of 25 to 30 participants discuss and debate NBA games for five minutes each using the Hyperchat AI architecture, they achieve 62 percent accuracy forecasting outcomes against the spread across 50 games, rising to 68 percent when low-conversation forecasts are excluded, outperforming both Vegas odds and Polymarket while generating positive ROI.
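
The reported p-values can be sanity-checked with an exact one-sided binomial test against a 50 percent baseline. The sketch below assumes win counts of 31 of 50 and 26 of 38. These counts are inferred here from the reported 62 and 68 percent figures and are not stated in this summary. The function name is illustrative, and the test the paper actually used is not specified.

```python
from math import comb


def binom_tail_p(wins: int, games: int, p0: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= wins) for X ~ Binomial(games, p0)."""
    return sum(
        comb(games, k) * p0**k * (1 - p0) ** (games - k)
        for k in range(wins, games + 1)
    )


# Assumed counts, inferred from the reported percentages (not given in the source).
p_all = binom_tail_p(31, 50)       # all 50 forecasts
p_filtered = binom_tail_p(26, 38)  # after dropping the 12 low-conversation forecasts

print(f"all games: p = {p_all:.4f}")
print(f"filtered:  p = {p_filtered:.4f}")
```

Under these assumptions the two tail probabilities come out at roughly 0.059 and 0.017, in line with the reported values. That agreement would be consistent with one-sided tests, although the paper would need to confirm this.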

What carries the argument

Hyperchat AI, a communication architecture that inserts surrogate AI agents to sustain real-time text, voice, or video deliberation among large human teams without conversation collapse.

Load-bearing premise

Accuracy gains arise from the AI-mediated conversation process rather than from fan knowledge, game selection, or the post-hoc exclusion of low-discussion cases.

What would settle it

Run the identical 50 games with matched groups of 25-30 fans but without surrogate AI agents, then check whether accuracy falls to 50 percent or below.
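
For a rough sense of how many games such a test would need, the sketch below uses the standard normal-approximation sample-size formula for a one-sided, one-sample proportion test. It is a simplification: it compares a single arm against a fixed 50 percent baseline rather than two arms against each other, and the choices of alpha = 0.05 and 80 percent power are assumptions made here, not values from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist


def games_needed(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Games needed to detect true accuracy p1 against baseline p0.

    One-sided, one-sample proportion test using the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_power * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Hypothetical planning inputs: true accuracy 62 percent versus a 50 percent baseline.
print(games_needed(0.50, 0.62))
```

With these inputs the estimate is a little over 100 games, about twice the 50 games in the study. A genuine two-arm comparison would generally need more.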

Figures

Figures reproduced from arXiv: 2604.09570 by Ganesh Mani, Gregg Willcox, Hans Schumann, Louis Rosenberg.

Figure 1: Hyperchat AI enables thoughtful deliberations at scale
Figure 2: Screenshot of forecasting question presented to participants

Original abstract

Hyperchat AI is a communication and collaboration architecture that employs intervening AI agents to enable real-time conversational deliberations among networked human teams of unlimited size. Prior work has shown that teams as large as 250 people can hold productive real-time conversations by text, voice, or video using Hyperchat AI to discuss complex problems, brainstorm solutions, surface risks, assess alternatives, prioritize options, and converge on optimized results. Building on this prior work, this new study tasked groups of 25 to 30 basketball fans with conversationally forecasting NBA games (against the spread) over a 12-week period. Results show that when discussing and debating NBA games (for five minutes each) using a Hyperchat AI enabled platform called Thinkscape, human teams were 62% accurate across a set of 50 forecasted NBA games. This is an impressive result versus the Vegas odds of 50% (p=0.059). Furthermore, had the participants wagered on the games, they would have produced an 18.4% ROI over the 12-week period. In addition, this study found that the group's conversation rate during each forecast was positively correlated with their prediction accuracy. In fact, when excluding the 12 forecasts in the bottom 25th percentile by average conversation rate, the remaining 38 forecasts recorded a 68% accuracy, significantly better than the 50% Vegas odds (p=0.017). This result also outperformed the well-known prediction market Polymarket (p=0.062) across the same set of NBA games. These outcomes suggest that real-time conversational deliberations, when facilitated by Surrogate AI agents, can significantly amplify groupwise collective intelligence during human forecasting tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents results from an experiment in which groups of 25-30 NBA fans used the Thinkscape platform (powered by Hyperchat AI and surrogate AI agents) to hold five-minute real-time deliberations forecasting the outcomes of 50 NBA games against the spread. The authors report 62% group accuracy (p=0.059 vs. 50% Vegas odds baseline), an 18.4% ROI if wagering, and that excluding the 12 lowest-conversation-rate forecasts yields 68% accuracy (p=0.017) that also outperforms Polymarket (p=0.062). They conclude that AI-facilitated conversational deliberation amplifies collective intelligence in forecasting tasks.

Significance. If the accuracy gains prove robust after addressing selection bias and marginal significance, the work would provide empirical evidence that AI-mediated group conversations can improve forecasting performance over market baselines in a scalable way, with implications for collective intelligence research and applied prediction systems.

major comments (3)
  1. [Abstract / Results] Abstract and Results: The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter preferentially removes harder-to-predict games and directly undermines the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate independence from game-level covariates, or report the unfiltered result as primary.
  2. [Abstract] Abstract: The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.
  3. [Methods] Methods: No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.
minor comments (2)
  1. [Abstract] The abstract states groups of 25-30 fans but provides no exact participant counts, total unique individuals, or per-game participation rates.
  2. [Results] Clarify whether the 18.4% ROI calculation assumes equal wagering on every game and accounts for vig or transaction costs.
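
The question in minor comment 2 can be explored with a short flat-betting calculation. The sketch assumes one unit staked on every game, standard -110 spread pricing, no pushes, and a 31-19 record inferred from the reported 62 percent accuracy. None of these wagering details are stated in this summary, and the function name is illustrative.

```python
def flat_bet_roi(wins: int, losses: int, american_odds: int = -110) -> float:
    """ROI for one-unit flat bets at negative American odds (for example -110)."""
    if american_odds >= 0:
        raise ValueError("this sketch handles negative American odds only")
    payout_per_win = 100 / abs(american_odds)  # profit on a one-unit winning bet
    profit = wins * payout_per_win - losses    # each loss costs the full unit
    return profit / (wins + losses)            # total staked is one unit per game


# Assumed record: 31 wins and 19 losses over 50 games.
roi = flat_bet_roi(31, 19)
print(f"ROI = {roi:.1%}")
```

With these inputs the result is about 18.4 percent. That would be consistent with the reported ROI already including standard vig under equal wagering, though only the authors can confirm the assumptions they used.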

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for improvement in statistical reporting and methodological transparency. We address each point below and have revised the manuscript to prioritize the unfiltered results, add missing statistical details, and acknowledge study limitations.

point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results: The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter preferentially removes harder-to-predict games and directly undermines the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate independence from game-level covariates, or report the unfiltered result as primary.

    Authors: We agree that the 68% result is post-hoc and should not be the primary claim. In the revised manuscript, the 62% accuracy on all 50 games will be presented as the main result. The conversation-rate filtered analysis will be repositioned as exploratory, accompanied by an explicit discussion of potential selection bias. We will test for correlations between conversation rate and game-level covariates (e.g., spread width and historical team performance) and report the findings or note their absence as a limitation. revision: partial

  2. Referee: [Abstract] Abstract: The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.

    Authors: We will update the abstract, results, and any figures to report 95% confidence intervals for the accuracy rates, effect sizes (e.g., Cohen's h), and per-game participant counts. Error bars will be added to visualizations. The p=0.059 value will be framed as marginal, with emphasis on the accompanying 18.4% ROI as a practical indicator of performance. revision: yes

  3. Referee: [Methods] Methods: No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.

    Authors: The Methods section will be expanded to clarify that games were selected as a convenience sample of NBA contests during the 12-week period for which participant groups were available. This was an exploratory study without a pre-registered analysis plan. No non-AI control arm was included, as the goal was to assess the AI-facilitated platform's performance. These design choices will be stated explicitly as limitations, with recommendations for future randomized studies to isolate the agents' contribution. revision: partial
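
The confidence interval and effect size promised in response 2 could be computed along the following lines. This is a minimal sketch that assumes a Wilson score interval and a count of 31 correct out of 50, inferred from the reported 62 percent. The authors do not say which interval method they will use, and the function names are illustrative.

```python
from math import asin, sqrt
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = wins / games
    denom = 1 + z**2 / games
    centre = (p_hat + z**2 / (2 * games)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Assumed count: 31 correct forecasts out of 50.
low, high = wilson_interval(31, 50)
print(f"95% CI: {low:.3f} to {high:.3f}")
print(f"Cohen's h vs 50%: {cohens_h(0.62, 0.50):.3f}")
```

Under these assumptions the interval runs from roughly 0.48 to 0.74 and therefore includes 0.50, which agrees with the marginal p=0.059. Cohen's h is about 0.24, conventionally read as a small effect.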

standing simulated objections not resolved
  • Absence of a pre-registered analysis plan and a non-AI control arm, which cannot be retroactively implemented in the existing dataset.

Circularity Check

0 steps flagged

No significant circularity in empirical forecasting results

The paper reports new experimental outcomes from 50 NBA game forecasts by human teams using the Thinkscape platform. Accuracy figures (62% overall, 68% after excluding bottom-quartile conversation-rate cases) are computed directly from observed human predictions and compared to external benchmarks (Vegas odds at 50%, Polymarket). No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation of these results. Prior work is referenced only to establish the platform's general conversational capability, not to derive or justify the specific accuracy numbers. The chain is therefore self-contained against independent external data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that AI surrogate agents neutrally facilitate discussion without introducing their own bias and that conversation rate is a valid, non-confounded proxy for group insight. No free parameters are fitted in the reported statistics.

axioms (2)
  • domain assumption: AI agents can intervene in group text/voice/video conversations without systematically altering human judgment
    Invoked throughout the description of Hyperchat AI and Thinkscape platform operation.
  • ad hoc to paper: Conversation rate during five-minute deliberations is independent of game difficulty or other external factors
    Required to interpret the post-hoc exclusion of low-conversation forecasts as evidence of amplified intelligence rather than data selection.
invented entities (1)
  • Surrogate AI agents (no independent evidence)
    purpose: Intervene in real-time group conversations to enable scalable deliberation
    Core component of the Hyperchat AI architecture introduced to facilitate large-team forecasting.
