Conversational Forecasting Across Large Human Groups Using A Swarm of Surrogate AI Agents

Ganesh Mani; Gregg Willcox; Hans Schumann; Louis Rosenberg

arxiv: 2604.09570 · v2 · submitted 2026-02-20 · 💻 cs.HC

Pith reviewed 2026-05-15 20:51 UTC · model grok-4.3

classification 💻 cs.HC

keywords: conversational forecasting, collective intelligence, AI agents, group deliberation, NBA predictions, surrogate agents, prediction accuracy

The pith

Groups of 25-30 fans using AI-swarm conversations forecast NBA games at 62 percent accuracy against the spread.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether real-time group conversations mediated by surrogate AI agents can improve collective forecasting on uncertain outcomes such as NBA games. Over a 12-week period, teams of 25 to 30 basketball fans discussed each of 50 games for five minutes on the Thinkscape platform and reached 62 percent accuracy against the spread, compared with the 50 percent implied by the Vegas line (p=0.059). Accuracy rose to 68 percent (p=0.017) on the 38 forecasts that remained after the 12 with the lowest average conversation rates were excluded, and the authors report that wagering on the picks would have produced an 18.4 percent ROI. The filtered result also outperformed the Polymarket prediction market on the same games (p=0.062). The authors take these findings to suggest that AI-facilitated deliberation can amplify collective intelligence in forecasting tasks.

Core claim

When human teams of 25 to 30 participants discuss and debate NBA games for five minutes each using the Hyperchat AI architecture, they achieve 62 percent accuracy forecasting outcomes against the spread across 50 games, rising to 68 percent when low-conversation forecasts are excluded, outperforming both Vegas odds and Polymarket while generating positive ROI.

What carries the argument

Hyperchat AI, a communication architecture that inserts surrogate AI agents to sustain real-time text, voice, or video deliberation among large human teams without conversation collapse.

Load-bearing premise

Accuracy gains arise from the AI-mediated conversation process rather than from fan knowledge, game selection, or the post-hoc exclusion of low-discussion cases.

What would settle it

Run the identical 50 games with matched groups of 25-30 fans but without surrogate AI agents, then check whether accuracy falls to 50 percent or below.

Figures

Figures reproduced from arXiv: 2604.09570 by Ganesh Mani, Gregg Willcox, Hans Schumann, Louis Rosenberg.

**Figure 2.** Screenshot of forecasting question presented to participants.

Original abstract

Hyperchat AI is a communication and collaboration architecture that employs intervening AI agents to enable real-time conversational deliberations among networked human teams of unlimited size. Prior work has shown that teams as large as 250 people can hold productive real-time conversations by text, voice, or video using Hyperchat AI to discuss complex problems, brainstorm solutions, surface risks, assess alternatives, prioritize options, and converge on optimized results. Building on this prior work, this new study tasked groups of 25 to 30 basketball fans with conversationally forecasting NBA games (against the spread) over a 12-week period. Results show that when discussing and debating NBA games (for five minutes each) using a Hyperchat AI enabled platform called Thinkscape, human teams were 62% accurate across a set of 50 forecasted NBA games. This is an impressive result versus the Vegas odds of 50% (p=0.059). Furthermore, had the participants wagered on the games, they would have produced an 18.4% ROI over the 12-week period. In addition, this study found that the group's conversation rate during each forecast was positively correlated with their prediction accuracy. In fact, when excluding the 12 forecasts in the bottom 25th percentile by average conversation rate, the remaining 38 forecasts recorded a 68% accuracy, significantly better than the 50% Vegas odds (p=0.017). This result also outperformed the well-known prediction market Polymarket (p=0.062) across the same set of NBA games. These outcomes suggest that real-time conversational deliberations, when facilitated by Surrogate AI agents, can significantly amplify groupwise collective intelligence during human forecasting tasks.
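The two significance levels quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes, since the abstract does not state them, that 62% of 50 games means 31 correct picks, that 68% of 38 games means 26 correct picks, and that the test is a one-sided exact binomial test against a 50% null. The function name `binomial_tail` is illustrative, not taken from the paper.

```python
from math import comb


def binomial_tail(wins: int, games: int, p_null: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= wins) under the null rate."""
    return sum(
        comb(games, k) * p_null**k * (1 - p_null) ** (games - k)
        for k in range(wins, games + 1)
    )


# Assumed counts (not stated in the abstract): 31/50 = 62%, 26/38 = 68.4%.
p_all = binomial_tail(31, 50)
p_filtered = binomial_tail(26, 38)

print(f"all 50 games:      p = {p_all:.4f}")
print(f"filtered 38 games: p = {p_filtered:.4f}")
```

Under these assumptions the tail probabilities come out near 0.059 and 0.017, in line with the reported values, which suggests the reported tests were one-sided. Because the null is symmetric at 50%, a two-sided test would double both figures.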

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The headline accuracy numbers rest on a post-hoc drop of the lowest-conversation forecasts, which undercuts the claim that the AI platform itself drove the gains.

The paper reports that groups of 25-30 NBA fans using the Thinkscape platform (built on the authors' prior Hyperchat AI) reached 62% accuracy forecasting 50 games against the spread, with an 18.4% ROI if they had bet, and a positive link between conversation rate and correctness. After dropping the bottom 25% of forecasts by average conversation rate, accuracy rose to 68% on the remaining 38 games. Those are the concrete numbers on offer: p=0.059 against the 50% Vegas baseline overall, p=0.017 on the filtered set, and a marginal edge over Polymarket (p=0.062) on the same games.

The work is essentially an extension of the authors' earlier platform papers, now applied to sports betting with fresh data from real users over 12 weeks. That empirical step is useful: it shows the system can run live group deliberations on a concrete task and produce measurable outputs. The correlation between conversation rate and accuracy is the part that feels worth checking further, because it points to a possible mechanism rather than just a black-box result.

The soft spot is the post-hoc filter. Removing the lowest-conversation cases after seeing the data risks selecting easier games if low discussion tends to occur on harder-to-predict matchups. The abstract gives no sign this cutoff was pre-registered, and without that or a control arm comparing facilitated versus unfacilitated high-engagement subsets, the jump from 62% to 68% is hard to attribute cleanly to the AI agents. The p-values sit right on the edge: the full-set result (p=0.059) narrowly misses the conventional 0.05 threshold, and only the filtered result clears it. No error bars or full dataset details appear in the summary.

This is the kind of work that belongs in a reading group focused on collective-intelligence tools or AI-mediated teams, mainly for the platform description and the raw accuracy figures. A serious referee should see it to examine the analysis plan, the game-selection process, and whether the conversation-rate filter holds up under robustness checks. I would send it out for review rather than desk-reject, but expect the stats and controls to need tightening before publication.
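One way to size the selection concern is to ask what excluding 12 games at random would do. The sketch below assumes 31 correct picks among the 50 games and 26 among the 38 retained (so 5 correct among the 12 excluded); the summary states only percentages, so these counts are inferred. It computes the hypergeometric probability that a random 12-game exclusion removes 5 or fewer correct picks. It cannot say whether conversation rate tracks game difficulty; it only indicates how unusual the filtered record would be under a purely random drop.

```python
from math import comb


def prob_drop_at_most(max_wins_dropped: int, total: int, wins: int, dropped: int) -> float:
    """P(a random exclusion of `dropped` games removes <= max_wins_dropped wins)."""
    losses = total - wins
    favorable = sum(
        comb(wins, k) * comb(losses, dropped - k)
        for k in range(0, max_wins_dropped + 1)
        if 0 <= dropped - k <= losses
    )
    return favorable / comb(total, dropped)


# Assumed counts: 31 of 50 correct overall, 26 of 38 correct after the filter.
TOTAL, WINS, DROPPED = 50, 31, 12
wins_dropped = WINS - 26  # correct picks among the excluded games

p_random = prob_drop_at_most(wins_dropped, TOTAL, WINS, DROPPED)
print(f"P(random 12-game drop leaves >= 26/38 correct) = {p_random:.3f}")
```

A small probability here would mean the conversation-rate filter removed misses more selectively than chance, which is exactly the pattern that pre-registration or covariate checks would need to explain.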

Referee Report

3 major / 2 minor

Summary. The manuscript presents results from an experiment in which groups of 25-30 NBA fans used the Thinkscape platform (powered by Hyperchat AI and surrogate AI agents) to hold five-minute real-time deliberations forecasting the outcomes of 50 NBA games against the spread. The authors report 62% group accuracy (p=0.059 vs. 50% Vegas odds baseline), an 18.4% ROI if wagering, and that excluding the 12 lowest-conversation-rate forecasts yields 68% accuracy (p=0.017) that also outperforms Polymarket (p=0.062). They conclude that AI-facilitated conversational deliberation amplifies collective intelligence in forecasting tasks.

Significance. If the accuracy gains prove robust after addressing selection bias and marginal significance, the work would provide empirical evidence that AI-mediated group conversations can improve forecasting performance over market baselines in a scalable way, with implications for collective intelligence research and applied prediction systems.

major comments (3)

[Abstract / Results] The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter preferentially removes harder-to-predict games and directly undermines the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate independence from game-level covariates, or report the unfiltered result as primary.
[Abstract] The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.
[Methods] No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.

minor comments (2)

[Abstract] The abstract states groups of 25-30 fans but provides no exact participant counts, total unique individuals, or per-game participation rates.
[Results] Clarify whether the 18.4% ROI calculation assumes equal wagering on every game and accounts for vig or transaction costs.
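The vig question can be probed with simple arithmetic. The sketch below assumes flat one-unit stakes on all 50 games, standard -110 pricing on every spread bet, no pushes, and a 31-19 record. None of these is confirmed by the abstract, and `flat_stake_roi` and `break_even_rate` are illustrative helpers.

```python
def flat_stake_roi(wins: int, losses: int, american_odds: int = -110) -> float:
    """Return on total amount staked for one-unit flat bets at negative American odds."""
    if american_odds >= 0:
        raise ValueError("this sketch only handles negative American odds")
    payout_per_win = 100 / abs(american_odds)  # profit on a one-unit winning bet
    profit = wins * payout_per_win - losses
    return profit / (wins + losses)


def break_even_rate(american_odds: int = -110) -> float:
    """Win rate at which flat bets at these negative odds neither gain nor lose."""
    risk = abs(american_odds)
    return risk / (risk + 100)


roi = flat_stake_roi(31, 19)  # assumed record: 62% of 50 games
print(f"ROI at -110: {roi:.1%}")
print(f"break-even win rate at -110: {break_even_rate():.1%}")
```

Under these assumptions the ROI works out to about 18.4%, the figure the abstract reports, which hints that the calculation may already price in a -110 vig with equal stakes. It also means the relevant break-even benchmark for a bettor is roughly 52.4% rather than 50%.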

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for improvement in statistical reporting and methodological transparency. We address each point below and have revised the manuscript to prioritize the unfiltered results, add missing statistical details, and acknowledge study limitations.

Referee: [Abstract / Results] The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter preferentially removes harder-to-predict games and directly undermines the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate independence from game-level covariates, or report the unfiltered result as primary.

Authors: We agree that the 68% result is post-hoc and should not be the primary claim. In the revised manuscript, the 62% accuracy on all 50 games will be presented as the main result. The conversation-rate filtered analysis will be repositioned as exploratory, accompanied by an explicit discussion of potential selection bias. We will test for correlations between conversation rate and game-level covariates (e.g., spread width and historical team performance) and report the findings or note their absence as a limitation.
revision: partial

Referee: [Abstract] The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.

Authors: We will update the abstract, results, and any figures to report 95% confidence intervals for the accuracy rates, effect sizes (e.g., Cohen's h), and per-game participant counts. Error bars will be added to visualizations. The p=0.059 value will be framed as marginal, with emphasis on the accompanying 18.4% ROI as a practical indicator of performance.
revision: yes

Referee: [Methods] No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.

Authors: The Methods section will be expanded to clarify that games were selected as a convenience sample of NBA contests during the 12-week period for which participant groups were available. This was an exploratory study without a pre-registered analysis plan. No non-AI control arm was included, as the goal was to assess the AI-facilitated platform's performance. These design choices will be stated explicitly as limitations, with recommendations for future randomized studies to isolate the agents' contribution.
revision: partial
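The interval and effect size promised in the second response can be sketched directly. The example below assumes 31 correct picks out of 50 (the count implied by 62%, not stated in the summary) and uses a Wilson score interval, which is one common choice; the rebuttal does not say which interval method the authors would use.

```python
from math import asin, sqrt


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Assumed counts: 31 of 50 correct, compared with the 50% baseline.
low, high = wilson_interval(31, 50)
h = cohens_h(31 / 50, 0.5)
print(f"95% Wilson interval: ({low:.3f}, {high:.3f})")
print(f"Cohen's h vs 0.5: {h:.3f}")
```

With these assumed counts the interval spans 50%, consistent with the marginal p=0.059, and h lands near Cohen's conventional "small" benchmark of 0.2.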

standing simulated objections not resolved

Absence of a pre-registered analysis plan and a non-AI control arm, which cannot be retroactively implemented in the existing dataset.

Circularity Check

0 steps flagged

No significant circularity in empirical forecasting results

The paper reports new experimental outcomes from 50 NBA game forecasts by human teams using the Thinkscape platform. Accuracy figures (62% overall, 68% after excluding bottom-quartile conversation-rate cases) are computed directly from observed human predictions and compared to external benchmarks (Vegas odds at 50%, Polymarket). No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation of these results. Prior work is referenced only to establish the platform's general conversational capability, not to derive or justify the specific accuracy numbers. The chain is therefore self-contained against independent external data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that AI surrogate agents neutrally facilitate discussion without introducing their own bias and that conversation rate is a valid, non-confounded proxy for group insight. No free parameters are fitted in the reported statistics.

axioms (2)

domain assumption: AI agents can intervene in group text/voice/video conversations without systematically altering human judgment.
Invoked throughout the description of Hyperchat AI and Thinkscape platform operation.
ad hoc to paper: Conversation rate during five-minute deliberations is independent of game difficulty or other external factors.
Required to interpret the post-hoc exclusion of low-conversation forecasts as evidence of amplified intelligence rather than data selection.

invented entities (1)

Surrogate AI agents (no independent evidence)
purpose: Intervene in real-time group conversations to enable scalable deliberation.
Core component of the Hyperchat AI architecture introduced to facilitate large-team forecasting.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] T. McAndrew, N. Wattanachit, G. C. Gibson, and N. G. Reich, "Aggregating predictions from experts: A review of statistical methods, experiments, and applications," WIREs Comput. Stat., vol. 13, no. 2, Mar

[2] [Online]. Available: https://doi.org/10.1002/wics.1514

[3] V. Hemming, M. A. Burgman, A. M. Hanea, M. F. McBride, & B. C. Wintle. (2018). A practical guide to structured expert elicitation using the IDEA protocol. Methods in Ecology and Evolution, 9(1), 169-180

[4] L. Rosenberg, G. Willcox and H. Schumann, "Conversational Swarm Intelligence (CSI) Enables Rapid Group Insights," 2023 IEEE 14th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 2023, pp. 0534-0539, doi: 10.1109/UEMCON59035.2023.10316130

[5] H. Schumann, L. Rosenberg, G. Mani, and G. Willcox. "Conversational Collective Intelligence (CCI) using Hyperchat AI in a Real-world Forecasting Task." 2025 11th Int. HCI and UX Conf. in Indonesia (CHIuXiD). IEEE, Dec 2025. https://arxiv.org/abs/2511.03732

[6] T. D. Seeley, K. P. Visscher, "Choosing a home: How the scouts in a honey bee swarm perceive the completion of their group decision making." Behavioural Ecology and Sociobiology 54 (5) 511-520

[7] L. B. Rosenberg, (2015). Human Swarms, a real-time method for collective intelligence. In Artificial Life Conference Proceedings (pp. 658-659). MIT Press. https://doi.org/10.1162/978-0-262-33027-5-ch117

[8] L. Rosenberg, G. Willcox, M. Palosuo & G. Mani (2021). Forecasting of volatile assets using artificial swarm intelligence. In Proceedings of the 2021 4th International Conf on Artificial Intelligence for Industries (AI4I) (pp. 30–33). IEEE. https://doi.org/10.1109/AI4I51902.2021.00015

[9] B. N. Patel, L. Rosenberg, G. Willcox, et al. Human–machine partnership with artificial intelligence for chest radiograph diagnosis. npj Digit. Med. 2, 111 (2019). https://doi.org/10.1038/s41746-019-0189-7

[10] L. Metcalf, D. A. Askay, and L. B. Rosenberg, "Keeping Humans in the Loop: Pooling Knowledge through Artificial Swarm Intelligence to Improve Business Decision Making", California Management Review, 2019, https://doi.org/10.1177/0008125619862256

[11] L. B. Rosenberg, "Collective Superintelligence: Enabling Real-Time Conversational Deliberations among Humans and AI Agents at Unprecedented Scale". In Foundations and Frontiers in Decision Science, edited by Dakshina Ranjan Kisku. Rijeka: IntechOpen, 2025. https://doi.org/10.5772/intechopen.1010201

[12] L. Rosenberg, G. Willcox, H. Schumann and G. Mani, 2024, January. Conversational Swarm Intelligence amplifies the accuracy of networked groupwise deliberations. In 2024 IEEE 14th Annual Computing and Communication Workshop and Conf (CCWC) (pp. 0086-0091). IEEE

[13] G. Willcox, L. Rosenberg, R. Donovan and H. Schumann, "Dense Neural Network used to Amplify the Forecasting Accuracy of real-time Human Swarms," 2019 11th International Conference on Computational Intelligence and Communication Networks (CICN), Honolulu, HI, USA, 2019, pp. 69-74, doi: 10.1109/CICN.2019.8902352

[14] US Patent 11,949,638. "Methods and systems for hyperchat conversations among large networked populations with collective intelligence". https://patents.google.com/patent/US11949638B1

[15] ProfessionalGambler.org, "Winning Percentage of Professional Sports Bettors," ProfessionalGambler.org. [Online]. Available: https://professionalgambler.org/winning-percentages Accessed: 2/18/26

[16] T. Robbins, (2023). Weak Form Efficiency in Sports Betting Markets. American Journal of Management, 23(2). https://doi.org/10.33423/ajm.v23i2.6051
