Conversational Forecasting Across Large Human Groups Using A Swarm of Surrogate AI Agents
Pith reviewed 2026-05-15 20:51 UTC · model grok-4.3
The pith
Groups of 25-30 fans using AI-swarm conversations forecast NBA games at 62 percent accuracy against the spread.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When human teams of 25 to 30 participants discuss and debate NBA games for five minutes each using the Hyperchat AI architecture, they achieve 62 percent accuracy forecasting outcomes against the spread across 50 games (p=0.059 versus the 50 percent Vegas baseline), which would have yielded an 18.4 percent ROI. Excluding the 12 forecasts with the lowest conversation rates raises accuracy to 68 percent, which outperforms both the Vegas baseline (p=0.017) and Polymarket (p=0.062).
What carries the argument
Hyperchat AI, a communication architecture that inserts surrogate AI agents to sustain real-time text, voice, or video deliberation among large human teams without conversation collapse.
Load-bearing premise
Accuracy gains arise from the AI-mediated conversation process rather than from fan knowledge, game selection, or the post-hoc exclusion of low-discussion cases.
What would settle it
Run the identical 50 games with matched groups of 25-30 fans but without surrogate AI agents, then check whether accuracy falls to 50 percent or below.
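A replication like this needs enough games to have a fair chance of detecting the effect. The sketch below is a rough planning check. It assumes the study's 62 percent is the true hit rate, uses a one-sided test at α = 0.05 against 50 percent, and applies the normal-approximation sample-size formula. It estimates how many games one arm would need for 80 percent power. The function names, the 80 percent target and the approximation itself are illustrative assumptions, not part of the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist


def games_needed(p_alt: float, p_null: float = 0.5,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: normal-approximation sample size for a one-sided
    test of a hit rate p_alt against a fixed baseline p_null."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    spread = z_alpha * sqrt(p_null * (1 - p_null)) + z_beta * sqrt(p_alt * (1 - p_alt))
    return ceil((spread / (p_alt - p_null)) ** 2)


def power_at(n: int, p_alt: float, p_null: float = 0.5, alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate power of the same test with n games."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (p_alt - p_null) * sqrt(n) - z_alpha * sqrt(p_null * (1 - p_null))
    return NormalDist().cdf(shift / sqrt(p_alt * (1 - p_alt)))


# Assumption: 62% is treated as the true accuracy of the facilitated groups.
print("games needed for 80% power at 62%:", games_needed(0.62))
print("approximate power with 50 games:", round(power_at(50, 0.62), 2))
```

Under these assumptions the estimate is about 106 games for a single arm tested against the 50 percent baseline, roughly twice the 50 used. The power of a 50-game design comes out near one half. A two-arm comparison of independently estimated accuracies (AI-facilitated versus not) would need more games than this.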
Original abstract
Hyperchat AI is a communication and collaboration architecture that employs intervening AI agents to enable real-time conversational deliberations among networked human teams of unlimited size. Prior work has shown that teams as large as 250 people can hold productive real-time conversations by text, voice, or video using Hyperchat AI to discuss complex problems, brainstorm solutions, surface risks, assess alternatives, prioritize options, and converge on optimized results. Building on this prior work, this new study tasked groups of 25 to 30 basketball fans with conversationally forecasting NBA games (against the spread) over a 12-week period. Results show that when discussing and debating NBA games (for five minutes each) using a Hyperchat AI enabled platform called Thinkscape, human teams were 62% accurate across a set of 50 forecasted NBA games. This is an impressive result versus the Vegas odds of 50% (p=0.059). Furthermore, had the participants wagered on the games, they would have produced an 18.4% ROI over the 12-week period. In addition, this study found that the group's conversation rate during each forecast was positively correlated with their prediction accuracy. In fact, when excluding the 12 forecasts in the bottom 25th percentile by average conversation rate, the remaining 38 forecasts recorded a 68% accuracy, significantly better than the 50% Vegas odds (p=0.017). This result also outperformed the well-known prediction market Polymarket (p=0.062) across the same set of NBA games. These outcomes suggest that real-time conversational deliberations, when facilitated by Surrogate AI agents, can significantly amplify groupwise collective intelligence during human forecasting tasks.
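The abstract does not name the test behind its p-values. The sketch below assumes an exact one-sided binomial test against a 50 percent hit rate. It also infers the raw counts from the reported percentages: 31 of 50, and 26 of 38 (26/38 is 68.4 percent). Under those assumptions it matches both reported values when rounded to three decimals.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: counts inferred from the reported percentages, not stated in the paper.
p_all = binom_upper_tail(31, 50)       # 62% of 50 games
p_filtered = binom_upper_tail(26, 38)  # 68% of the 38 retained games
print(f"all 50 games: p = {p_all:.4f}")
print(f"top 38 games: p = {p_filtered:.4f}")
```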
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents results from an experiment in which groups of 25-30 NBA fans used the Thinkscape platform (powered by Hyperchat AI and surrogate AI agents) to hold five-minute real-time deliberations forecasting the outcomes of 50 NBA games against the spread. The authors report 62% group accuracy (p=0.059 vs. 50% Vegas odds baseline), an 18.4% ROI if wagering, and that excluding the 12 lowest-conversation-rate forecasts yields 68% accuracy (p=0.017) that also outperforms Polymarket (p=0.062). They conclude that AI-facilitated conversational deliberation amplifies collective intelligence in forecasting tasks.
Significance. If the accuracy gains prove robust after addressing selection bias and marginal significance, the work would provide empirical evidence that AI-mediated group conversations can improve forecasting performance over market baselines in a scalable way, with implications for collective intelligence research and applied prediction systems.
major comments (3)
- [Abstract / Results] Abstract and Results: The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter would preferentially remove harder-to-predict games, which would directly undermine the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate its independence from game-level covariates, or report the unfiltered result as primary.
- [Abstract] Abstract: The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.
- [Methods] Methods: No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.
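The confidence interval and effect size requested in the second comment are cheap to compute from the reported headline. This is a minimal sketch. It assumes 31 correct out of 50 (inferred from 62 percent) and uses the Wilson score interval, one common choice among several for a binomial proportion.

```python
from math import asin, sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    phat = successes / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Assumption: 31/50 correct, inferred from the reported 62% accuracy.
lo, hi = wilson_interval(31, 50)
h = cohens_h(31 / 50, 0.5)
print(f"95% Wilson CI: [{lo:.3f}, {hi:.3f}]")
print(f"Cohen's h vs 50%: {h:.3f}")
```

The interval runs from roughly 0.48 to 0.74 and includes 0.5, consistent with a two-sided reading of p=0.059. An h of about 0.24 falls in the range Cohen labeled small.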
minor comments (2)
- [Abstract] The abstract states groups of 25-30 fans but provides no exact participant counts, total unique individuals, or per-game participation rates.
- [Results] Clarify whether the 18.4% ROI calculation assumes equal wagering on every game and accounts for vig or transaction costs.
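One reading answers the ROI question in the minor comments. Assume equal one-unit stakes on every game, the standard -110 spread price (which builds in the bookmaker's vig), and a 31-19 record inferred from 62 percent of 50. Under those assumptions the arithmetic lands on the reported 18.4 percent. The paper does not state these assumptions, so treat the match as suggestive rather than confirmed. The helper names are hypothetical.

```python
def flat_stake_roi(wins: int, losses: int, american_odds: int = -110) -> float:
    """ROI for equal one-unit bets at a single American price (pushes ignored)."""
    if american_odds < 0:
        payout_per_unit = 100 / -american_odds  # -110 pays 100/110 per unit staked
    else:
        payout_per_unit = american_odds / 100
    profit = wins * payout_per_unit - losses
    return profit / (wins + losses)


def breakeven_rate(american_odds: int = -110) -> float:
    """Hit rate at which flat betting at this price neither wins nor loses."""
    payout = 100 / -american_odds if american_odds < 0 else american_odds / 100
    return 1 / (1 + payout)


# Assumption: 31 wins and 19 losses, flat stakes, every game priced at -110.
roi = flat_stake_roi(31, 19)
print(f"ROI at -110: {roi:.1%}")
print(f"break-even hit rate at -110: {breakeven_rate():.1%}")
```

The same assumption sets the bar a bettor must clear: about 52.4 percent, not 50 percent. Beating the 50 percent baseline used in the significance tests and turning a profit at -110 are therefore different thresholds.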
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us identify areas for improvement in statistical reporting and methodological transparency. We address each point below and have revised the manuscript to prioritize the unfiltered results, add missing statistical details, and acknowledge study limitations.
-
Referee: [Abstract / Results] Abstract and Results: The stronger 68% accuracy claim (p=0.017) is obtained only after post-hoc exclusion of the bottom 25% of forecasts by average conversation rate. If conversation rate correlates with game difficulty or spread width, this filter would preferentially remove harder-to-predict games, which would directly undermine the central claim of improved collective intelligence. The manuscript must either pre-specify the filter, demonstrate its independence from game-level covariates, or report the unfiltered result as primary.
Authors: We agree that the 68% result is post-hoc and should not be the primary claim. In the revised manuscript, the 62% accuracy on all 50 games will be presented as the main result. The conversation-rate filtered analysis will be repositioned as exploratory, accompanied by an explicit discussion of potential selection bias. We will test for correlations between conversation rate and game-level covariates (e.g., spread width and historical team performance) and report the findings or note their absence as a limitation. revision: partial
-
Referee: [Abstract] Abstract: The headline 62% accuracy on all 50 games yields p=0.059, which falls short of conventional significance thresholds. No confidence intervals, effect sizes, per-game sample details, or error bars are reported, making it impossible to assess the reliability of the result or its sensitivity to outliers.
Authors: We will update the abstract, results, and any figures to report 95% confidence intervals for the accuracy rates, effect sizes (e.g., Cohen's h), and per-game participant counts. Error bars will be added to visualizations. The p=0.059 value will be framed as marginal, with emphasis on the accompanying 18.4% ROI as a practical indicator of performance. revision: yes
-
Referee: [Methods] Methods: No description of game selection randomization, pre-registered analysis plan, or control arm comparing AI-facilitated vs. non-facilitated high-engagement groups. Without these, the attribution of accuracy gains specifically to the surrogate AI agents (rather than fan expertise or game selection) cannot be isolated.
Authors: The Methods section will be expanded to clarify that games were selected as a convenience sample of NBA contests during the 12-week period for which participant groups were available. This was an exploratory study without a pre-registered analysis plan. No non-AI control arm was included, as the goal was to assess the AI-facilitated platform's performance. These design choices will be stated explicitly as limitations, with recommendations for future randomized studies to isolate the agents' contribution. revision: partial
- Unresolved: absence of a pre-registered analysis plan and of a non-AI control arm, neither of which can be retroactively added to the existing dataset.
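The conversation-rate filter in the first objection can also be compared against a null in which the filter carries no information. This minimal sketch assumes the counts inferred from the reported percentages: 31 of 50 correct overall and 26 of 38 after filtering, so the 12 excluded forecasts held 5 correct. It asks how often the remaining 38 would contain at least 26 correct if the 12 exclusions had been drawn uniformly at random. This hypergeometric check only tests whether the uplift exceeds what random trimming produces. It cannot rule out the referee's confound, where conversation rate tracks game difficulty.

```python
from math import comb


def prob_retained_at_least(total: int, correct: int, excluded: int, target: int) -> float:
    """P(retained forecasts contain >= target correct) when `excluded` of `total`
    forecasts are dropped uniformly at random and `correct` of them were right."""
    retained = total - excluded
    wrong = total - correct
    favourable = sum(comb(correct, k) * comb(wrong, retained - k)  # comb is 0 if k > n
                     for k in range(target, min(correct, retained) + 1))
    return favourable / comb(total, retained)


# Assumption: 31/50 correct overall and 26/38 after the filter (inferred counts),
# so the 12 excluded forecasts held 31 - 26 = 5 correct (about 42%).
p_random = prob_retained_at_least(total=50, correct=31, excluded=12, target=26)
print(f"chance a random 12-game exclusion leaves >= 26/38 correct: {p_random:.3f}")
```

Under these assumptions the probability comes out near 0.09. A random trim of 12 games would therefore produce an equal or larger jump fairly often.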
Circularity Check
No significant circularity in empirical forecasting results
full rationale
The paper reports new experimental outcomes from 50 NBA game forecasts by human teams using the Thinkscape platform. Accuracy figures (62% overall, 68% after excluding bottom-quartile conversation-rate cases) are computed directly from observed human predictions and compared to external benchmarks (Vegas odds at 50%, Polymarket). No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation of these results. Prior work is cited only to establish the platform's general conversational capability, not to derive or justify the specific accuracy numbers. The results therefore rest on independent external benchmarks rather than on the paper's own constructs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: AI agents can intervene in group text, voice, or video conversations without systematically altering human judgment.
- ad hoc to paper: Conversation rate during five-minute deliberations is independent of game difficulty and other external factors.
invented entities (1)
- Surrogate AI agents: no independent evidence
Reference graph
Works this paper leans on
[1] T. McAndrew, N. Wattanachit, G. C. Gibson, and N. G. Reich, "Aggregating predictions from experts: A review of statistical methods, experiments, and applications," WIREs Comput. Stat., vol. 13, no. 2, Mar. [Online]. Available: https://doi.org/10.1002/wics.1514
[2] V. Hemming, M. A. Burgman, A. M. Hanea, M. F. McBride, and B. C. Wintle, "A practical guide to structured expert elicitation using the IDEA protocol," Methods in Ecology and Evolution, vol. 9, no. 1, pp. 169-180, 2018.
[3] L. Rosenberg, G. Willcox, and H. Schumann, "Conversational Swarm Intelligence (CSI) Enables Rapid Group Insights," in 2023 IEEE 14th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 2023, pp. 0534-0539, doi: 10.1109/UEMCON59035.2023.10316130.
[4] H. Schumann, L. Rosenberg, G. Mani, and G. Willcox, "Conversational Collective Intelligence (CCI) using Hyperchat AI in a Real-world Forecasting Task," in 2025 11th Int. HCI and UX Conf. in Indonesia (CHIuXiD), IEEE, Dec. 2025. [Online]. Available: https://arxiv.org/abs/2511.03732
[5] T. D. Seeley and K. P. Visscher, "Choosing a home: How the scouts in a honey bee swarm perceive the completion of their group decision making," Behavioral Ecology and Sociobiology, vol. 54, no. 5, pp. 511-520.
[6] L. B. Rosenberg, "Human Swarms, a real-time method for collective intelligence," in Artificial Life Conference Proceedings, MIT Press, 2015, pp. 658-659, doi: 10.1162/978-0-262-33027-5-ch117.
[7] L. Rosenberg, G. Willcox, M. Palosuo, and G. Mani, "Forecasting of volatile assets using artificial swarm intelligence," in Proc. 2021 4th Int. Conf. on Artificial Intelligence for Industries (AI4I), IEEE, 2021, pp. 30-33, doi: 10.1109/AI4I51902.2021.00015.
[8] B. N. Patel, L. Rosenberg, G. Willcox, et al., "Human–machine partnership with artificial intelligence for chest radiograph diagnosis," npj Digit. Med., vol. 2, art. no. 111, 2019, doi: 10.1038/s41746-019-0189-7.
[9] L. Metcalf, D. A. Askay, and L. B. Rosenberg, "Keeping Humans in the Loop: Pooling Knowledge through Artificial Swarm Intelligence to Improve Business Decision Making," California Management Review, 2019, doi: 10.1177/0008125619862256.
[10] L. B. Rosenberg, "Collective Superintelligence: Enabling Real-Time Conversational Deliberations among Humans and AI Agents at Unprecedented Scale," in Foundations and Frontiers in Decision Science, D. R. Kisku, Ed. Rijeka: IntechOpen, 2025, doi: 10.5772/intechopen.1010201.
[11] L. Rosenberg, G. Willcox, H. Schumann, and G. Mani, "Conversational Swarm Intelligence amplifies the accuracy of networked groupwise deliberations," in 2024 IEEE 14th Annual Computing and Communication Workshop and Conf. (CCWC), IEEE, Jan. 2024, pp. 0086-0091.
[12] G. Willcox, L. Rosenberg, R. Donovan, and H. Schumann, "Dense Neural Network used to Amplify the Forecasting Accuracy of real-time Human Swarms," in 2019 11th International Conference on Computational Intelligence and Communication Networks (CICN), Honolulu, HI, USA, 2019, pp. 69-74, doi: 10.1109/CICN.2019.8902352.
[13] "Methods and systems for hyperchat conversations among large networked populations with collective intelligence," US Patent 11,949,638. [Online]. Available: https://patents.google.com/patent/US11949638B1
[14] ProfessionalGambler.org, "Winning Percentage of Professional Sports Bettors." [Online]. Available: https://professionalgambler.org/winning-percentages. Accessed: 2/18/26.
[15] T. Robbins, "Weak Form Efficiency in Sports Betting Markets," American Journal of Management, vol. 23, no. 2, 2023, doi: 10.33423/ajm.v23i2.6051.