arXiv preprint arXiv:2206.15474
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (4)
verdicts: unverdicted (4)
roles: baseline (1)
polarities: baseline (1)
citing papers explorer
- Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most
  More capable LLMs produce worse distributional forecasts on superlinear growth time series with tail risks of regime change, with the error concentrated in the upper tail; this reverses on conventional threshold metrics.
- Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
  Frontier AI models lose 16-31% trading on Kalshi over 57 days but show better results on Polymarket, with platform design strongly affecting outcomes and prediction accuracy mattering more than research volume.
- Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
  BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.
- Harnessing Pre-Resolution Signals for Future Prediction Agents
  Milkyway uses pre-resolution signals from temporal contrasts in evolving evidence and repeated forecasts to evolve a harness and improve predictions before resolution, outperforming baselines on FutureX and FutureWorld.