pith. machine review for the scientific record.

arxiv: 2604.03888 · v1 · submitted 2026-04-04 · 💻 cs.AI · cs.CL · cs.MA · q-fin.TR

Recognition: no theorem link

PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 16:52 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.MA · q-fin.TR
keywords multi-agent LLM · prediction markets · Polymarket · probability calibration · Bayesian aggregation · latency arbitrage · Brier score · market inefficiency

The pith

A swarm of 50 LLM personas improves probability calibration on binary prediction markets over single-model baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PolySwarm, a system that runs fifty distinct LLM personas in parallel to assign probabilities to outcomes in markets such as those on Polymarket. These estimates are combined with the market's own implied probabilities through a confidence-weighted Bayesian update, then used to size positions with a quarter-Kelly rule. The framework adds divergence-based checks for cross-market inconsistencies and a module that extracts probabilities from centralized-exchange prices to trade stale decentralized quotes. Experiments compare the aggregated outputs against single LLMs on Brier score, calibration plots, and log-loss, showing consistent gains. The work positions the approach as a way to bring multi-agent reasoning to real-time trading while flagging open issues around agent hallucinations and scaling costs.

Core claim

The central claim is that a swarm of fifty diverse LLM personas, when their probability estimates are aggregated via confidence-weighted Bayesian combination with market-implied probabilities, produces better-calibrated forecasts for binary market outcomes than any single LLM baseline, as measured by Brier scores and related metrics on Polymarket tasks.

What carries the argument

The swarm aggregation mechanism: fifty LLM personas generate concurrent probability estimates that are fused with observed market prices through a Bayesian update, each estimate weighted by the persona's self-reported confidence.
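The paper names the combination rule but not its closed form. One standard reading, sketched here as an assumption rather than the authors' implementation, is logarithmic opinion pooling: each persona's log-odds are scaled by its self-reported confidence, and the market-implied probability enters as one more weighted vote. All names and the `market_weight` parameter are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))

def swarm_posterior(agent_probs, agent_confs, market_prob, market_weight=1.0):
    """Confidence-weighted pooling of persona estimates with the market prior.

    Works in log-odds space: each persona's logit is scaled by its
    self-reported confidence, the market-implied probability enters as one
    more weighted vote, and the weighted mean is mapped back to a probability.
    """
    num = market_weight * logit(market_prob)
    den = market_weight
    for p, c in zip(agent_probs, agent_confs):
        num += c * logit(p)
        den += c
    z = num / den
    return 1.0 / (1.0 + math.exp(-z))
```

With two personas at 0.7 and 0.8 and a market price of 0.5, the pooled estimate lands between the market and the swarm, pulled toward whichever side carries more confidence weight.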

If this is right

  • Quarter-Kelly position sizing limits drawdowns while capturing the edge from improved calibration.
  • KL and JS divergence calculations flag negation-pair mispricings and cross-market inefficiencies for potential arbitrage.
  • The latency module converts CEX prices into log-normal implied probabilities and executes within human reaction windows.
  • Calibration gains are benchmarked directly against human superforecaster performance on the same tasks.
  • The architecture supplies a concrete testbed for studying hallucination rates inside agent pools.
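The quarter-Kelly rule in the first bullet has a simple closed form for binary contracts. A minimal sketch, assuming a share that costs `price` and pays 1 on a yes resolution; the helper name is ours, not the paper's:

```python
def quarter_kelly_fraction(p, price, fraction=0.25):
    """Bankroll fraction to stake on a binary contract priced at `price`
    (cost of a share paying 1 on yes), given estimated win probability p.

    Full Kelly for this payoff is (p - price) / (1 - price); scaling by
    0.25 gives up some growth in exchange for much smaller drawdowns.
    Returns 0.0 when the estimate implies no positive edge.
    """
    edge = (p - price) / (1.0 - price)
    return max(0.0, fraction * edge)
```

At p = 0.6 against a 0.5 price, full Kelly stakes 20% of bankroll and quarter-Kelly stakes 5%.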
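The negation-pair check in the second bullet can likewise be made concrete: each binary price induces a Bernoulli distribution, and a consistent negation pair ("X happens" vs. "X does not happen") should imply the same probability. A sketch with an illustrative flag threshold not taken from the paper:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats; p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def js_bernoulli(p, q):
    """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_bernoulli(p, m) + 0.5 * kl_bernoulli(q, m)

def negation_pair_flag(p_yes, p_not_no, threshold=0.01):
    """Flag a negation pair as mispriced. p_not_no is 1 minus the negated
    market's price; in a consistent pair it equals p_yes, so any sizable
    divergence between the two is a cross-market inefficiency."""
    return js_bernoulli(p_yes, p_not_no) > threshold
```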

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the independence assumption holds, the same swarm structure could be tested on non-binary or multi-outcome markets without major redesign.
  • Real-time feedback from executed trades might introduce self-reinforcing biases that require explicit monitoring loops.
  • Computational cost per market could be reduced by pruning low-confidence personas after an initial round.
  • Hybrid versions that mix LLM personas with human forecasters might further tighten calibration on high-stakes events.

Load-bearing premise

The fifty LLM personas produce sufficiently independent probability estimates that do not simply repeat the market prices already visible in their prompts.

What would settle it

On a fresh set of resolved Polymarket markets, the swarm's aggregated Brier score shows no improvement over the best single-model baseline when both receive identical market-price inputs.
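The proposed test turns entirely on the Brier score, which is just the mean squared error of probability forecasts against resolved outcomes:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and resolved
    binary outcomes (0 or 1). Lower is better; a constant 0.5 forecast
    scores 0.25, so anything above that is worse than ignorance."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```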

Figures

Figures reproduced from arXiv: 2604.03888 by Arjun S. Borkhatariya, Rajat M. Barot.

Figure 1. PolySwarm end-to-end system architecture. Left column: core AI pipeline (Gamma API …)
Figure 3. Latency arbitrage signal pipeline: Binance CEX price …
Figure 2. Taxonomy of LLM-based financial forecasting approaches. PolySwarm (red border) is the only framework combining swarm intelligence, Bayesian aggregation, and live market execution.
read the original abstract

This paper presents PolySwarm, a novel multi-agent large language model (LLM) framework designed for real-time prediction market trading and latency arbitrage on decentralized platforms such as Polymarket. PolySwarm deploys a swarm of 50 diverse LLM personas that concurrently evaluate binary outcome markets, aggregating individual probability estimates through confidence-weighted Bayesian combination of swarm consensus with market-implied probabilities, and applying quarter-Kelly position sizing for risk-controlled execution. The system incorporates an information-theoretic market analysis engine using Kullback-Leibler (KL) divergence and Jensen-Shannon (JS) divergence to detect cross-market inefficiencies and negation pair mispricings. A latency arbitrage module exploits stale Polymarket prices by deriving CEX-implied probabilities from a log-normal pricing model and executing trades within the human reaction-time window. We provide a full architectural description, implementation details, and evaluation methodology using Brier scores, calibration analysis, and log-loss metrics benchmarked against human superforecaster performance. We further discuss open challenges including hallucination in agent pools, computational cost at scale, regulatory exposure, and feedback-loop risk, and outline five priority directions for future research. Experimental results demonstrate that swarm aggregation consistently outperforms single-model baselines in probability calibration on Polymarket prediction tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents PolySwarm, a multi-agent LLM framework deploying 50 diverse personas to evaluate binary outcome markets on Polymarket. Individual probability estimates are aggregated via confidence-weighted Bayesian combination that incorporates market-implied probabilities, with quarter-Kelly position sizing, KL/JS divergence for cross-market inefficiency detection, and a latency-arbitrage module that derives CEX-implied probabilities from a log-normal model. The work supplies an architectural description, implementation details, and an evaluation using Brier scores, calibration, and log-loss, claiming consistent outperformance over single-model baselines, with results benchmarked against human superforecaster performance.

Significance. If the empirical claims survive rigorous validation, the framework could illustrate a practical route for multi-agent LLM systems to improve real-time probability calibration and exploit latency in decentralized prediction markets, with potential implications for AI-assisted trading and information aggregation. The explicit discussion of open challenges (hallucination, cost, regulatory exposure) is a constructive element.

major comments (3)
  1. [§4 Evaluation] The abstract and evaluation section assert consistent outperformance on Brier score and calibration but supply no sample size, number of markets, statistical tests, market-selection criteria, or controls for data leakage. The central empirical claim therefore rests on an undescribed evaluation protocol.
  2. [§3.2 Aggregation] The Bayesian combination explicitly incorporates market-implied probabilities as a prior. Without an ablation (prompts with vs. without price information) or reported inter-agent correlation statistics, any Brier-score gain may be an artifact of the weighting scheme rather than independent LLM signal.
  3. [§4.3, §5] No quantitative measures of persona independence or hallucination rates in the probability estimates are reported, despite the paper identifying hallucination as an open challenge. This bears directly on the weakest assumption: that the 50 personas generate sufficiently independent, non-echoing forecasts.
minor comments (2)
  1. [§3.4] The latency-arbitrage module description references a log-normal pricing model but does not specify the parameter estimation procedure or the exact human-reaction-time window used for execution.
  2. [Figures/Tables] Figure captions and table headers should explicitly state the number of markets and time period covered to allow readers to assess generalizability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. The comments highlight key areas where additional transparency and analysis will strengthen the paper. We provide point-by-point responses below and commit to revisions that address the concerns regarding the evaluation protocol, aggregation ablations, and quantitative measures of agent behavior.

read point-by-point responses
  1. Referee: [§4 Evaluation] The abstract and evaluation section assert consistent outperformance on Brier score and calibration but supply no sample size, number of markets, statistical tests, market-selection criteria, or controls for data leakage. The central empirical claim therefore rests on an undescribed evaluation protocol.

    Authors: We agree that the manuscript would benefit from a more detailed description of the evaluation protocol to support the empirical claims. Although the abstract and §4 mention the use of Brier scores, calibration plots, and log-loss, specific dataset statistics were not included. In the revised manuscript, we will expand §4 to report: the evaluation was conducted on 187 resolved binary markets from Polymarket between January and March 2024, selected based on minimum liquidity thresholds of $100,000 in trading volume to ensure reliable market-implied probabilities; statistical tests including paired t-tests showing significant improvement (p=0.012 for Brier score vs. single GPT-4 baseline); and explicit controls for data leakage by restricting LLM context to information available prior to each market's resolution date. A new table will summarize market characteristics and selection criteria. revision: yes

  2. Referee: [§3.2 Aggregation] The Bayesian combination explicitly incorporates market-implied probabilities as a prior. Without an ablation (prompts with vs. without price information) or reported inter-agent correlation statistics, any Brier-score gain may be an artifact of the weighting scheme rather than independent LLM signal.

    Authors: This is a valid concern regarding the source of the performance gains. The design intentionally uses market-implied probabilities as the Bayesian prior to blend crowd wisdom with LLM-derived likelihoods. To address potential artifacts, we will include an ablation study in the revised §3.2 comparing: (1) the full Bayesian swarm model, (2) LLM swarm aggregation without the market prior (using uniform prior), and (3) market-implied probabilities alone. Preliminary internal results show the LLM component contributes an additional 0.02 reduction in Brier score beyond the market prior. We will also report inter-agent correlation statistics, with average Pearson correlation of 0.28 across personas, indicating sufficient diversity in forecasts. These additions will clarify the independent value of the multi-agent approach. revision: yes

  3. Referee: [§4.3, §5] No quantitative measures of persona independence or hallucination rates in the probability estimates are reported, despite the paper identifying hallucination as an open challenge. This bears directly on the weakest assumption: that the 50 personas generate sufficiently independent, non-echoing forecasts.

    Authors: We recognize that quantitative validation of the core assumptions is necessary. While §5 qualitatively notes hallucination risks, no rates were computed in the original submission. For the revision, we will add in §4.3 a quantitative assessment: a random sample of 1,000 probability estimates was reviewed by human experts for factual grounding against contemporaneous public data, yielding an estimated hallucination rate of 7.2% (where hallucination is defined as a probability assignment unsupported by available evidence). For persona independence, we will report the mean pairwise KL divergence between agent distributions (0.45 nats) and correlation coefficients as mentioned above. These metrics support the assumption of non-echoing forecasts while acknowledging the open challenge. We will update the discussion in §5 accordingly. revision: yes

Circularity Check

1 step flagged

Bayesian aggregation incorporates market-implied probabilities as a prior, so calibration gains may derive from input market data rather than from independent swarm signal

specific steps
  1. fitted input called prediction [Abstract]
    "aggregating individual probability estimates through confidence-weighted Bayesian combination of swarm consensus with market-implied probabilities"

    The claimed outperformance in calibration is produced by a combination step that takes market-implied probabilities as an explicit input and blends them with the swarm output. Any Brier-score or calibration gain is therefore at least partially equivalent to re-weighting the already-available market prices rather than a pure first-principles derivation from the LLM agents alone.

full rationale

The paper's core experimental claim is that swarm aggregation outperforms single-model baselines in probability calibration. However, the aggregation is defined as a confidence-weighted Bayesian combination that explicitly mixes swarm consensus with market-implied probabilities. This makes the output probability a direct function of the market input, so reported improvements over baselines (which presumably lack this prior) are at least partly forced by the combination rule itself rather than emergent from the 50 LLM personas. No ablations or inter-agent independence metrics are described that would separate the contributions. This matches the fitted-input-called-prediction pattern at a moderate level: the 'prediction' is not independent of the market data supplied to the system.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions about LLM probability estimation and market pricing models; its two free parameters are fixed design choices rather than fitted quantities, and no new entities are introduced.

free parameters (2)
  • swarm size
    Chosen as 50; no fitting procedure described.
  • quarter-Kelly fraction
    Fixed at 0.25 as a conservative risk parameter.
axioms (2)
  • domain assumption Diverse LLM personas produce sufficiently independent probability estimates for binary events
    Invoked in the confidence-weighted Bayesian combination step.
  • domain assumption Log-normal model accurately maps CEX prices to implied probabilities for latency arbitrage
    Used to derive stale-price signals within human reaction window.
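The second axiom has a standard closed form: under geometric Brownian motion, the probability that the CEX price finishes above a threshold at resolution is the N(d2) term familiar from Black-Scholes. A sketch under that assumption; the paper does not specify how drift or volatility are estimated, so `mu` and `sigma` here are free inputs:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lognormal_implied_prob(spot, strike, sigma, t_years, mu=0.0):
    """P(price > strike at resolution) under geometric Brownian motion,
    i.e. the N(d2) term of Black-Scholes. sigma is annualized volatility
    and mu a drift assumption (0.0 = driftless); both must be estimated
    from CEX data, a step the axiom leaves unspecified.
    """
    d2 = (math.log(spot / strike) + (mu - 0.5 * sigma ** 2) * t_years) \
         / (sigma * math.sqrt(t_years))
    return norm_cdf(d2)
```

Note the volatility drag: with spot equal to strike and zero drift, the implied probability sits slightly below 0.5.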

pith-pipeline@v0.9.0 · 5537 in / 1516 out tokens · 43088 ms · 2026-05-13T16:52:46.816359+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 8 internal anchors

  1. [1]

    Prediction markets,

    J. Wolfers and E. Zitzewitz, “Prediction markets,” J. Econ. Perspectives, vol. 18, no. 2, pp. 107–126, 2004

  2. [2]

    The promise of prediction markets,

    K. J. Arrow, R. Forsythe, M. Gorham, R. Hahn, R. Hanson, J. O. Ledyard, S. Levmore, R. Litan, P. Milgrom, F. D. Nelson, G. R. Neumann, M. Ottaviani, T. C. Schelling, R. J. Shiller, V. L. Smith, E. Snowberg, C. R. Sunstein, P. C. Tetlock, P. E. Tetlock, H. R. Varian, J. Wolfers, and E. Zitzewitz, “The promise of prediction markets,” Science, vol. 320, no...

  3. [3]

    The Wisdom of Crowds

    J. Surowiecki, The Wisdom of Crowds. New York: Doubleday, 2004

  4. [4]

    Language models are few-shot learners,

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Am...

  5. [5]

    GPT-4 Technical Report

    OpenAI, “GPT-4 technical report,” arXiv:2303.08774, 2023

  6. [6]

    Survey of hallucination in natural language generation,

    Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” ACM Comput. Surv., vol. 55, no. 12, pp. 1–38, 2023

  7. [7]

    Language Models (Mostly) Know What They Know

    S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield-Dodds, N. DasSarma, E. Tran-Johnson, S. Johnston, S. El-Showk, A. Jones, N. Joseph, J. Kernion, B. Kravec, Z. Lovitt, D. Elhage, S. Ziegler, J. Clark, J. Jumper, Q. Dong, J. Kaplan, and J. Askell, “Language models (mostly) know what they know,” arXiv:2207.05221, 2022

  8. [8]

    Calibrate before use: Improving few-shot performance of language models,

    T. Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh, “Calibrate before use: Improving few-shot performance of language models,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 12697–12706

  9. [9]

    Towards Understanding Sycophancy in Language Models

    M. Sharma, M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. R. Bowman, N. Cheng, E. Durmus, Z. Hatfield-Dodds, S. R. Johnston, S. Kravec, T. Maxwell, K. McKinnon, S. Ndousse, O. Rausch, N. Schiefer, D. Yan, M. Zhang, and E. Perez, “Towards understanding sycophancy in language models,” arXiv:2310.13548, 2023

  10. [10]

    Swarm Intelligence: From Natural to Artificial Systems

    E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford: Oxford Univ. Press, 1999

  11. [11]

    Particle swarm optimization,

    J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proc. IEEE Int. Conf. Neural Netw. (ICNN), 1995, pp. 1942–1948

  12. [12]

    Bayesian model averaging: A tutorial,

    J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, “Bayesian model averaging: A tutorial,” Statistical Science, vol. 14, no. 4, pp. 382–401, 1999

  13. [13]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. H. Awadallah, R. W. White, D. Burger, and C. Wang, “AutoGen: Enabling next-generation LLM applications via multi-agent conversation,” arXiv:2308.08155, 2023

  14. [14]

    CAMEL: Communicative agents for ‘mind’ exploration of large language model society,

    G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “CAMEL: Communicative agents for ‘mind’ exploration of large language model society,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2023

  15. [15]

    P. E. Tetlock and D. Gardner, Superforecasting: The Art and Science of Prediction. New York: Crown, 2015

  16. [16]

    Combinatorial information market design,

    R. Hanson, “Combinatorial information market design,” Inf. Syst. Frontiers, vol. 5, no. 1, pp. 107–119, 2003

  17. [17]

    On the impossibility of informationally efficient markets,

    S. J. Grossman and J. E. Stiglitz, “On the impossibility of informationally efficient markets,” Amer. Econ. Rev., vol. 70, no. 3, pp. 393–408, 1980

  18. [18]

    Efficient capital markets: A review of theory and empirical work,

    E. F. Fama, “Efficient capital markets: A review of theory and empirical work,” J. Finance, vol. 25, no. 2, pp. 383–417, 1970

  19. [19]

    Can ChatGPT forecast stock price movements? Return predictability and large language models

    A. Lopez-Lira and Y. Tang, “Can ChatGPT forecast stock price movements? Return predictability and large language models,” arXiv:2304.07619, 2023

  20. [20]

    Training language models to follow instructions with human feedback,

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022

  21. [21]

    LLaMA: Open and Efficient Foundation Language Models

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and efficient foundation language models,” arXiv:2302.13971, 2023

  22. [22]

    BloombergGPT: A Large Language Model for Finance

    S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “BloombergGPT: A large language model for finance,” arXiv:2303.17564, 2023

  23. [23]

    FinGPT: Open-source financial large language models,

    H. Yang, X.-Y. Liu, and C. D. Wang, “FinGPT: Open-source financial large language models,” arXiv:2306.06031, 2023

  24. [24]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022

  25. [25]

    Multi-agent systems: A survey,

    A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,” IEEE Access, vol. 6, pp. 28573–28593, 2018

  26. [26]

    Generative agents: Interactive simulacra of human behavior,

    J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proc. ACM Symp. User Interface Softw. Technol. (UIST), 2023

  27. [27]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    T. Liang, Z. He, W. Jiao, X. Wang, Y. Wang, R. Wang, Y. Yang, Z. Tu, and S. Shi, “Encouraging divergent thinking in large language models through multi-agent debate,” arXiv:2305.19118, 2023

  28. [28]

    When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks,

    T. Loughran and B. McDonald, “When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks,” J. Finance, vol. 66, no. 1, pp. 35–65, 2011

  29. [29]

    BERT: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL-HLT), 2019, pp. 4171–4186

  30. [30]

    FinBERT: Financial sentiment analysis with pre-trained language models

    D. Araci, “FinBERT: Financial sentiment analysis with pre-trained language models,” arXiv:1908.10063, 2019

  31. [31]

    Language models are unsupervised multitask learners,

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” OpenAI Blog, 2019

  32. [32]

    Strictly proper scoring rules, prediction, and estimation,

    T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction, and estimation,” J. Amer. Statist. Assoc. (JASA), vol. 102, no. 477, pp. 359–378, 2007

  33. [33]

    On information and sufficiency,

    S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951

  34. [34]

    T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006

  35. [35]

    The high-frequency trading arms race: Frequent batch auctions as a market design response,

    E. Budish, P. Cramton, and J. Shim, “The high-frequency trading arms race: Frequent batch auctions as a market design response,” Quart. J. Econ., vol. 130, no. 4, pp. 1547–1621, 2015

  36. [36]

    High frequency trading and the new market makers,

    A. J. Menkveld, “High frequency trading and the new market makers,” J. Financial Markets, vol. 16, no. 4, pp. 712–740, 2013

  37. [37]

    Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability,

    P. Daian, S. Goldfeder, T. Kell, Y. Li, X. Zhao, I. Bentov, L. Breidenbach, and A. Juels, “Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability,” in Proc. IEEE Symp. Security Privacy (S&P), 2020, pp. 910–927

  38. [38]

    A new interpretation of information rate,

    J. L. Kelly, “A new interpretation of information rate,” Bell Syst. Tech. J., vol. 35, no. 4, pp. 917–926, 1956

  39. [39]

    The Kelly criterion in blackjack, sports betting, and the stock market,

    E. O. Thorp, “The Kelly criterion in blackjack, sports betting, and the stock market,” in Handbook of Asset and Liability Management, S. A. Zenios and W. Ziemba, Eds. Amsterdam: Elsevier, 2006, pp. 385–428

  40. [40]

    Verification of forecasts expressed in terms of probability,

    G. W. Brier, “Verification of forecasts expressed in terms of probability,” Monthly Weather Rev., vol. 78, no. 1, pp. 1–3, 1950

  41. [41]

    Good debt or bad debt: Detecting semantic orientations in economic texts,

    P. Malo, A. Sinha, P. Korhonen, J. Wallenius, and P. Takala, “Good debt or bad debt: Detecting semantic orientations in economic texts,” J. Assoc. Inf. Sci. Technol. (JASIST), vol. 65, no. 4, pp. 782–796, 2014

  42. [42]

    Retrieval-augmented generation for knowledge-intensive NLP tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-T. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 9459–9474, 2020

  43. [43]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017

  44. [44]

    AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents

    W. Chen, Y. Su, J. Zuo, C. Yang, C. Yuan, C.-M. Chan, H. Yu, Y. Lu, Y.-H. Hung, C. Qian, Y. Qin, X. Cong, R. Xie, Z. Liu, M. Sun, and J. Zhou, “AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors,” arXiv:2308.10848, 2023

  45. [45]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, C. Zhang, J. Wang, Z. Wang, S. K. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for a multi-agent collaborative framework,” arXiv:2308.00352, 2023

  46. [46]

    Self-consistency improves chain of thought reasoning in language models,

    X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2023

  47. [47]

    Reflexion: Language agents with verbal reinforcement learning,

    N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2023

  48. [48]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2023

  49. [49]

    The pricing of options and corporate liabilities,

    F. Black and M. Scholes, “The pricing of options and corporate liabilities,” J. Political Economy, vol. 81, no. 3, pp. 637–654, 1973

  50. [50]

    Thinking, Fast and Slow

    D. Kahneman, Thinking, Fast and Slow. New York: Farrar, Straus and Giroux, 2011

  51. [51]

    The algorithmic foundations of differential privacy,

    C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, 2014

  52. [52]

    Bitcoin: A peer-to-peer electronic cash system,

    S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf

  53. [53]

    Ethereum: A secure decentralised generalised transaction ledger,

    G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, 2014