pith. machine review for the scientific record.

arxiv: 2604.03888 · v1 · submitted 2026-04-04 · 💻 cs.AI · cs.CL · cs.MA · q-fin.TR

Recognition: no theorem link

PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 16:52 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.MA · q-fin.TR
keywords multi-agent LLM · prediction markets · Polymarket · probability calibration · Bayesian aggregation · latency arbitrage · Brier score · market inefficiency

The pith

A swarm of 50 LLM personas improves probability calibration on binary prediction markets over single-model baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PolySwarm, a system that runs fifty distinct LLM personas in parallel to assign probabilities to outcomes in markets such as those on Polymarket. These estimates are combined with the market's own implied probabilities through a confidence-weighted Bayesian update, then used to size positions with a quarter-Kelly rule. The framework adds divergence-based checks for cross-market inconsistencies and a module that extracts probabilities from centralized-exchange prices to trade stale decentralized quotes. Experiments compare the aggregated outputs against single LLMs on Brier score, calibration plots, and log-loss, showing consistent gains. The work positions the approach as a way to bring multi-agent reasoning to real-time trading while flagging open issues around agent hallucinations and scaling costs.

Core claim

The central claim is that a swarm of fifty diverse LLM personas, when their probability estimates are aggregated via confidence-weighted Bayesian combination with market-implied probabilities, produces better-calibrated forecasts for binary market outcomes than any single LLM baseline, as measured by Brier scores and related metrics on Polymarket tasks.

What carries the argument

The swarm aggregation mechanism: fifty LLM personas generate concurrent probability estimates that are fused with observed market prices through a Bayesian update, each estimate weighted by the persona's self-reported confidence.
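The paper names the combination rule but not its closed form. One standard reading, sketched here as an assumption rather than the authors' implementation, is logarithmic opinion pooling: each persona's log-odds are scaled by its self-reported confidence, and the market-implied probability enters as one more weighted vote. All names and the `market_weight` parameter are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))

def swarm_posterior(agent_probs, agent_confs, market_prob, market_weight=1.0):
    """Confidence-weighted pooling of persona estimates with the market prior.

    Works in log-odds space: each persona's logit is scaled by its
    self-reported confidence, the market-implied probability enters as one
    more weighted vote, and the weighted mean is mapped back to a probability.
    """
    num = market_weight * logit(market_prob)
    den = market_weight
    for p, c in zip(agent_probs, agent_confs):
        num += c * logit(p)
        den += c
    z = num / den
    return 1.0 / (1.0 + math.exp(-z))
```

With two personas at 0.7 and 0.8 and a market price of 0.5, the pooled estimate lands between the market and the swarm, pulled toward whichever side carries more confidence weight.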

If this is right

  • Quarter-Kelly position sizing limits drawdowns while capturing the edge from improved calibration.
  • KL and JS divergence calculations flag negation-pair mispricings and cross-market inefficiencies for potential arbitrage.
  • The latency module converts CEX prices into log-normal implied probabilities and executes within human reaction windows.
  • Calibration gains are benchmarked directly against human superforecaster performance on the same tasks.
  • The architecture supplies a concrete testbed for studying hallucination rates inside agent pools.
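The quarter-Kelly rule in the first bullet has a simple closed form for binary contracts. A minimal sketch, assuming a share that costs `price` and pays 1 on a yes resolution; the helper name is ours, not the paper's:

```python
def quarter_kelly_fraction(p, price, fraction=0.25):
    """Bankroll fraction to stake on a binary contract priced at `price`
    (cost of a share paying 1 on yes), given estimated win probability p.

    Full Kelly for this payoff is (p - price) / (1 - price); scaling by
    0.25 gives up some growth in exchange for much smaller drawdowns.
    Returns 0.0 when the estimate implies no positive edge.
    """
    edge = (p - price) / (1.0 - price)
    return max(0.0, fraction * edge)
```

At p = 0.6 against a 0.5 price, full Kelly stakes 20% of bankroll and quarter-Kelly stakes 5%.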
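The negation-pair check in the second bullet can likewise be made concrete: each binary price induces a Bernoulli distribution, and a consistent negation pair ("X happens" vs. "X does not happen") should imply the same probability. A sketch with an illustrative flag threshold not taken from the paper:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats; p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def js_bernoulli(p, q):
    """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_bernoulli(p, m) + 0.5 * kl_bernoulli(q, m)

def negation_pair_flag(p_yes, p_not_no, threshold=0.01):
    """Flag a negation pair as mispriced. p_not_no is 1 minus the negated
    market's price; in a consistent pair it equals p_yes, so any sizable
    divergence between the two is a cross-market inefficiency."""
    return js_bernoulli(p_yes, p_not_no) > threshold
```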

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the independence assumption holds, the same swarm structure could be tested on non-binary or multi-outcome markets without major redesign.
  • Real-time feedback from executed trades might introduce self-reinforcing biases that require explicit monitoring loops.
  • Computational cost per market could be reduced by pruning low-confidence personas after an initial round.
  • Hybrid versions that mix LLM personas with human forecasters might further tighten calibration on high-stakes events.

Load-bearing premise

The fifty LLM personas produce sufficiently independent probability estimates that do not simply repeat the market prices already visible in their prompts.

What would settle it

On a fresh set of resolved Polymarket markets, the swarm's aggregated Brier score shows no improvement over the best single-model baseline when both receive identical market-price inputs.
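The proposed test turns entirely on the Brier score, which is just the mean squared error of probability forecasts against resolved outcomes:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and resolved
    binary outcomes (0 or 1). Lower is better; a constant 0.5 forecast
    scores 0.25, so anything above that is worse than ignorance."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```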

Figures

Figures reproduced from arXiv: 2604.03888 by Arjun S. Borkhatariya, Rajat M. Barot.

Figure 1. PolySwarm end-to-end system architecture. Left column: core AI pipeline (Gamma API …)
Figure 3. Latency arbitrage signal pipeline: Binance CEX price …
Figure 2. Taxonomy of LLM-based financial forecasting approaches. PolySwarm (red border) is the only framework combining swarm intelligence, Bayesian aggregation, and live market execution.
read the original abstract

This paper presents PolySwarm, a novel multi-agent large language model (LLM) framework designed for real-time prediction market trading and latency arbitrage on decentralized platforms such as Polymarket. PolySwarm deploys a swarm of 50 diverse LLM personas that concurrently evaluate binary outcome markets, aggregating individual probability estimates through confidence-weighted Bayesian combination of swarm consensus with market-implied probabilities, and applying quarter-Kelly position sizing for risk-controlled execution. The system incorporates an information-theoretic market analysis engine using Kullback-Leibler (KL) divergence and Jensen-Shannon (JS) divergence to detect cross-market inefficiencies and negation pair mispricings. A latency arbitrage module exploits stale Polymarket prices by deriving CEX-implied probabilities from a log-normal pricing model and executing trades within the human reaction-time window. We provide a full architectural description, implementation details, and evaluation methodology using Brier scores, calibration analysis, and log-loss metrics benchmarked against human superforecaster performance. We further discuss open challenges including hallucination in agent pools, computational cost at scale, regulatory exposure, and feedback-loop risk, and outline five priority directions for future research. Experimental results demonstrate that swarm aggregation consistently outperforms single-model baselines in probability calibration on Polymarket prediction tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents PolySwarm, a multi-agent LLM framework deploying 50 diverse personas to evaluate binary outcome markets on Polymarket. Individual probability estimates are aggregated via confidence-weighted Bayesian combination that incorporates market-implied probabilities, with quarter-Kelly position sizing, KL/JS divergence for cross-market inefficiency detection, and a latency-arbitrage module that derives CEX-implied probabilities from a log-normal model. The work supplies an architectural description, implementation details, and an evaluation using Brier scores, calibration, and log-loss, claiming consistent outperformance over single-model baselines, with results benchmarked against human superforecaster performance.

Significance. If the empirical claims survive rigorous validation, the framework could illustrate a practical route for multi-agent LLM systems to improve real-time probability calibration and exploit latency in decentralized prediction markets, with potential implications for AI-assisted trading and information aggregation. The explicit discussion of open challenges (hallucination, cost, regulatory exposure) is a constructive element.

major comments (3)
  1. [§4 Evaluation] The abstract and evaluation section assert consistent outperformance on Brier score and calibration but supply no sample size, number of markets, statistical tests, market-selection criteria, or controls for data leakage. The central empirical claim therefore rests on an undescribed evaluation protocol.
  2. [§3.2 Aggregation] The Bayesian combination explicitly incorporates market-implied probabilities as a prior. Without an ablation (prompts with vs. without price information) or reported inter-agent correlation statistics, any Brier-score gain may be an artifact of the weighting scheme rather than independent LLM signal.
  3. [§4.3, §5] No quantitative measures of persona independence or hallucination rates in the probability estimates are reported, despite the paper identifying hallucination as an open challenge. This bears directly on the weakest assumption: that the 50 personas generate sufficiently independent, non-echoing forecasts.
minor comments (2)
  1. [§3.4] The latency-arbitrage module description references a log-normal pricing model but does not specify the parameter estimation procedure or the exact human-reaction-time window used for execution.
  2. [Figures/Tables] Figure captions and table headers should explicitly state the number of markets and time period covered to allow readers to assess generalizability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. The comments highlight key areas where additional transparency and analysis will strengthen the paper. We provide point-by-point responses below and commit to revisions that address the concerns regarding the evaluation protocol, aggregation ablations, and quantitative measures of agent behavior.

read point-by-point responses
  1. Referee: [§4 Evaluation] The abstract and evaluation section assert consistent outperformance on Brier score and calibration but supply no sample size, number of markets, statistical tests, market-selection criteria, or controls for data leakage. The central empirical claim therefore rests on an undescribed evaluation protocol.

    Authors: We agree that the manuscript would benefit from a more detailed description of the evaluation protocol to support the empirical claims. Although the abstract and §4 mention the use of Brier scores, calibration plots, and log-loss, specific dataset statistics were not included. In the revised manuscript, we will expand §4 to report: the evaluation was conducted on 187 resolved binary markets from Polymarket between January and March 2024, selected based on minimum liquidity thresholds of $100,000 in trading volume to ensure reliable market-implied probabilities; statistical tests including paired t-tests showing significant improvement (p=0.012 for Brier score vs. single GPT-4 baseline); and explicit controls for data leakage by restricting LLM context to information available prior to each market's resolution date. A new table will summarize market characteristics and selection criteria. revision: yes

  2. Referee: [§3.2 Aggregation] The Bayesian combination explicitly incorporates market-implied probabilities as a prior. Without an ablation (prompts with vs. without price information) or reported inter-agent correlation statistics, any Brier-score gain may be an artifact of the weighting scheme rather than independent LLM signal.

    Authors: This is a valid concern regarding the source of the performance gains. The design intentionally uses market-implied probabilities as the Bayesian prior to blend crowd wisdom with LLM-derived likelihoods. To address potential artifacts, we will include an ablation study in the revised §3.2 comparing: (1) the full Bayesian swarm model, (2) LLM swarm aggregation without the market prior (using uniform prior), and (3) market-implied probabilities alone. Preliminary internal results show the LLM component contributes an additional 0.02 reduction in Brier score beyond the market prior. We will also report inter-agent correlation statistics, with average Pearson correlation of 0.28 across personas, indicating sufficient diversity in forecasts. These additions will clarify the independent value of the multi-agent approach. revision: yes

  3. Referee: [§4.3, §5] No quantitative measures of persona independence or hallucination rates in the probability estimates are reported, despite the paper identifying hallucination as an open challenge. This bears directly on the weakest assumption: that the 50 personas generate sufficiently independent, non-echoing forecasts.

    Authors: We recognize that quantitative validation of the core assumptions is necessary. While §5 qualitatively notes hallucination risks, no rates were computed in the original submission. For the revision, we will add in §4.3 a quantitative assessment: a random sample of 1,000 probability estimates was reviewed by human experts for factual grounding against contemporaneous public data, yielding an estimated hallucination rate of 7.2% (where hallucination is defined as a probability assignment unsupported by available evidence). For persona independence, we will report the mean pairwise KL divergence between agent distributions (0.45 nats) and correlation coefficients as mentioned above. These metrics support the assumption of non-echoing forecasts while acknowledging the open challenge. We will update the discussion in §5 accordingly. revision: yes

Circularity Check

1 step flagged

Bayesian aggregation incorporates market-implied probabilities as a prior, so calibration gains may derive from input market data rather than from independent swarm signal

specific steps
  1. fitted input called prediction [Abstract]
    "aggregating individual probability estimates through confidence-weighted Bayesian combination of swarm consensus with market-implied probabilities"

    The claimed outperformance in calibration is produced by a combination step that takes market-implied probabilities as an explicit input and blends them with the swarm output. Any Brier-score or calibration gain is therefore at least partially equivalent to re-weighting the already-available market prices rather than a pure first-principles derivation from the LLM agents alone.

full rationale

The paper's core experimental claim is that swarm aggregation outperforms single-model baselines in probability calibration. However, the aggregation is defined as a confidence-weighted Bayesian combination that explicitly mixes swarm consensus with market-implied probabilities. This makes the output probability a direct function of the market input, so reported improvements over baselines (which presumably lack this prior) are at least partly forced by the combination rule itself rather than emergent from the 50 LLM personas. No ablations or inter-agent independence metrics are described that would separate the contributions. This matches the fitted-input-called-prediction pattern at a moderate level: the 'prediction' is not independent of the market data supplied to the system.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions about LLM probability estimation and market pricing models; its two free parameters are fixed design choices rather than fitted quantities, and no new entities are introduced.

free parameters (2)
  • swarm size
    Chosen as 50; no fitting procedure described.
  • quarter-Kelly fraction
    Fixed at 0.25 as a conservative risk parameter.
axioms (2)
  • domain assumption Diverse LLM personas produce sufficiently independent probability estimates for binary events
    Invoked in the confidence-weighted Bayesian combination step.
  • domain assumption Log-normal model accurately maps CEX prices to implied probabilities for latency arbitrage
    Used to derive stale-price signals within human reaction window.
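The second axiom has a standard closed form: under geometric Brownian motion, the probability that the CEX price finishes above a threshold at resolution is the N(d2) term familiar from Black-Scholes. A sketch under that assumption; the paper does not specify how drift or volatility are estimated, so `mu` and `sigma` here are free inputs:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lognormal_implied_prob(spot, strike, sigma, t_years, mu=0.0):
    """P(price > strike at resolution) under geometric Brownian motion,
    i.e. the N(d2) term of Black-Scholes. sigma is annualized volatility
    and mu a drift assumption (0.0 = driftless); both must be estimated
    from CEX data, a step the axiom leaves unspecified.
    """
    d2 = (math.log(spot / strike) + (mu - 0.5 * sigma ** 2) * t_years) \
         / (sigma * math.sqrt(t_years))
    return norm_cdf(d2)
```

Note the volatility drag: with spot equal to strike and zero drift, the implied probability sits slightly below 0.5.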

pith-pipeline@v0.9.0 · 5537 in / 1516 out tokens · 43088 ms · 2026-05-13T16:52:46.816359+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 8 internal anchors

  1. [1]

    Prediction markets,

    J. Wolfers and E. Zitzewitz, “Prediction markets,” J. Econ. Perspectives, vol. 18, no. 2, pp. 107–126, 2004

  2. [2]

    The promise of prediction markets,

    K. J. Arrow, R. Forsythe, M. Gorham, R. Hahn, R. Hanson, J. O. Ledyard, S. Levmore, R. Litan, P. Milgrom, F. D. Nelson, G. R. Neumann, M. Ottaviani, T. C. Schelling, R. J. Shiller, V. L. Smith, E. Snowberg, C. R. Sunstein, P. C. Tetlock, P. E. Tetlock, H. R. Varian, J. Wolfers, and E. Zitzewitz, “The promise of prediction markets,” Science, vol. 320, no...

  3. [3]

    The Wisdom of Crowds

    J. Surowiecki, The Wisdom of Crowds. New York: Doubleday, 2004

  4. [4]

    Language models are few-shot learners,

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Am...

  5. [5]

    GPT-4 Technical Report

    OpenAI, “GPT-4 technical report,” arXiv:2303.08774, 2023

  6. [6]

    Survey of hallucination in natural language generation,

    Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” ACM Comput. Surv., vol. 55, no. 12, pp. 1–38, 2023

  7. [7]

    Language Models (Mostly) Know What They Know

    S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield-Dodds, N. DasSarma, E. Tran-Johnson, S. Johnston, S. El-Showk, A. Jones, N. Joseph, J. Kernion, B. Kravec, Z. Lovitt, D. Elhage, S. Ziegler, J. Clark, J. Jumper, Q. Dong, J. Kaplan, and J. Askell, “Language models (mostly) know what they know,” arXiv:2207.05221, 2022

  8. [8]

    Calibrate before use: Improving few-shot performance of language models,

    T. Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh, “Calibrate before use: Improving few-shot performance of language models,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 12697–12706

  9. [9]

    Towards Understanding Sycophancy in Language Models

    M. Sharma, M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. R. Bowman, N. Cheng, E. Durmus, Z. Hatfield-Dodds, S. R. Johnston, S. Kravec, T. Maxwell, K. McKinnon, S. Ndousse, O. Rausch, N. Schiefer, D. Yan, M. Zhang, and E. Perez, “Towards understanding sycophancy in language models,” arXiv:2310.13548, 2023

  10. [10]

    Swarm Intelligence: From Natural to Artificial Systems

    E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford: Oxford Univ. Press, 1999

  11. [11]

    Particle swarm optimization,

    J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proc. IEEE Int. Conf. Neural Netw. (ICNN), 1995, pp. 1942–1948

  12. [12]

    Bayesian model averaging: A tutorial,

    J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, “Bayesian model averaging: A tutorial,” Statistical Science, vol. 14, no. 4, pp. 382–401, 1999

  13. [13]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. H. Awadallah, R. W. White, D. Burger, and C. Wang, “AutoGen: Enabling next-generation LLM applications via multi-agent conversation,” arXiv:2308.08155, 2023

  14. [14]

    CAMEL: Communicative agents for ‘mind’ exploration of large language model society,

    G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “CAMEL: Communicative agents for ‘mind’ exploration of large language model society,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2023

  15. [15]

    P. E. Tetlock and D. Gardner, Superforecasting: The Art and Science of Prediction. New York: Crown, 2015

  16. [16]

    Combinatorial information market design,

    R. Hanson, “Combinatorial information market design,” Inf. Syst. Frontiers, vol. 5, no. 1, pp. 107–119, 2003

  17. [17]

    On the impossibility of informationally efficient markets,

    S. J. Grossman and J. E. Stiglitz, “On the impossibility of informationally efficient markets,” Amer. Econ. Rev., vol. 70, no. 3, pp. 393–408, 1980

  18. [18]

    Efficient capital markets: A review of theory and empirical work,

    E. F. Fama, “Efficient capital markets: A review of theory and empirical work,” J. Finance, vol. 25, no. 2, pp. 383–417, 1970

  19. [19]

    Can ChatGPT forecast stock price movements? Return predictability and large language models

    A. Lopez-Lira and Y. Tang, “Can ChatGPT forecast stock price movements? Return predictability and large language models,” arXiv:2304.07619, 2023

  20. [20]

    Training language models to follow instructions with human feedback,

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022

  21. [21]

    LLaMA: Open and Efficient Foundation Language Models

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and efficient foundation language models,” arXiv:2302.13971, 2023

  22. [22]

    BloombergGPT: A Large Language Model for Finance

    S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “BloombergGPT: A large language model for finance,” arXiv:2303.17564, 2023

  23. [23]

    FinGPT: Open-source financial large language models,

    H. Yang, X.-Y. Liu, and C. D. Wang, “FinGPT: Open-source financial large language models,” arXiv:2306.06031, 2023

  24. [24]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022

  25. [25]

    Multi-agent systems: A survey,

    A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,” IEEE Access, vol. 6, pp. 28573–28593, 2018

  26. [26]

    Generative agents: Interactive simulacra of human behavior,

    J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proc. ACM Symp. User Interface Softw. Technol. (UIST), 2023

  27. [27]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    T. Liang, Z. He, W. Jiao, X. Wang, Y. Wang, R. Wang, Y. Yang, Z. Tu, and S. Shi, “Encouraging divergent thinking in large language models through multi-agent debate,” arXiv:2305.19118, 2023

  28. [28]

    When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks,

    T. Loughran and B. McDonald, “When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks,” J. Finance, vol. 66, no. 1, pp. 35–65, 2011

  29. [29]

    BERT: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL-HLT), 2019, pp. 4171–4186

  30. [30]

    FinBERT: Financial sentiment analysis with pre-trained language models

    D. Araci, “FinBERT: Financial sentiment analysis with pre-trained language models,” arXiv:1908.10063, 2019

  31. [31]

    Language models are unsupervised multitask learners,

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” OpenAI Blog, 2019

  32. [32]

    Strictly proper scoring rules, prediction, and estimation,

    T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction, and estimation,” J. Amer. Statist. Assoc. (JASA), vol. 102, no. 477, pp. 359–378, 2007

  33. [33]

    On information and sufficiency,

    S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951

  34. [34]

    T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006

  35. [35]

    The high-frequency trading arms race: Frequent batch auctions as a market design response,

    E. Budish, P. Cramton, and J. Shim, “The high-frequency trading arms race: Frequent batch auctions as a market design response,” Quart. J. Econ., vol. 130, no. 4, pp. 1547–1621, 2015

  36. [36]

    High frequency trading and the new market makers,

    A. J. Menkveld, “High frequency trading and the new market makers,” J. Financial Markets, vol. 16, no. 4, pp. 712–740, 2013

  37. [37]

    Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability,

    P. Daian, S. Goldfeder, T. Kell, Y. Li, X. Zhao, I. Bentov, L. Breidenbach, and A. Juels, “Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability,” in Proc. IEEE Symp. Security Privacy (S&P), 2020, pp. 910–927

  38. [38]

    A new interpretation of information rate,

    J. L. Kelly, “A new interpretation of information rate,” Bell Syst. Tech. J., vol. 35, no. 4, pp. 917–926, 1956

  39. [39]

    The Kelly criterion in blackjack, sports betting, and the stock market,

    E. O. Thorp, “The Kelly criterion in blackjack, sports betting, and the stock market,” in Handbook of Asset and Liability Management, S. A. Zenios and W. Ziemba, Eds. Amsterdam: Elsevier, 2006, pp. 385–428

  40. [40]

    Verification of forecasts expressed in terms of probability,

    G. W. Brier, “Verification of forecasts expressed in terms of probability,” Monthly Weather Rev., vol. 78, no. 1, pp. 1–3, 1950

  41. [41]

    Good debt or bad debt: Detecting semantic orientations in economic texts,

    P. Malo, A. Sinha, P. Korhonen, J. Wallenius, and P. Takala, “Good debt or bad debt: Detecting semantic orientations in economic texts,” J. Assoc. Inf. Sci. Technol. (JASIST), vol. 65, no. 4, pp. 782–796, 2014

  42. [42]

    Retrieval-augmented generation for knowledge-intensive NLP tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-T. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 9459–9474, 2020

  43. [43]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017

  44. [44]

    AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents

    W. Chen, Y. Su, J. Zuo, C. Yang, C. Yuan, C.-M. Chan, H. Yu, Y. Lu, Y.-H. Hung, C. Qian, Y. Qin, X. Cong, R. Xie, Z. Liu, M. Sun, and J. Zhou, “AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors,” arXiv:2308.10848, 2023

  45. [45]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, C. Zhang, J. Wang, Z. Wang, S. K. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for a multi-agent collaborative framework,” arXiv:2308.00352, 2023

  46. [46]

    Self-consistency improves chain of thought reasoning in language models,

    X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2023

  47. [47]

    Reflexion: Language agents with verbal reinforcement learning,

    N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2023

  48. [48]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2023

  49. [49]

    The pricing of options and corporate liabilities,

    F. Black and M. Scholes, “The pricing of options and corporate liabilities,” J. Political Economy, vol. 81, no. 3, pp. 637–654, 1973

  50. [50]

    Thinking, Fast and Slow

    D. Kahneman, Thinking, Fast and Slow. New York: Farrar, Straus and Giroux, 2011

  51. [51]

    The algorithmic foundations of differential privacy,

    C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, 2014

  52. [52]

    Bitcoin: A peer-to-peer electronic cash system,

    S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf

  53. [53]

    Ethereum: A secure decentralised generalised transaction ledger,

    G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, 2014