Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
Pith reviewed 2026-05-23 03:52 UTC · model grok-4.3
The pith
Bayesian learners can drive no-regret learners out of markets despite the latter achieving logarithmic regret.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that in asset markets, an agent's long-run survival is governed by its relative performance in predicting payoffs compared to others. Surprisingly, no-regret learners can be eliminated even when they attain logarithmic regret bounds if pitted against Bayesian learners with finite priors that include the correct payoff-generating process. While Bayesian learning excels when the prior is accurate, it is vulnerable to misspecification, making no-regret learning more adaptable to shifts in distributions.
What carries the argument
A market selection mechanism in which wealth shares are updated by realized payoffs, so that survival amounts to out-predicting competitors in a repeated stochastic game.
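The selection mechanism can be written out under one common set of assumptions. The sketch below is not taken from the paper: it assumes a pari-mutuel market in Arrow securities, states $s_t$ drawn i.i.d. from an exogenous distribution $q$, wealth shares $w_{n,t}$ that sum to one, and betting fractions $\alpha_{n,t}(s)$ chosen by agent $n$. All of these symbols are introduced here for illustration only.

```latex
\begin{align}
p_t(s) &= \sum_{n} w_{n,t}\,\alpha_{n,t}(s)
  && \text{(market clearing)} \\
w_{n,t+1} &= w_{n,t}\,\frac{\alpha_{n,t}(s_t)}{p_t(s_t)}
  && \text{(wealth update)} \\
\log\frac{w_{n,T}}{w_{m,T}} &= \log\frac{w_{n,0}}{w_{m,0}}
  + \sum_{t=0}^{T-1} \log\frac{\alpha_{n,t}(s_t)}{\alpha_{m,t}(s_t)}
  && \text{(prices cancel)} \\
\mathbb{E}_{q}\!\left[\log\frac{\alpha_n(s)}{\alpha_m(s)}\right]
  &= D(q\,\|\,\alpha_m) - D(q\,\|\,\alpha_n)
  && \text{(fixed rules } \alpha_n,\ \alpha_m\text{)}
\end{align}
```

Under these assumptions the price drops out of every pairwise wealth ratio, so on average an agent gains share on a rival exactly when its bets sit closer to $q$ in relative entropy.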
If this is right
- Regret minimization alone does not ensure positive long-run market share against informed Bayesian agents.
- Bayesian learners with correct priors dominate but fail under distribution shifts.
- Hybrid strategies that blend Bayesian updates with no-regret elements provide improved robustness.
- No-regret learning requires less environment knowledge than full Bayesian approaches.
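The fragility in the second bullet can be illustrated with a small deterministic experiment. This is a sketch under assumptions the abstract does not confirm: a two-state pari-mutuel market in which the wealth ratio of two agents is multiplied each round by the ratio of their bets on the realized state, a Bayesian who bets its posterior predictive, and an add-1/2 (Krichevsky-Trofimov) frequency estimator standing in for the no-regret learner. The function name `log_wealth_ratio` and all parameter values are hypothetical. A prior that omits the true frequency plays the role of a distribution that has shifted outside the prior's support.

```python
import math


def log_wealth_ratio(states, models, prior):
    """Log of (Bayesian wealth / no-regret wealth) after trading on `states`.

    Assumption: in a pari-mutuel Arrow-security market, each round multiplies
    the wealth ratio by the ratio of the two agents' bets on the realized state.
    """
    post = list(prior)   # Bayesian posterior over the candidate models
    counts = [0, 0]      # state counts used by the add-1/2 (KT) learner
    log_ratio = 0.0
    for s in states:
        bet_bayes = sum(p * m[s] for p, m in zip(post, models))
        bet_kt = (counts[s] + 0.5) / (sum(counts) + 1.0)
        log_ratio += math.log(bet_bayes) - math.log(bet_kt)
        # Both learners update only after the state is revealed.
        post = [p * m[s] for p, m in zip(post, models)]
        total = sum(post)
        post = [p / total for p in post]
        counts[s] += 1
    return log_ratio


pattern = [0] * 7 + [1] * 3              # state 0 occurs with frequency 0.7
states = pattern * 200                   # 2000 rounds, fixed for reproducibility
well_specified = [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]
misspecified = [(0.5, 0.5), (0.9, 0.1)]  # the true frequency 0.7 is missing

print(log_wealth_ratio(states, well_specified, [1 / 3] * 3))  # positive: Bayesian ahead
print(log_wealth_ratio(states, misspecified, [1 / 2] * 2))    # strongly negative
```

With the true model in the prior, the Bayesian's lead grows only logarithmically in the number of rounds. With the true model missing, its deficit grows linearly, which is the asymmetry the bullets describe.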
Where Pith is reading between the lines
- This result implies that in uncertain or changing markets, agents might benefit from prioritizing robustness over precise Bayesian inference.
- The unification of regret and survival concepts could extend to algorithmic trading environments where learning types compete over finite horizons.
- Varying the support size of the Bayesian prior in simulations would reveal thresholds where logarithmic regret becomes sufficient for survival.
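How the support size might enter can be estimated with a back-of-the-envelope bound. The following is a sketch under assumptions the abstract does not confirm: a pari-mutuel market in which the log wealth ratio equals a difference of cumulative log losses $L_n(T) = -\sum_{t<T}\log\alpha_{n,t}(s_t)$, a Bayesian (B) who bets its posterior predictive from a uniform prior over $K$ models that include the true $q$, and a no-regret learner (NR) whose excess log loss over $q$ grows like $c\log T$ for a hypothetical constant $c>0$.

```latex
\begin{align}
L_{B}(T) &\le L_{q}(T) + \log K
  && \text{(Bayes mixture bound)} \\
L_{NR}(T) &= L_{q}(T) + c\log T + O(1)
  && \text{(assumed logarithmic regret)} \\
\log\frac{w_{NR,T}}{w_{B,T}}
  &= \log\frac{w_{NR,0}}{w_{B,0}} + L_{B}(T) - L_{NR}(T) \\
  &\le \log\frac{w_{NR,0}}{w_{B,0}} + \log K - c\log T + O(1)
\end{align}
```

In this sketch a larger $K$ pushes the horizon at which the bound turns negative out to roughly $T \approx K^{1/c}$, so the support size governs how long the no-regret learner holds its share. Whether the paper's model behaves the same way is what the simulations proposed in the last bullet would have to check.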
Load-bearing premise
Market survival depends on relative wealth growth determined by prediction accuracy against a Bayesian competitor whose prior includes the true model.
What would settle it
A market simulation or observation in which a logarithmic-regret agent maintains positive wealth share indefinitely against a Bayesian learner with the true model in its finite prior.
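A simulation of this kind could be set up along the following lines. This is a minimal sketch, not the paper's experiment: it assumes a two-agent pari-mutuel market in Arrow securities with exogenous i.i.d. states, a Bayesian who bets its posterior predictive, and an add-1/2 (Krichevsky-Trofimov) estimator as a stand-in for a logarithmic-regret learner. The function name `no_regret_share_path`, the candidate models, and the horizon are all hypothetical choices.

```python
import random


def no_regret_share_path(states, models, prior, n_states=2):
    """Wealth share of the no-regret agent after each round.

    Assumption: the price of the realized security is the wealth-weighted
    average bet on it, and each agent's share is rescaled by bet / price.
    """
    post = list(prior)        # Bayesian posterior over the candidate models
    counts = [0] * n_states   # state counts for the add-1/2 learner
    share_nr = 0.5            # both agents start with equal wealth
    path = []
    for s in states:
        bet_bayes = sum(p * m[s] for p, m in zip(post, models))
        bet_nr = (counts[s] + 0.5) / (sum(counts) + 0.5 * n_states)
        price = (1.0 - share_nr) * bet_bayes + share_nr * bet_nr
        share_nr = share_nr * bet_nr / price
        # Learners update only after the exogenous state is revealed.
        post = [p * m[s] for p, m in zip(post, models)]
        total = sum(post)
        post = [p / total for p in post]
        counts[s] += 1
        path.append(share_nr)
    return path


truth = (0.7, 0.3)                                  # exogenous payoff distribution
models = [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]       # finite prior support, truth included
states = random.Random(0).choices([0, 1], weights=truth, k=5000)
path = no_regret_share_path(states, models, [1 / 3] * 3)
for t in (10, 100, 1000, 5000):
    print(t, round(path[t - 1], 4))
```

A share that stays bounded away from zero over long horizons and many seeds would count against the claim, while a share that keeps shrinking would be consistent with it. A finite run cannot settle an asymptotic statement either way.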
Original abstract
We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our main focus is on comparing Bayesian learners and no-regret learners who compete in markets and identifying the conditions under which each approach is more effective. We formally relate the notions of survival and market dominance studied in economics and the framework of regret minimization, thereby bridging these theories. A central finding is that regret plays a key role in market selection, but low regret alone does not guarantee survival: surprisingly, an agent may achieve even logarithmic regret and yet be driven out of the market when competing against a Bayesian learner with a finite prior that assigns positive probability to the correct model. At the same time, we show that Bayesian learning is highly fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Motivated by this contrast, we propose two simple hybrid strategies that incorporate Bayesian updates while improving robustness and adaptability to distribution shifts, taking a step toward a best-of-both-worlds learning approach. More broadly, our work contributes to the understanding of dynamics of heterogeneous learning agents and their impact on markets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes heterogeneous learning in asset markets with stochastic payoffs, relating survival/market dominance from economics to regret minimization. It claims that logarithmic regret does not guarantee survival against a Bayesian learner whose finite prior places positive mass on the true model; Bayesian learning is fragile to shifts while no-regret is more robust; and two hybrid strategies are proposed that combine Bayesian updates with improved robustness.
Significance. If the central claims are established with explicit wealth-update and market-clearing rules that keep the correct model exogenous, the work would usefully bridge regret minimization and market-selection theories and motivate hybrid learners. The abstract alone supplies no derivations, proofs, or simulation details, so soundness cannot yet be assessed.
major comments (2)
- [model definition / wealth-update rules (abstract and §3)] The central survival claim (abstract) requires an exogenous 'correct model' to which the Bayesian prior assigns positive mass and against which regret is measured. If asset returns are determined by market clearing (aggregate demand affects prices), the return distribution depends on both agents' strategies, rendering the correct model endogenous. This fixed-point issue is load-bearing for the comparison between Bayesian and no-regret survival and is not addressed by merely positing a finite prior.
- [abstract and main theorems] The statement that 'an agent may achieve even logarithmic regret and yet be driven out' is presented as a central finding, yet no explicit wealth-update equation, market-clearing condition, or regret bound is supplied in the abstract. Without these, it is impossible to verify whether the claimed separation between regret and survival follows from the model assumptions rather than from an implicit exogenous-payoff assumption.
minor comments (2)
- [hybrid strategies section] Notation for the two hybrid strategies is introduced only in the abstract; their precise update rules and robustness guarantees should be stated explicitly in the main text.
- [conclusion / discussion] The paper would benefit from a short table contrasting the knowledge requirements and fragility properties of pure Bayesian, pure no-regret, and hybrid learners.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address each major comment below and indicate planned revisions to clarify the model and strengthen the presentation.
- Referee: The central survival claim (abstract) requires an exogenous 'correct model' to which the Bayesian prior assigns positive mass and against which regret is measured. If asset returns are determined by market clearing (aggregate demand affects prices), the return distribution depends on both agents' strategies, rendering the correct model endogenous. This fixed-point issue is load-bearing for the comparison between Bayesian and no-regret survival and is not addressed by merely positing a finite prior.
  Authors: We appreciate the referee highlighting this modeling consideration. In the paper, asset payoffs are drawn from a fixed exogenous stochastic distribution that defines the 'correct model' (to which the Bayesian prior assigns positive mass and against which regret is measured). Market clearing determines equilibrium prices from aggregate demand, but wealth updates depend on realized payoffs from the exogenous distribution; the true distribution itself does not depend on agents' strategies. We will revise Section 3 to include the explicit wealth-update equation and market-clearing condition, and add a clarifying sentence on exogeneity. This construction ensures the fixed-point issue does not arise.
  Revision: yes
- Referee: The statement that 'an agent may achieve even logarithmic regret and yet be driven out' is presented as a central finding, yet no explicit wealth-update equation, market-clearing condition, or regret bound is supplied in the abstract. Without these, it is impossible to verify whether the claimed separation between regret and survival follows from the model assumptions rather than from an implicit exogenous-payoff assumption.
  Authors: The abstract summarizes the main findings at a high level; the wealth-update rules, market-clearing conditions, and regret bounds appear explicitly in Sections 3 and 4, where the theorems establishing the separation (under the exogenous-payoff model) are proved. We will revise the abstract to briefly reference the exogenous stochastic payoffs assumption, improving verifiability while respecting abstract conventions.
  Revision: partial
Circularity Check
No significant circularity; derivation self-contained against external benchmarks
The provided abstract and context present a comparison of Bayesian and no-regret learners via market survival and regret bounds, relating existing economic and algorithmic frameworks without any quoted equations or steps that reduce a claimed prediction to a fitted input, self-definition, or self-citation chain. No load-bearing uniqueness theorem or ansatz is invoked from prior author work in a way that collapses the central contrast (low regret not guaranteeing survival against a finite-prior Bayesian) to an input by construction. The modeling assumptions about exogenous payoffs and correct models are stated as primitives for the comparison rather than derived from the result itself, satisfying the criteria for an independent derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Asset markets have stochastic payoffs.
- domain assumption: Survival and market dominance are well-defined outcomes of repeated trading.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: Theorem 3.1: agent survives iff lim (R_n(T) - R_m(T)) < ∞ for every competitor m
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: wealth ratio log(r_nm_T) expressed via relative entropies I_q(α)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.