arxiv: 2606.06882 · v1 · pith:ME7FR6E3new · submitted 2026-06-05 · 💻 cs.GT · cs.CE

Learning to Strategically Acquire Resources in Competition

Pith reviewed 2026-06-27 20:43 UTC · model grok-4.3

classification 💻 cs.GT cs.CE
keywords Bayesian Nash equilibrium · resource competition · learning dynamics · price of anarchy · game theory · partial information · financial trading

The pith

In competition for a divisible resource, agents share a unique Bayesian Nash equilibrium that is efficiently computable and reachable by learning from market feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models multiple agents buying shares of a costly resource over time, where price follows standard dynamics. Under partial information and a shared prior, it proves a unique Bayesian Nash equilibrium exists, can be computed efficiently, and has bounded price of anarchy. It also supplies conditions under which repeated simultaneous learning by the agents produces last-iterate convergence to that equilibrium. The results cover both complete-information cases and realistic trading or compute-acquisition settings, with simulations on financial data.

Core claim

Under partial-information with a common prior, we establish the existence, uniqueness, and efficient computability of the Bayesian Nash equilibrium (BNE), and bound the price of anarchy. Next and more generally, we consider agents with no common prior learning to act optimally given realistic market feedback from repeated interactions. We provide sufficient conditions on agents doing simultaneous learning dynamics for last-iterate convergence to the BNE.

What carries the argument

Bayesian Nash equilibrium of the resource-acquisition game under partial information and a common prior, together with last-iterate convergence conditions on simultaneous learning dynamics.

If this is right

  • Agents can compute their equilibrium strategies efficiently given the common prior.
  • The inefficiency of the resulting allocation, measured by price of anarchy, remains bounded.
  • Repeated play with market feedback alone suffices for convergence when the learning rules satisfy the stated conditions.
  • The same equilibrium description covers both complete-information and partial-information versions of the game.
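
For readers less familiar with the term, the usual cost-based definition of the price of anarchy can be sketched as follows. The notation (strategy profile s, strategy space S, equilibrium set Eq, per-agent cost c_i, and a utilitarian social cost C) is generic and assumed here for illustration; this summary does not state the paper's exact social-cost function.

```latex
% Generic cost-based definition; C, \mathcal{S}, \mathrm{Eq}, and c_i are assumed notation,
% not symbols taken from the paper.
\mathrm{PoA}
  \;=\;
  \frac{\max_{s \in \mathrm{Eq}} C(s)}{\min_{s \in \mathcal{S}} C(s)}
  \;\ge\; 1,
\qquad
C(s) \;=\; \sum_{i=1}^{n} c_i(s).
```

When the equilibrium is unique, the maximum in the numerator ranges over a single profile, so a bound on this ratio compares that one equilibrium cost against the socially optimal cost.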

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could be tested on non-financial resources such as cloud compute slots to check whether observed bidding matches the predicted BNE.
  • If the common-prior assumption is relaxed further, the learning dynamics might still converge but to a different limit point whose efficiency would need separate analysis.
  • In multi-agent systems for resource sharing, the convergence result suggests that simple gradient or best-response updates can replace explicit equilibrium calculation.
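
To make the last point concrete, here is a minimal sketch of simultaneous gradient play in a toy one-period game. It is not the paper's model or its Algorithm 1: the payoff u_i(x) = a_i * x_i - kappa * x_i * sum_j x_j, the names `margins` and `kappa`, and the step size are all assumptions chosen so that the equilibrium has a closed form to compare against.

```python
def nash_closed_form(margins, kappa):
    """Interior Nash equilibrium of the toy game.

    Solves a_i - kappa * sum(x) - kappa * x_i = 0 for every agent.
    Only valid when every resulting entry is non-negative.
    """
    n = len(margins)
    total = sum(margins) / ((n + 1) * kappa)  # equilibrium aggregate demand
    return [a / kappa - total for a in margins]


def simultaneous_gradient_play(margins, kappa, step=0.2, rounds=300):
    """All agents take a projected gradient step at the same time.

    Each agent uses only its own quantity and the aggregate demand,
    which stands in for market feedback in this toy setting.
    """
    x = [0.0] * len(margins)
    for _ in range(rounds):
        total = sum(x)
        # du_i/dx_i = a_i - kappa * total - kappa * x_i
        grads = [a - kappa * total - kappa * xi for a, xi in zip(margins, x)]
        # Project onto x_i >= 0 (agents can only buy in this toy game).
        x = [max(0.0, xi + step * g) for xi, g in zip(x, grads)]
    return x


margins = [1.0, 1.2, 0.8]  # hypothetical per-unit value minus base price
kappa = 0.5                # hypothetical linear price-impact coefficient
learned = simultaneous_gradient_play(margins, kappa)
exact = nash_closed_form(margins, kappa)
gap = max(abs(l - e) for l, e in zip(learned, exact))
print(learned, exact, gap)
```

With these numbers the closed-form equilibrium is approximately [0.5, 0.9, 0.1], and the last iterate of the simultaneous updates lands on it because the step size makes the update map a contraction in this particular game. Whether similar simple updates converge in the paper's setting depends on the sufficient conditions the paper states.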

Load-bearing premise

The price process in the market follows the standard dynamics model that the analysis uses but does not derive.
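
As a point of reference only, one common form of linear price-impact dynamics is sketched below. This is an assumption for illustration in the spirit of standard linear impact models, not the specification used in the paper; all symbols are hypothetical.

```latex
% Hypothetical notation: x_{i,t} is agent i's purchase in period t,
% \kappa_{\mathrm{perm}} and \kappa_{\mathrm{temp}} are impact coefficients,
% \varepsilon_t is zero-mean noise. None of these symbols are taken from the paper.
p_{t+1} \;=\; p_t \;+\; \kappa_{\mathrm{perm}} \sum_{i=1}^{n} x_{i,t} \;+\; \varepsilon_t,
\qquad
\tilde{p}_t \;=\; p_t \;+\; \kappa_{\mathrm{temp}} \sum_{i=1}^{n} x_{i,t}.
```

Here p_t is the reference price and \tilde{p}_t is the price actually paid in period t. The premise flagged above is that some fixed rule of this general kind is taken as given rather than derived from supply, demand, or bids.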

What would settle it

Run the proposed learning dynamics in a controlled market simulation or real trading environment and check whether acquisition strategies converge to the predicted unique BNE rather than cycling or settling elsewhere.
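
One way such a check could be scored is sketched below: a small helper that labels the tail of a recorded strategy trajectory. The function name, the tolerance, and the window length are hypothetical choices, and the reference point `target` would have to come from a separately computed equilibrium.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def classify_trajectory(history, target, tol=1e-6, window=20):
    """Label the last `window` iterates of a strategy trajectory.

    history: list of strategy vectors, one per learning round.
    target:  the predicted equilibrium strategy vector.
    """
    tail = history[-window:]
    if all(dist(x, target) <= tol for x in tail):
        return "converged to target"
    # Iterates stopped moving, but at a point other than the target.
    if all(dist(x, tail[-1]) <= tol for x in tail):
        return "settled elsewhere"
    # Still moving: could be cycling, drifting, or slow convergence.
    return "not settled"


target = [1.0, 2.0]
converging = [[1.0 + 0.5 ** k, 2.0 - 0.5 ** k] for k in range(100)]
stuck = [[3.0, 3.0] for _ in range(50)]
cycling = [[1.0 + (-1) ** k, 2.0] for k in range(50)]
labels = [classify_trajectory(h, target) for h in (converging, stuck, cycling)]
print(labels)
```

The label "not settled" cannot by itself distinguish a true cycle from slow convergence, so in practice the check would be repeated with longer runs before drawing a conclusion.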

Figures

Figures reproduced from arXiv: 2606.06882 by Anderson Schneider, Andrew Bennett, Michael Kearns, Mirah Shi, Neil Andrew Chriss, Safwan Hossain, Yuriy Nevmyvaka.

Figure 1: Cumulative position over time for agents under the BNE. The type conditioned expected reserves …
Figure 2: Comparison of Algorithm 1 (over 500 rounds) to exact BNE strategies: (left) we plot the last-iterate …
Figure 4: Regression performance on test data with …
Figure 5: Varying β over its 95% confidence interval [3.10e-7, 3.35e-7] for fixed α*. Plotted is the deviation from baseline position as a % of the final baseline position. On the left is the effect on the last iterate from Algorithm 1. On the right is the effect on the empirical BNE.
Figure 6: Varying α over its 95% confidence interval [4.40e-7, 4.9e-7] for fixed β*. Plotted is the deviation from baseline position as a % of the final baseline position. On the left is the effect on the last iterate from Algorithm 1. On the right is the effect on the empirical BNE.
Figure 7: Cumulative position over time for 5 agents when all are being strategic and playing Nash Equilib…
Figure 8: The strategies of the 5 players when agents 2 and 3 play VWAP and the rest play the induced …
Figure 9: On the left is the ratio of cost between playing VWAP and playing strategically for agents 2 and …
Figure 10: Ratio of cumulative costs between a subset of agents playing VWAP and all agents being strategic.
Original abstract

We consider multiple agents competing to acquire some costly divisible resource (e.g. shares of a financial asset, compute resources, etc.) over time. Leveraging a standard model for price dynamics, we propose a novel game-theoretic model for this problem, generalizing settings studied in diverse literatures. Our analysis considers different assumptions on the information available to agents. Under partial-information with a common prior (which subsumes complete information as a special case), we establish the existence, uniqueness, and efficient computability of the Bayesian Nash equilibrium (BNE), and bound the price of anarchy. Next and more generally, we consider agents with no common prior learning to act optimally given realistic market feedback from repeated interactions. We provide sufficient conditions on agents doing simultaneous learning dynamics for last-iterate convergence to the BNE. For all settings, we provide simulations based on real financial data to illustrate our theoretical results and offer new insights on strategic behavior in the context of trading and resource acquisition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes a game-theoretic model for agents competing over time to acquire a costly divisible resource, leveraging a standard price dynamics model. Under partial information with a common prior (including complete information as a special case), it claims to establish existence, uniqueness, and efficient computability of the Bayesian Nash equilibrium (BNE) while bounding the price of anarchy. It then considers agents without a common prior who learn from repeated market interactions, providing sufficient conditions on simultaneous learning dynamics for last-iterate convergence to the BNE. Simulations on real financial data are used to illustrate the results and strategic behavior in trading contexts.

Significance. If the central claims hold, the work bridges game-theoretic equilibrium analysis with learning dynamics in resource markets, offering both theoretical guarantees (BNE characterization, PoA bound, convergence conditions) and empirical illustrations from financial data. The explicit use of real data for simulations is a strength that supports applicability claims in settings like asset trading or compute allocation.

major comments (1)
  1. [Model section] The price-update rule is imported as a 'standard model' and 'leveraged' without derivation from the underlying market primitives (supply, demand, or agent bids). All stated results (BNE existence, uniqueness, and computability; the PoA bound under a common prior; and the sufficient conditions for last-iterate convergence of learning dynamics) depend on this fixed process. Any change in functional form or stochasticity would invalidate the equilibrium characterization and convergence guarantees.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting this modeling point. We respond to the major comment below.

Point-by-point responses
  1. Referee: [Model section] The price-update rule is imported as a 'standard model' and 'leveraged' without derivation from the underlying market primitives (supply, demand, or agent bids). All stated results (BNE existence, uniqueness, and computability; the PoA bound under a common prior; and the sufficient conditions for last-iterate convergence of learning dynamics) depend on this fixed process. Any change in functional form or stochasticity would invalidate the equilibrium characterization and convergence guarantees.

    Authors: We agree that the price-update rule is adopted as a standard model from the literature on price dynamics rather than re-derived from market primitives in the current manuscript. Our focus is on the equilibrium and learning analysis conditional on these dynamics. In the revision we will expand the model section to include a short justification relating the update rule to standard linear price-impact models based on net demand and bids, citing the relevant references. This will make the modeling assumptions and the scope of the results more explicit while leaving the technical claims unchanged. Revision: yes.

Circularity Check

0 steps flagged

No circularity; derivation uses standard techniques on external price model

full rationale

The paper's BNE existence/uniqueness/computability, PoA bound, and learning convergence results are derived from game-theoretic primitives (common prior, partial information) applied to a leveraged standard price dynamics model. No quoted steps show self-definitional reduction, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness imported from authors, ansatz smuggled via citation, or renaming of known results. The central claims remain independent of the paper's own fitted values or prior self-referential work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claims rest on an external standard model for price dynamics and standard assumptions from game theory; no free parameters, invented entities, or ad-hoc axioms are explicitly introduced in the provided text.

axioms (1)
  • domain assumption Standard model for price dynamics
    Leveraged to define the resource acquisition game; invoked in the model construction paragraph of the abstract.

pith-pipeline@v0.9.1-grok · 5709 in / 1234 out tokens · 14178 ms · 2026-06-27T20:43:28.241506+00:00 · methodology

