
arXiv: 2604.20029 · v1 · submitted 2026-04-21 · math.OC · cs.SY · eess.SY

Forward-looking evolutionary game dynamics subject to exploration cost

Pith reviewed 2026-05-10 01:28 UTC · model grok-4.3

classification: math.OC · cs.SY · eess.SY
keywords: evolutionary game dynamics · mean field games · Hamilton-Jacobi-Bellman equation · exploration cost · forward-looking behavior · pairwise comparison protocols · replicator dynamics · logit models

The pith

Forward-looking behavior in evolutionary games is modeled by coupling dynamics to static Hamilton-Jacobi-Bellman equations as a mean field game.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends standard evolutionary game models, which rely on agents' immediate action choices, to include forward-looking decisions and the costs of exploring new strategies. Agents update actions by paying an exploration cost to maximize a utility or its relative difference under pairwise comparison protocols such as replicator and logit dynamics. The extension links the evolutionary game dynamics to a static Hamilton-Jacobi-Bellman equation, which yields a mean field game in which the exploration cost enters as a constraint governed by the optimal Lagrangian multiplier, which acts as a relaxation parameter. The paper states that the resulting system admits a unique solution under certain conditions, and it investigates low-dimensional cases numerically. A reader would care because the framework supplies a tractable way to analyze agents that anticipate future payoffs rather than reacting only to the present state.
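For orientation, the two classical "momentary choice" dynamics named above can be sketched for a finite strategy set. This is a minimal illustration under stated assumptions and not the paper's model: the finite-strategy setting, the rock-paper-scissors payoff matrix `A`, the noise level `eta`, and the explicit Euler integrator are all choices made here for the example.

```python
import numpy as np

def replicator_rhs(x, A):
    """Classical replicator dynamic: dx_i/dt = x_i * (f_i(x) - average payoff)."""
    f = A @ x                      # payoff of each pure strategy against population x
    return x * (f - x @ f)

def logit_rhs(x, A, eta):
    """Classical logit dynamic: dx/dt = softmax(f(x) / eta) - x."""
    f = A @ x
    w = np.exp((f - f.max()) / eta)  # shift by the max for numerical stability
    return w / w.sum() - x

def euler(rhs, x0, dt, steps):
    """Explicit Euler integration of dx/dt = rhs(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * rhs(x)
    return x

# Hypothetical example game (rock-paper-scissors), chosen only for illustration.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = [0.6, 0.3, 0.1]

x_rep = euler(lambda x: replicator_rhs(x, A), x0, dt=0.01, steps=2000)
x_log = euler(lambda x: logit_rhs(x, A, eta=0.5), x0, dt=0.01, steps=2000)

print("replicator state:", x_rep)
print("logit state:     ", x_log)
```

Both right-hand sides depend only on the current population state `x`, which is the sense in which these classical dynamics are myopic; the forward-looking extension described above replaces the instantaneous payoff with a quantity obtained from an HJB equation.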

Core claim

We extend classical evolutionary game dynamics based on the momentary action choices of agents by accounting for two elements: forward-looking behavior and exploration cost. We focus on pairwise comparison protocols that cover major evolutionary game dynamics, such as replicator and logit models. In the proposed mathematical framework, agents update their actions by paying a cost so that a utility or its relative difference is maximized. We show that forward-looking behavior can be modeled as a coupling between the evolutionary game dynamic and static Hamilton-Jacobi-Bellman equation: a mean field game. The exploration cost and its constraint are naturally related to these equations as a function of the optimal Lagrangian multiplier serving as a relaxation parameter.

What carries the argument

The coupling between evolutionary game dynamics and the static Hamilton-Jacobi-Bellman equation that forms a mean field game, with the optimal Lagrangian multiplier serving as the relaxation parameter that incorporates the exploration cost constraint.

Load-bearing premise

Agents update actions by paying a cost to maximize a utility or its relative difference, and the exploration cost and its constraint are naturally related to the HJB equations via the optimal Lagrangian multiplier, which serves as a relaxation parameter.
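The role of a multiplier as a relaxation parameter can be illustrated with the textbook Lagrangian relaxation of a cost-constrained maximization. The sketch below is generic and is not taken from the paper: the update variable $u$, the utility $U$, the exploration cost $C$, and the cost budget $\bar{c}$ are hypothetical symbols introduced only for this illustration.

```latex
% Generic Lagrangian relaxation (hypothetical symbols, not the paper's notation).
\begin{align*}
&\text{Constrained problem:} && \max_{u}\; U(u) \quad \text{subject to} \quad C(u) \le \bar{c}, \\
&\text{Lagrangian:} && \mathcal{L}(u,\lambda) = U(u) - \lambda\,\bigl(C(u) - \bar{c}\bigr), \qquad \lambda \ge 0, \\
&\text{Relaxed problem:} && u_{\lambda} \in \operatorname*{arg\,max}_{u}\; \bigl\{ U(u) - \lambda\, C(u) \bigr\}, \\
&\text{Complementary slackness:} && \lambda\,\bigl(C(u_{\lambda}) - \bar{c}\bigr) = 0, \qquad C(u_{\lambda}) \le \bar{c}.
\end{align*}
```

In this generic form, a larger $\lambda$ penalizes costly updates more heavily, and under standard convexity assumptions the optimal multiplier is the value at which the relaxed maximizer also satisfies the cost constraint.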

What would settle it

A concrete case satisfying the stated conditions in which the coupled mean-field system fails to possess a unique solution, or a one- or two-dimensional numerical example whose computed trajectories diverge from the predicted equilibrium.

Figures

Figures reproduced from arXiv: 2604.20029 by Hidekazu Yoshioka.

Figure 1. Computed time histories of the optimal Lagrangian multiplier as a function of time t: (a) U1 and (b) U2. Colors indicate the parameter values 0.150 (red), 0.225 (green), 0.300 (blue), and 0.375 (magenta).
Figure 2. Computed time histories of the probability densities p = p_t(x): (a) U1 and (b) U2. Colors indicate t = 0 (red), t = 1 (green), t = 2 (blue), and t = 10 (magenta).
Figure 3. Computed time histories of the optimal Lagrangian multiplier as a function of time t: (a) U1 and (b) U2. Colors indicate the parameter values 1 (red), 2 (green), 10 (blue), and 100 (magenta).
Figure 4. Computed time histories of the probability densities p = p_t(x): (a) BNN model and (b) replicator model. Colors indicate t = 0 (red), t = 1 (green), t = 2 (blue), and t = 10 (magenta). The computed probability densities are close to equilibria at t = 10.
Figure 5. Computed time histories of the optimal Lagrangian multiplier as a function of time t: (a) BNN model and (b) replicator model. Colors indicate the parameter values 10^-2 (red), 10^-3 (green), 10^-4 (blue), and 10^-5 (magenta).
Figure 6. Computed time histories of the true exploration cost E = E_t: (a) BNN model and (b) replicator model. Colors indicate the parameter values 10^-2 (red), 10^-3 (green), 10^-4 (blue), and 10^-5 (magenta).
Figure 7. Computed time histories of (a) the optimal Lagrangian multiplier and (b) the true exploration cost E = E_t for the BNN model, for different values of a parameter: 2 (red), 1 (green), and 0 (blue).
Original abstract

We extend classical evolutionary game dynamics based on the momentary action choices of agents by accounting for two elements: forward-looking behavior and exploration cost. We focus on pairwise comparison protocols that cover major evolutionary game dynamics, such as replicator and logit models. In the proposed mathematical framework, agents update their actions by paying a cost so that a utility or its relative difference is maximized. We show that forward-looking behavior can be modeled as a coupling between the evolutionary game dynamic and static Hamilton-Jacobi-Bellman equation: a mean field game. The exploration cost and its constraint are naturally related to these equations as a function of the optimal Lagrangian multiplier serving as a relaxation parameter, and it is incorporated into the game as a constraint. We show that under certain conditions, our evolutionary game dynamic admits a unique solution. Finally, we computationally investigate one- and two-dimensional problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript extends classical evolutionary game dynamics based on pairwise comparison protocols (such as replicator and logit dynamics) by incorporating forward-looking behavior and exploration costs. Agents update actions by paying a cost to maximize a utility or its relative difference; the forward-looking aspect is modeled by coupling the evolutionary dynamics to a static Hamilton-Jacobi-Bellman equation, yielding a mean-field game. The exploration cost enters via the optimal Lagrangian multiplier, which serves as a relaxation parameter and is incorporated as a constraint. The central claim is that, under certain conditions, the resulting evolutionary game dynamic admits a unique solution. The authors support this with computational investigations of one- and two-dimensional problems.

Significance. If the uniqueness result can be placed on a rigorous footing with explicitly stated hypotheses, the work would usefully connect evolutionary game theory to mean-field games and allow forward-looking population dynamics to be analyzed via coupled ODE-PDE systems. The numerical examples in low dimensions provide concrete illustrations of equilibria, but the absence of detailed conditions and supporting analysis limits the immediate theoretical impact.

major comments (3)
  1. [Abstract] The uniqueness claim for the coupled evolutionary-HJB system is asserted to hold 'under certain conditions,' yet no precise regularity, monotonicity, or convexity hypotheses on the utility function or the exploration-cost function are supplied that would guarantee uniqueness of the stationary distribution for the ODE-PDE system.
  2. [Modeling section] The passage from the Lagrangian-relaxed cost to uniqueness is not accompanied by a theorem or derivation that states the necessary assumptions; the abstract and modeling description assert the result without demonstrating that the stated conditions are sufficient or necessary.
  3. [Computational results] The one- and two-dimensional numerical examples demonstrate existence of equilibria but do not test uniqueness (for instance, by varying the relaxation parameter and checking whether the same initial measure can converge to distinct stationary states), nor do they verify that the static-HJB assumption remains valid once the evolutionary dynamics are permitted to be time-dependent.
minor comments (1)
  1. [Abstract] The abstract states that the exploration cost 'is naturally related' to the HJB equations via the Lagrangian multiplier; a short paragraph clarifying the precise functional dependence would aid readers who are not already familiar with the relaxation approach.
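The kind of numerical uniqueness check requested in major comment 3 can be sketched on a toy problem. The example below is an assumption-laden stand-in and not the paper's coupled system: it uses the classical finite-strategy logit dynamic, a hypothetical two-strategy coordination game `A`, and a noise level `eta` as the parameter being varied. It integrates from several initial states and reports how far apart the resulting long-time states are.

```python
import numpy as np

def logit_step(x, A, eta, dt):
    """One explicit Euler step of the logit dynamic dx/dt = softmax(A x / eta) - x."""
    f = A @ x
    w = np.exp((f - f.max()) / eta)  # shift by the max for numerical stability
    return x + dt * (w / w.sum() - x)

def long_time_state(x0, A, eta, dt=0.05, steps=2000):
    """Integrate from x0 up to time dt * steps and return the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = logit_step(x, A, eta, dt)
    return x

def spread_of_limits(initial_states, A, eta):
    """Largest componentwise gap between long-time states reached from different starts."""
    limits = np.array([long_time_state(x0, A, eta) for x0 in initial_states])
    return np.ptp(limits, axis=0).max()

# Hypothetical coordination game: each strategy pays off when matched with itself.
A = np.eye(2)
starts = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]

spread_high_noise = spread_of_limits(starts, A, eta=1.0)
spread_low_noise = spread_of_limits(starts, A, eta=0.1)

print("spread of limits, eta = 1.0:", spread_high_noise)
print("spread of limits, eta = 0.1:", spread_low_noise)
```

A spread near zero is consistent with a single stationary state for that parameter value, while a large spread shows that different initial conditions reach different stationary states. A check of this form can only falsify uniqueness; agreement across a finite set of starting points does not prove it.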

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive suggestions. We address each major comment below and indicate the planned revisions.

  1. Referee: [Abstract] The uniqueness claim for the coupled evolutionary-HJB system is asserted to hold 'under certain conditions,' yet no precise regularity, monotonicity, or convexity hypotheses on the utility function or the exploration-cost function are supplied that would guarantee uniqueness of the stationary distribution for the ODE-PDE system.

    Authors: We agree that the abstract would benefit from explicit hypotheses. The modeling section states that the utility is continuously differentiable and the exploration cost is strictly convex; these ensure uniqueness of the stationary measure via a contraction argument on the coupled ODE-PDE system. In revision we will insert the precise conditions (C^1 utility, strong convexity of the cost with positive modulus) directly into the abstract.
    Revision: yes

  2. Referee: [Modeling section] The passage from the Lagrangian-relaxed cost to uniqueness is not accompanied by a theorem or derivation that states the necessary assumptions; the abstract and modeling description assert the result without demonstrating that the stated conditions are sufficient or necessary.

    Authors: The derivation appears in the modeling section after the Lagrangian relaxation is introduced, but we acknowledge that a compact theorem statement would improve readability. We will add a formal theorem in the revised modeling section that lists the assumptions (Lipschitz continuity of payoffs, strict convexity of the cost, and bounded action space) and sketches the fixed-point argument establishing uniqueness.
    Revision: yes

  3. Referee: [Computational results] The one- and two-dimensional numerical examples demonstrate existence of equilibria but do not test uniqueness (for instance, by varying the relaxation parameter and checking whether the same initial measure can converge to distinct stationary states), nor do they verify that the static-HJB assumption remains valid once the evolutionary dynamics are permitted to be time-dependent.

    Authors: The examples are meant to illustrate qualitative behavior rather than to provide exhaustive verification. We will augment the computational section with additional runs that vary the relaxation parameter and initial measures to confirm convergence to the same stationary distribution. The static HJB is a deliberate modeling choice for the stationary mean-field equilibrium; we will add a clarifying remark that time-dependent extensions lie beyond the present scope.
    Revision: partial
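The contraction and fixed-point arguments mentioned in the responses follow a standard pattern, which can be written out generically. The map $T$, the metric space $(X, d)$, and the constant $\kappa$ below are hypothetical placeholders; the page does not state the actual operator or the norm used in the paper.

```latex
% Uniqueness of fixed points of a contraction (hypothetical map T, not the paper's operator).
\begin{align*}
& d\bigl(T(x), T(y)\bigr) \le \kappa\, d(x,y) \quad \text{for all } x, y \in X, \qquad 0 \le \kappa < 1, \\
& T(x^{*}) = x^{*},\; T(y^{*}) = y^{*}
  \;\Longrightarrow\;
  d(x^{*}, y^{*}) = d\bigl(T(x^{*}), T(y^{*})\bigr) \le \kappa\, d(x^{*}, y^{*})
  \;\Longrightarrow\;
  d(x^{*}, y^{*}) = 0 .
\end{align*}
```

Whether such an argument applies here depends on exhibiting a concrete $T$ and verifying $\kappa < 1$ under the stated hypotheses, which is what the promised theorem would need to supply.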

Circularity Check

0 steps flagged

No significant circularity in the derivation chain


The paper extends classical pairwise evolutionary dynamics (replicator, logit) by coupling them to a static HJB equation to represent forward-looking behavior, with exploration cost entering via the optimal Lagrangian multiplier as a relaxation parameter. This coupling is introduced as a modeling choice that produces a mean-field game, not as a quantity derived from or fitted to the target result itself. The uniqueness statement is conditioned on unspecified 'certain conditions' whose verification is left open, but the abstract and modeling description contain no self-definitional loop, no fitted parameter relabeled as a prediction, and no load-bearing self-citation that collapses the central claim. The 1D/2D numerics are presented as computational illustrations rather than the source of the existence or uniqueness assertions. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Limited to abstract; relies on standard mean field game assumptions for coupling and uniqueness.

free parameters (1)
  • Lagrangian multiplier
    Serves as relaxation parameter for the exploration cost constraint.
axioms (1)
  • domain assumption: Existence of a unique solution, under certain conditions, for the coupled evolutionary-HJB system
    Invoked to guarantee well-posedness of the forward-looking dynamic.


