Choose Your Battles: Distributed Learning Over Multiple Tug of War Games

Ilai Bistritz; Nicholas Bambos; Siddharth Chandak

arxiv: 2509.20147 · v2 · submitted 2025-09-24 · 💻 cs.GT · cs.LG· cs.MA· cs.SY· eess.SY

Choose Your Battles: Distributed Learning Over Multiple Tug of War Games

Siddharth Chandak , Ilai Bistritz , Nicholas Bambos This is my paper

Pith reviewed 2026-05-18 14:03 UTC · model grok-4.3

classification 💻 cs.GT cs.LGcs.MAcs.SYeess.SY

keywords Meta Tug-of-Wardistributed learningstochastic approximationquality of servicegame theoryequilibrium convergencepower controlsensor networks

0 comments

The pith

In Meta Tug-of-War games, a distributed algorithm converges to equilibria that satisfy players' target quality-of-service rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

N players simultaneously engage in K Tug-of-War games, where each player selects one game per time step along with an action whose reward falls for everyone else when any participant raises theirs. The Meta Tug-of-Peace algorithm updates actions inside the chosen game by stochastic approximation and decides when to switch games using only rare single-bit messages. The paper proves that this procedure converges to an equilibrium whose long-run rewards match any feasible target quality-of-service vector. Such games capture power control, task allocation, and sensor activation, so the result supplies a lightweight distributed method for meeting performance goals without central coordination.

Core claim

We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. Action updates rely on a simple stochastic approximation rule while game switches use infrequent 1-bit communication.

What carries the argument

The Meta Tug-of-Peace algorithm, which performs stochastic approximation on actions within a chosen game and uses infrequent 1-bit signals to select which game to join.

If this is right

The algorithm achieves the prescribed rewards in wireless power-control settings.
It likewise satisfies targets in distributed task-allocation problems.
Sensor-network activation scenarios can use the same procedure to meet performance goals.
Convergence holds whenever the chosen quality-of-service vector lies inside the feasible region.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The low communication rate suggests the method scales to large player populations with modest overhead.
Relaxing the strict tug-of-war condition while keeping the same update rules might still yield useful convergence results.
Hardware experiments on real wireless testbeds would directly test whether the simulated convergence translates to physical channels.

Load-bearing premise

The target quality-of-service reward vector must be feasible under some joint selection of games and actions, and each individual game must be a strict tug-of-war in which one player's higher action strictly reduces rewards for all others.

What would settle it

In a two-game power-control simulation with a known feasible target reward vector, run the algorithm for a large number of steps and check whether the players' empirical average rewards approach the target vector within a small tolerance.

read the original abstract

Consider $N$ players and $K$ games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of $K$ games is termed a 'Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a lightweight distributed algorithm for players to switch across multiple Tug-of-War games and converge to a target QoS vector.

read the letter

The main point is a distributed learning algorithm for a collection of Tug-of-War games that players can switch between. The goal is to reach a long-run reward vector that matches a given Quality of Service target. What is new is the Meta-ToW model with K simultaneous games and the Meta Tug-of-Peace algorithm. Players update their action in the current game with stochastic approximation and use a 1-bit communication rule to decide on switching games. This setup is meant for applications like power control in wireless systems or activation in sensor networks. The convergence to the QoS equilibrium is the central claim, and the simulations are said to support it for those cases. The algorithm is straightforward and keeps communication low, which is a practical strength. It builds directly on existing stochastic approximation methods without new primitives. The soft spots are around the proof. The abstract asserts convergence but does not provide the derivation or bounds, so it is hard to assess how tight the feasibility condition on the QoS vector is or how the strict ToW property is used. If the target is not feasible, the behavior is unclear. The switching rule is simple, which is good, but it might not handle all edge cases in larger systems. This paper is for engineers and researchers working on distributed resource allocation in competitive environments. A reader who knows stochastic approximation in games would see how the multi-game extension works and could try the algorithm in their own setting. It is worth a serious referee to go over the proof and the simulation results. I recommend sending it for peer review.

Referee Report

1 major / 1 minor

Summary. The paper models a Meta Tug-of-War (Meta-ToW) game with N players and K simultaneous Tug-of-War games, where each player selects one game and an action per time step, with rewards depending on co-participants' actions. It proposes the Meta Tug-of-Peace distributed algorithm using stochastic approximation for actions and infrequent 1-bit communication for game switches. The central claim is a proof that the algorithm converges to an equilibrium satisfying a target QoS reward vector, illustrated via simulations in power control, task allocation, and sensor networks.

Significance. If the convergence result holds with the stated assumptions on strict ToW structure and QoS feasibility, the work provides a scalable distributed learning method for multi-game competitive systems with low communication cost. This is relevant for applications in wireless networks and distributed resource allocation, where achieving specified QoS without central coordination is valuable. The simplicity of the updates is a practical strength.

major comments (1)

Abstract and convergence statement: the claim of convergence to a QoS-satisfying equilibrium is central, yet the abstract (and visible summary) provides no derivation steps, error bounds, or explicit feasibility conditions on the target vector; this makes the main result difficult to verify and requires detailed exposition of the proof, including how the stochastic approximation and switching rule interact with the Meta-ToW structure.

minor comments (1)

Abstract: the modeling assumption that each game is a strict Tug-of-War (one player's higher action strictly lowers others' rewards) could be stated more explicitly when defining Meta-ToW to clarify the equilibrium notion.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address the major comment below and have revised the manuscript to enhance the clarity and exposition of the convergence result.

read point-by-point responses

Referee: Abstract and convergence statement: the claim of convergence to a QoS-satisfying equilibrium is central, yet the abstract (and visible summary) provides no derivation steps, error bounds, or explicit feasibility conditions on the target vector; this makes the main result difficult to verify and requires detailed exposition of the proof, including how the stochastic approximation and switching rule interact with the Meta-ToW structure.

Authors: We agree that the abstract is necessarily concise and omits detailed derivation steps or error bounds. The full proof appears in the manuscript's convergence analysis section, where we first establish that the stochastic approximation updates drive each player's action to a best response within the current Tug-of-War game, and then show that the infrequent 1-bit communication rule for game selection ensures the joint process converges to a fixed point satisfying the target QoS vector. The feasibility condition requires the target vector to lie in the interior of the achievable reward set under the strict ToW structure (explicitly stated in our assumptions). To improve verifiability, we will add a high-level proof sketch to the introduction that highlights the interaction between the two update mechanisms and restate the feasibility conditions more prominently. Error bounds follow from standard stochastic approximation results under our Lipschitz and step-size assumptions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and context describe a distributed algorithm using standard stochastic approximation for action updates and a simple switching rule based on infrequent 1-bit communication. The convergence claim is to an equilibrium satisfying a target QoS vector in Meta-ToW games, with the modeling assumptions (strict Tug-of-War structure and feasibility) stated upfront as part of the game definition rather than derived from fitted parameters or self-referential equations. No load-bearing step reduces by construction to its own inputs, no self-citation chain is invoked for uniqueness, and the derivation appears self-contained against external benchmarks like standard stochastic approximation results. This yields a normal non-finding of circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The convergence claim rests on standard stochastic approximation theory and the structural assumption that each component game is a Tug-of-War; no new entities are postulated and no parameters appear to be fitted inside the proof sketch.

axioms (2)

standard math Stochastic approximation updates converge to equilibrium under standard step-size and boundedness conditions.
Invoked implicitly for the action-update part of the algorithm.
domain assumption Each individual game satisfies the Tug-of-War property that higher action by one player strictly decreases rewards for others in the same game.
This is the modeling premise that defines the Meta-ToW setting and enables the QoS target to be reachable.

pith-pipeline@v0.9.0 · 5752 in / 1396 out tokens · 34533 ms · 2026-05-18T14:03:58.145031+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1]

pull” from the players than others (e.g., more energy or power). Our algorithm selects with high probability the “minimal equilibrium

Moreover, we show that the game switches happen only a finite number of times. Finally, we demonstrate in simulations the performance of our algorithms for power control, distributed task allocation and sensor activation. A. Related Work QoS is a common objective in transmission power control in wireless networks [1]–[5]. This application is a special cas...

work page
[2]

large enough

=g n(t)with probability1−φ. End which player triggered the switch. Similarly, another possible mechanism is where all players randomly decide their next game when they receive a boundary signal. Our following result gives the convergence guarantees for the MC-ToP algorithm: Theorem 3.Under assumptions 1, 2, and 3, and for 0< ρ, φ >1, the action profiles g...

work page
[3]

The curveZhang et al

cannot handle noise, [2] assumes an unbiased estimator of the inverse SINR, i.e., the inverse reward, which is unavailable to our algorithms. The curveZhang et al. (2007)++denotes the setting in [2], i.e., where Gaussian noise is added to the inverse SINR. As expected, our algorithms converge to the minimal point that satisfies the QoS requirements. Fig. ...

work page 2007
[4]

A simple distributed autonomous power control algorithm and its convergence,

G. Foschini and Z. Miljanic, “A simple distributed autonomous power control algorithm and its convergence,”IEEE Transactions on Vehicular Technology, vol. 42, no. 4, pp. 641–646, 1993

work page 1993
[5]

A stochastic approx- imation approach to the power-control problem,

H. Zhang, W. S. Wong, W. Ge, and P. E. Caines, “A stochastic approx- imation approach to the power-control problem,”IEEE Transactions on Communications, vol. 55, no. 5, pp. 878–886, 2007

work page 2007
[6]

Distributed power control in cellular commu- nication systems concerning inaccurate sinr reports,

M. Biguesh and S. Gazor, “Distributed power control in cellular commu- nication systems concerning inaccurate sinr reports,”IEEE transactions on vehicular technology, vol. 60, no. 8, pp. 3657–3666, 2011

work page 2011
[7]

Stochastic power control for cellular radio sys- tems,

S. Ulukus and R. Yates, “Stochastic power control for cellular radio sys- tems,”IEEE Transactions on Communications, vol. 46, no. 6, pp. 784– 798, 1998

work page 1998
[8]

A framework for uplink power control in cellular radio systems,

R. Yates, “A framework for uplink power control in cellular radio systems,”IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995

work page 1995
[9]

One for all and all for one: Distributed learning of fair allocations with multi-player bandits,

I. Bistritz, T. Z. Baharav, A. Leshem, and N. Bambos, “One for all and all for one: Distributed learning of fair allocations with multi-player bandits,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 584–598, 2021

work page 2021
[10]

Queue up your regrets: Achieving the dynamic capacity region of multiplayer bandits,

I. Bistritz and N. Bambos, “Queue up your regrets: Achieving the dynamic capacity region of multiplayer bandits,” inAdvances in Neural Information Processing Systems, vol. 35, Curran Associates, Inc., 2022

work page 2022
[11]

Decentralised q-learning for multi-agent markov decision processes with a satisfiability criterion,

K. P. Keval and V . S. Borkar, “Decentralised q-learning for multi-agent markov decision processes with a satisfiability criterion,” 2023

work page 2023
[12]

Optimization and learning in open multi-agent systems,

D. Deplano, N. Bastianello, M. Franceschelli, and K. H. Johansson, “Optimization and learning in open multi-agent systems,”arXiv preprint arXiv:2501.16847, 2025

work page arXiv 2025
[13]

Gamekeeper: Online learning for admission control of networked open multiagent systems,

I. Bistritz and N. Bambos, “Gamekeeper: Online learning for admission control of networked open multiagent systems,”IEEE Transactions on Automatic Control, 2024

work page 2024
[14]

Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition

V . Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition. Texts and Readings in Mathematics, Hindustan Book Agency, 2022

work page 2022
[15]

Monotone maps: a review,

M. W. Hirsch and H. Smith, “Monotone maps: a review,”Journal of Difference Equations and Applications, vol. 11, no. 4-5, pp. 379–398, 2005

work page 2005
[16]

Global stability for the seir model in epidemiology,

M. Y . Li and J. S. Muldowney, “Global stability for the seir model in epidemiology,”Mathematical Biosciences, vol. 125, no. 2, pp. 155–164, 1995

work page 1995
[17]

Analysis of a delayed epi- demic model with pulse vaccination and saturation incidence,

S. Gao, L. Chen, J. J. Nieto, and A. Torres, “Analysis of a delayed epi- demic model with pulse vaccination and saturation incidence,”Vaccine, vol. 24, no. 35, pp. 6037–6045, 2006

work page 2006
[18]

Dynamics of opinion forming in structurally balanced social networks,

C. Altafini, “Dynamics of opinion forming in structurally balanced social networks,”PloS one, vol. 7, no. 6, p. e38135, 2012

work page 2012
[19]

Consensus problems on networks with antagonistic interac- tions,

C. Altafini, “Consensus problems on networks with antagonistic interac- tions,”IEEE transactions on automatic control, vol. 58, no. 4, pp. 935– 946, 2012

work page 2012
[20]

Charge-based control of diffserv-like queues,

V . S. Borkar and D. Manjunath, “Charge-based control of diffserv-like queues,”Automatica, vol. 40, no. 12, pp. 2043–2057, 2004

work page 2043
[21]

Tug of peace: Distributed learn- ing for quality of service guarantees,

S. Chandak, I. Bistritz, and N. Bambos, “Tug of peace: Distributed learn- ing for quality of service guarantees,” in2023 62nd IEEE Conference on Decision and Control (CDC), pp. 2346–2351, IEEE, 2023

work page 2023
[22]

Proportional allo- cation: Simple, distributed, and diverse matching with high entropy,

S. Agrawal, M. Zadimoghaddam, and V . Mirrokni, “Proportional allo- cation: Simple, distributed, and diverse matching with high entropy,” inInternational Conference on Machine Learning, pp. 99–108, PMLR, 2018

work page 2018
[23]

Distributed multi-player bandits - a game of thrones approach,

I. Bistritz and A. Leshem, “Distributed multi-player bandits - a game of thrones approach,” inAdvances in Neural Information Processing Systems, vol. 31, Curran Associates, Inc., 2018

work page 2018
[24]

M. W. Hirsch, S. Smale, and R. L. Devaney,Differential equations, dynamical systems, and an introduction to chaos. Academic press, 2012

work page 2012
[25]

H. L. Smith,Monotone dynamical systems: an introduction to the theory of competitive and cooperative systems. No. 41, American Mathematical Soc., 2008

work page 2008
[26]

Systems of differential equations that are competitive or cooperative ii: Convergence almost everywhere,

M. W. Hirsch, “Systems of differential equations that are competitive or cooperative ii: Convergence almost everywhere,”SIAM Journal on Mathematical Analysis, vol. 16, no. 3, pp. 423–439, 1985

work page 1985
[27]

Competitive and cooperative systems: A mini-review,

M. W. Hirsch and H. L. Smith, “Competitive and cooperative systems: A mini-review,” inPositive Systems(L. Benvenuti, A. De Santis, and L. Farina, eds.), (Berlin, Heidelberg), pp. 183–190, Springer Berlin Heidelberg, 2003

work page 2003
[28]

The measure of the critical values of differentiable maps,

A. Sard, “The measure of the critical values of differentiable maps,” Bulletin of the American Mathematical Society, vol. 48, no. 12, pp. 883 – 890, 1942

work page 1942
[29]

Prospect-theoretic Q-learning,

V . S. Borkar and S. Chandak, “Prospect-theoretic Q-learning,”Systems & Control Letters, vol. 156, p. 105009, 2021

work page 2021
[30]

H. J. Kushner and G. Yin,Stochastic Approximation and Recursive Algorithms and Applications. Springer Science & Business Media, 2nd ed., 2003

work page 2003
[31]

A concentration bound for stochastic ap- proximation via Alekseev’s formula,

G. Thoppe and V . Borkar, “A concentration bound for stochastic ap- proximation via Alekseev’s formula,”Stochastic Systems, vol. 9, no. 1, pp. 1–26, 2019

work page 2019
[32]

Breiman,Probability

L. Breiman,Probability. Addison-Wesley series in statistics, Addison- Wesley Publishing Company, 1968

work page 1968
[33]

A comparison of snr estimation techniques for the awgn channel,

D. R. Pauluzzi and N. C. Beaulieu, “A comparison of snr estimation techniques for the awgn channel,”IEEE Transactions on communica- tions, vol. 48, no. 10, pp. 1681–1691, 2000

work page 2000
[34]

Distributed task alloca- tion for self-interested agents with partially unknown rewards,

N. Mandal, M. Khajenejad, and S. Mart ´ınez, “Distributed task alloca- tion for self-interested agents with partially unknown rewards,”IEEE Transactions on Automatic Control, 2025

work page 2025
[35]

Distributed greedy algorithm for multi- agent task assignment problem with submodular utility functions,

G. Qu, D. Brown, and N. Li, “Distributed greedy algorithm for multi- agent task assignment problem with submodular utility functions,” Automatica, vol. 105, pp. 206–215, 2019

work page 2019
[36]

Stochastic sensor activation for distributed state estimation over a sensor network,

W. Yang, G. Chen, X. Wang, and L. Shi, “Stochastic sensor activation for distributed state estimation over a sensor network,”Automatica, vol. 50, no. 8, pp. 2070–2076, 2014

work page 2070
[37]

Data collection for widely distributed mass of sensors,

A. Hideg, L. Bl ´azovics, K. Csorba, and M. Gotzy, “Data collection for widely distributed mass of sensors,” in2016 7th IEEE International Con- ference on Cognitive Infocommunications (CogInfoCom), pp. 000193– 000198, 2016

work page 2016

[1] [1]

pull” from the players than others (e.g., more energy or power). Our algorithm selects with high probability the “minimal equilibrium

Moreover, we show that the game switches happen only a finite number of times. Finally, we demonstrate in simulations the performance of our algorithms for power control, distributed task allocation and sensor activation. A. Related Work QoS is a common objective in transmission power control in wireless networks [1]–[5]. This application is a special cas...

work page

[2] [2]

large enough

=g n(t)with probability1−φ. End which player triggered the switch. Similarly, another possible mechanism is where all players randomly decide their next game when they receive a boundary signal. Our following result gives the convergence guarantees for the MC-ToP algorithm: Theorem 3.Under assumptions 1, 2, and 3, and for 0< ρ, φ >1, the action profiles g...

work page

[3] [3]

The curveZhang et al

cannot handle noise, [2] assumes an unbiased estimator of the inverse SINR, i.e., the inverse reward, which is unavailable to our algorithms. The curveZhang et al. (2007)++denotes the setting in [2], i.e., where Gaussian noise is added to the inverse SINR. As expected, our algorithms converge to the minimal point that satisfies the QoS requirements. Fig. ...

work page 2007

[4] [4]

A simple distributed autonomous power control algorithm and its convergence,

G. Foschini and Z. Miljanic, “A simple distributed autonomous power control algorithm and its convergence,”IEEE Transactions on Vehicular Technology, vol. 42, no. 4, pp. 641–646, 1993

work page 1993

[5] [5]

A stochastic approx- imation approach to the power-control problem,

H. Zhang, W. S. Wong, W. Ge, and P. E. Caines, “A stochastic approx- imation approach to the power-control problem,”IEEE Transactions on Communications, vol. 55, no. 5, pp. 878–886, 2007

work page 2007

[6] [6]

Distributed power control in cellular commu- nication systems concerning inaccurate sinr reports,

M. Biguesh and S. Gazor, “Distributed power control in cellular commu- nication systems concerning inaccurate sinr reports,”IEEE transactions on vehicular technology, vol. 60, no. 8, pp. 3657–3666, 2011

work page 2011

[7] [7]

Stochastic power control for cellular radio sys- tems,

S. Ulukus and R. Yates, “Stochastic power control for cellular radio sys- tems,”IEEE Transactions on Communications, vol. 46, no. 6, pp. 784– 798, 1998

work page 1998

[8] [8]

A framework for uplink power control in cellular radio systems,

R. Yates, “A framework for uplink power control in cellular radio systems,”IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995

work page 1995

[9] [9]

One for all and all for one: Distributed learning of fair allocations with multi-player bandits,

I. Bistritz, T. Z. Baharav, A. Leshem, and N. Bambos, “One for all and all for one: Distributed learning of fair allocations with multi-player bandits,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 584–598, 2021

work page 2021

[10] [10]

Queue up your regrets: Achieving the dynamic capacity region of multiplayer bandits,

I. Bistritz and N. Bambos, “Queue up your regrets: Achieving the dynamic capacity region of multiplayer bandits,” inAdvances in Neural Information Processing Systems, vol. 35, Curran Associates, Inc., 2022

work page 2022

[11] [11]

Decentralised q-learning for multi-agent markov decision processes with a satisfiability criterion,

K. P. Keval and V . S. Borkar, “Decentralised q-learning for multi-agent markov decision processes with a satisfiability criterion,” 2023

work page 2023

[12] [12]

Optimization and learning in open multi-agent systems,

D. Deplano, N. Bastianello, M. Franceschelli, and K. H. Johansson, “Optimization and learning in open multi-agent systems,”arXiv preprint arXiv:2501.16847, 2025

work page arXiv 2025

[13] [13]

Gamekeeper: Online learning for admission control of networked open multiagent systems,

I. Bistritz and N. Bambos, “Gamekeeper: Online learning for admission control of networked open multiagent systems,”IEEE Transactions on Automatic Control, 2024

work page 2024

[14] [14]

Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition

V . Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition. Texts and Readings in Mathematics, Hindustan Book Agency, 2022

work page 2022

[15] [15]

Monotone maps: a review,

M. W. Hirsch and H. Smith, “Monotone maps: a review,”Journal of Difference Equations and Applications, vol. 11, no. 4-5, pp. 379–398, 2005

work page 2005

[16] [16]

Global stability for the seir model in epidemiology,

M. Y . Li and J. S. Muldowney, “Global stability for the seir model in epidemiology,”Mathematical Biosciences, vol. 125, no. 2, pp. 155–164, 1995

work page 1995

[17] [17]

Analysis of a delayed epi- demic model with pulse vaccination and saturation incidence,

S. Gao, L. Chen, J. J. Nieto, and A. Torres, “Analysis of a delayed epi- demic model with pulse vaccination and saturation incidence,”Vaccine, vol. 24, no. 35, pp. 6037–6045, 2006

work page 2006

[18] [18]

Dynamics of opinion forming in structurally balanced social networks,

C. Altafini, “Dynamics of opinion forming in structurally balanced social networks,”PloS one, vol. 7, no. 6, p. e38135, 2012

work page 2012

[19] [19]

Consensus problems on networks with antagonistic interac- tions,

C. Altafini, “Consensus problems on networks with antagonistic interac- tions,”IEEE transactions on automatic control, vol. 58, no. 4, pp. 935– 946, 2012

work page 2012

[20] [20]

Charge-based control of diffserv-like queues,

V . S. Borkar and D. Manjunath, “Charge-based control of diffserv-like queues,”Automatica, vol. 40, no. 12, pp. 2043–2057, 2004

work page 2043

[21] [21]

Tug of peace: Distributed learn- ing for quality of service guarantees,

S. Chandak, I. Bistritz, and N. Bambos, “Tug of peace: Distributed learn- ing for quality of service guarantees,” in2023 62nd IEEE Conference on Decision and Control (CDC), pp. 2346–2351, IEEE, 2023

work page 2023

[22] [22]

Proportional allo- cation: Simple, distributed, and diverse matching with high entropy,

S. Agrawal, M. Zadimoghaddam, and V . Mirrokni, “Proportional allo- cation: Simple, distributed, and diverse matching with high entropy,” inInternational Conference on Machine Learning, pp. 99–108, PMLR, 2018

work page 2018

[23] [23]

Distributed multi-player bandits - a game of thrones approach,

I. Bistritz and A. Leshem, “Distributed multi-player bandits - a game of thrones approach,” inAdvances in Neural Information Processing Systems, vol. 31, Curran Associates, Inc., 2018

work page 2018

[24] [24]

M. W. Hirsch, S. Smale, and R. L. Devaney,Differential equations, dynamical systems, and an introduction to chaos. Academic press, 2012

work page 2012

[25] [25]

H. L. Smith,Monotone dynamical systems: an introduction to the theory of competitive and cooperative systems. No. 41, American Mathematical Soc., 2008

work page 2008

[26] [26]

Systems of differential equations that are competitive or cooperative ii: Convergence almost everywhere,

M. W. Hirsch, “Systems of differential equations that are competitive or cooperative ii: Convergence almost everywhere,”SIAM Journal on Mathematical Analysis, vol. 16, no. 3, pp. 423–439, 1985

work page 1985

[27] [27]

Competitive and cooperative systems: A mini-review,

M. W. Hirsch and H. L. Smith, “Competitive and cooperative systems: A mini-review,” inPositive Systems(L. Benvenuti, A. De Santis, and L. Farina, eds.), (Berlin, Heidelberg), pp. 183–190, Springer Berlin Heidelberg, 2003

work page 2003

[28] [28]

The measure of the critical values of differentiable maps,

A. Sard, “The measure of the critical values of differentiable maps,” Bulletin of the American Mathematical Society, vol. 48, no. 12, pp. 883 – 890, 1942

work page 1942

[29] [29]

Prospect-theoretic Q-learning,

V . S. Borkar and S. Chandak, “Prospect-theoretic Q-learning,”Systems & Control Letters, vol. 156, p. 105009, 2021

work page 2021

[30] [30]

H. J. Kushner and G. Yin,Stochastic Approximation and Recursive Algorithms and Applications. Springer Science & Business Media, 2nd ed., 2003

work page 2003

[31] [31]

A concentration bound for stochastic ap- proximation via Alekseev’s formula,

G. Thoppe and V . Borkar, “A concentration bound for stochastic ap- proximation via Alekseev’s formula,”Stochastic Systems, vol. 9, no. 1, pp. 1–26, 2019

work page 2019

[32] [32]

Breiman,Probability

L. Breiman,Probability. Addison-Wesley series in statistics, Addison- Wesley Publishing Company, 1968

work page 1968

[33] [33]

A comparison of snr estimation techniques for the awgn channel,

D. R. Pauluzzi and N. C. Beaulieu, “A comparison of snr estimation techniques for the awgn channel,”IEEE Transactions on communica- tions, vol. 48, no. 10, pp. 1681–1691, 2000

work page 2000

[34] [34]

Distributed task alloca- tion for self-interested agents with partially unknown rewards,

N. Mandal, M. Khajenejad, and S. Mart ´ınez, “Distributed task alloca- tion for self-interested agents with partially unknown rewards,”IEEE Transactions on Automatic Control, 2025

work page 2025

[35] [35]

Distributed greedy algorithm for multi- agent task assignment problem with submodular utility functions,

G. Qu, D. Brown, and N. Li, “Distributed greedy algorithm for multi- agent task assignment problem with submodular utility functions,” Automatica, vol. 105, pp. 206–215, 2019

work page 2019

[36] [36]

Stochastic sensor activation for distributed state estimation over a sensor network,

W. Yang, G. Chen, X. Wang, and L. Shi, “Stochastic sensor activation for distributed state estimation over a sensor network,”Automatica, vol. 50, no. 8, pp. 2070–2076, 2014

work page 2070

[37] [37]

Data collection for widely distributed mass of sensors,

A. Hideg, L. Bl ´azovics, K. Csorba, and M. Gotzy, “Data collection for widely distributed mass of sensors,” in2016 7th IEEE International Con- ference on Cognitive Infocommunications (CogInfoCom), pp. 000193– 000198, 2016

work page 2016