pith. sign in

arxiv: 2306.04498 · v3 · submitted 2023-06-07 · 💻 cs.LG · cs.CY· cs.DC

Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

Pith reviewed 2026-05-24 08:39 UTC · model grok-4.3

classification 💻 cs.LG cs.CYcs.DC
keywords multi-agent banditsmax-min fairnessprivacy-preserving learningdistributed algorithmscollision modelregret boundsbandit exploration
0
0 comments X

The pith

A fully distributed algorithm learns max-min fair allocations in multi-agent bandits with O(N^3 f(log T) log T) regret while keeping all reward data private to each agent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a protocol for multiple agents to coordinate on a max-min fair arm allocation in a bandit setting where they can only communicate through collisions and must not share their reward observations. This is achieved by combining local exploration with distributed mechanisms for ordering agents and testing fairness thresholds via auctions. A sympathetic reader would care because it reduces the dependence on the number of agents from exponential to cubic while maintaining privacy, making large-scale fair learning feasible without a central collector of data. The regret bound shows near-logarithmic scaling with time horizon under mild conditions on the function f.

Core claim

We propose a fully distributed algorithm for bounded rewards with unknown support, achieving regret O(N^3 f(log T)log T), where f is any nondecreasing diverging function satisfying f(k-1)/f(k)→1. The algorithm combines distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction for threshold-feasibility tests. No agent collects the reward observations, empirical estimates, or preferences of the others, thus preserving reward privacy while coordinating only through collision outcomes.

What carries the argument

Endpoint-revalidated warm-started bisection combined with collision-based distributed auction for threshold-feasibility tests.

If this is right

  • The regret scales polynomially with the number of agents N rather than exponentially as in prior privacy-preserving methods.
  • All reward samples and empirical estimates remain local to each agent with no sharing.
  • The protocol applies to bounded rewards with unknown support.
  • Coordination occurs solely through collision outcomes without explicit communication or a leader.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This coordination style might extend to other distributed decision problems that rely on limited feedback like collisions.
  • One could check whether alternative ordering rules reduce the cubic dependence on N.
  • Real deployments would need to verify if collision detection noise affects the stability of the bisection step.

Load-bearing premise

The concentration of cumulative empirical estimates and the stability of endpoint-revalidated bisection must hold under the collision-only model with bounded but unknown-support rewards.

What would settle it

A simulation or analysis showing that the bisection search fails to identify the correct max-min threshold for some bounded reward distributions, or that the observed regret grows faster than N^3 log T with increasing number of agents.

Figures

Figures reproduced from arXiv: 2306.04498 by Amir Leshem.

Figure 1
Figure 1. Figure 1: Convergence of the algorithm in the setup of [ [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Evolution of the regret vs. time. N, as well as the polynomial dependence on the number of agents [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Evolution of the regret vs. number of agents. [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗
read the original abstract

We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning and observe only their own rewards and whether collisions occur when several agents access the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local. We propose a fully distributed algorithm for bounded rewards with unknown support, achieving regret $O\!\left(N^3 f(\log T)\log T\right)$, where $f$ is any nondecreasing diverging function satisfying $f(k-1)/f(k)\to 1$. The algorithm combines distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction for threshold-feasibility tests. Unlike leader-based optimal algorithms, no agent collects the reward observations, empirical estimates, or preferences of the others. Thus, the protocol preserves reward privacy in the operational sense of avoiding reward sharing, while coordinating only through collision outcomes. Compared with previous privacy-preserving algorithms for max--min fair bandits, which have exponential dependence on the number of agents, our method achieves polynomial $N^3$ dependence while retaining near-logarithmic dependence on $T$. The analysis uses concentration of cumulative empirical estimates and stability of endpoint-revalidated bisection. Simulations confirm the predicted scaling with horizon, number of agents, and max--min gap across representative numerical settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a fully distributed algorithm for max-min fair multi-agent multi-armed bandit learning under collision-only coordination (no explicit communication, agents observe only own rewards and collisions). For bounded rewards with unknown support, it achieves regret O(N³ f(log T) log T) where f is any nondecreasing diverging function with f(k-1)/f(k)→1. The algorithm uses distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction; no agent shares reward observations, empirical estimates, or preferences. The analysis relies on concentration of cumulative empirical estimates and stability of endpoint-revalidated bisection. Simulations are reported to confirm scaling with T, N, and max-min gap.

Significance. If the stated regret bound holds with the claimed polynomial-N and near-log-T dependence while preserving operational privacy (no reward sharing), the result would improve substantially on prior privacy-preserving max-min fair bandit algorithms that exhibit exponential N dependence. The combination of fully distributed coordination via collisions only and the specific technical components (cumulative exploration + warm-started bisection + auction) would constitute a meaningful technical contribution to scalable fair multi-agent learning.

major comments (2)
  1. [Abstract] Abstract: The central O(N³ f(log T) log T) regret claim rests on concentration of cumulative empirical estimates holding under the collision-only model (which induces dependence and missing observations) with unknown reward support. The abstract states that the analysis uses this concentration but provides no indication of how the unknown bounds enter the deviation probabilities or how collision-induced sample loss is bounded in the regret decomposition; this is load-bearing for both the polynomial-N and near-log-T terms.
  2. [Abstract] Abstract: The stability of endpoint-revalidated warm-started bisection for the distributed auction threshold-feasibility tests is invoked to support the regret bound, yet the abstract does not detail how stability is maintained when the reward support is unknown (i.e., no a-priori bounds for revalidation). If this stability fails to hold at the required rate, the auction component cannot guarantee the claimed coordination without reward sharing.
minor comments (2)
  1. The definition of f (nondecreasing, diverging, f(k-1)/f(k)→1) is given but no concrete examples (e.g., iterated logarithms) are supplied to illustrate the class of functions for which the bound applies.
  2. The abstract mentions simulations confirming scaling but does not specify the number of runs, error-bar construction, or data-exclusion rules used to generate the reported plots.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying points where the abstract could better signal the technical handling of key components. We address each major comment below and will revise the abstract accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central O(N³ f(log T) log T) regret claim rests on concentration of cumulative empirical estimates holding under the collision-only model (which induces dependence and missing observations) with unknown reward support. The abstract states that the analysis uses this concentration but provides no indication of how the unknown bounds enter the deviation probabilities or how collision-induced sample loss is bounded in the regret decomposition; this is load-bearing for both the polynomial-N and near-log-T terms.

    Authors: We agree that the abstract would benefit from a concise indication of these elements. In the revised manuscript we will add a short clause noting that the concentration analysis bounds the effective number of collision-free samples per agent-arm pair (via a union bound over collision patterns) and handles unknown support through an initial range-estimation phase whose cost is absorbed into the f(log T) factor. The full decomposition appears in the regret analysis sections; the abstract revision will make this explicit without altering the claimed bound. revision: yes

  2. Referee: [Abstract] Abstract: The stability of endpoint-revalidated warm-started bisection for the distributed auction threshold-feasibility tests is invoked to support the regret bound, yet the abstract does not detail how stability is maintained when the reward support is unknown (i.e., no a-priori bounds for revalidation). If this stability fails to hold at the required rate, the auction component cannot guarantee the claimed coordination without reward sharing.

    Authors: We acknowledge the abstract does not currently address stability under unknown support. We will revise the abstract to state that endpoint revalidation employs adaptively estimated bounds derived from observed samples, with stability shown via a high-probability argument that the revalidation error remains controlled by the same concentration tools used elsewhere. The detailed stability proof is contained in the auction analysis; the abstract change will reference this mechanism at a high level. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bound derived from algorithm components and stated concentration properties

full rationale

The provided abstract and description present the O(N^3 f(log T) log T) regret as following from the proposed distributed algorithm (agent ordering, round-robin exploration, endpoint-revalidated bisection, collision-based auction) together with two technical properties: concentration of cumulative empirical estimates and stability of endpoint-revalidated bisection. These properties are invoked as analysis assumptions rather than being defined in terms of the target bound or fitted to data. No equations or steps are shown that reduce the claimed regret to a self-referential quantity, a fitted parameter renamed as prediction, or a self-citation chain. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard concentration inequalities for empirical averages and on the stability property of the endpoint-revalidated bisection procedure under the collision model; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)
  • standard math Concentration inequalities apply to the cumulative empirical estimates built by round-robin exploration
    Invoked for the analysis of local reward estimates
  • domain assumption Endpoint-revalidated bisection remains stable for threshold-feasibility tests
    Central to the distributed auction and fairness threshold search

pith-pipeline@v0.9.0 · 5782 in / 1409 out tokens · 22989 ms · 2026-05-24T08:39:24.651986+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 3 internal anchors

  1. [1]

    Distributed multi-player bandits-a game of thrones approach,

    I. Bistritz and A. Leshem, “Distributed multi-player bandits-a game of thrones approach,” in Advances in Neural Information Processing Systems , pp. 7222–7232, 2018

  2. [2]

    Distributed learning in multi-armed bandit with multiple players,

    K. Liu and Q. Zhao, “Distributed learning in multi-armed bandit with multiple players,” IEEE Transactions on Signal Processing , vol. 58, no. 11, pp. 5667–5681, 2010

  3. [3]

    Distributed multi-agent online learning based on global feedback,

    J. Xu, C. Tekin, S. Zhang, and M. Van Der Schaar, “Distributed multi-agent online learning based on global feedback,” IEEE Transactions on Signal Processing, vol. 63, no. 9, pp. 2225–2238, 2015

  4. [4]

    Deterministic sequencing of exploration and exploitation for multi-armed bandit problems,

    S. Vakili, K. Liu, and Q. Zhao, “Deterministic sequencing of exploration and exploitation for multi-armed bandit problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 5, pp. 759–767, 2013

  5. [5]

    Medium access in cognitive radio networks: A competitive multi-armed bandit framework,

    L. Lai, H. Jiang, and H. V. Poor, “Medium access in cognitive radio networks: A competitive multi-armed bandit framework,” in Signals, Systems and Computers, 2008 42nd Asilomar Conference on , pp. 98–102, 2008

  6. [6]

    Distributed algorithms for learning and cognitive medium access with logarithmic regret,

    A. Anandkumar, N. Michael, A. K. Tang, and A. Swami, “Distributed algorithms for learning and cognitive medium access with logarithmic regret,” 16 IEEE Journal on Selected Areas in Communications , vol. 29, no. 4, pp. 731– 745, 2011

  7. [7]

    Competing bandits in matching markets,

    L. T. Liu, H. Mania, and M. I. Jordan, “Competing bandits in matching markets,” arXiv preprint arXiv:1906.05363 , 2019

  8. [8]

    Learning in a changing world: Restless multi- armed bandit with unknown dynamics,

    H. Liu, K. Liu, and Q. Zhao, “Learning in a changing world: Restless multi- armed bandit with unknown dynamics,” IEEE Transactions on Information Theory, vol. 59, no. 3, pp. 1902–1916, 2013

  9. [9]

    Concurrent bandits and cognitive radio net- works,

    O. Avner and S. Mannor, “Concurrent bandits and cognitive radio net- works,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 66–81, 2014

  10. [10]

    On regret-optimal learning in decen- tralized multiplayer multiarmed bandits,

    N. Nayyar, D. Kalathil, and R. Jain, “On regret-optimal learning in decen- tralized multiplayer multiarmed bandits,” IEEE Trans. Control Netw. Syst. , vol. 5, no. 1, pp. 597–606, 2016

  11. [11]

    The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems

    N. Evirgen and A. Kose, “The effect of communication on noncoop- erative multiplayer multi-armed bandit problems,” in arXiv preprint arXiv:1711.01628, 2017, 2017

  12. [12]

    Learning with bandit feedback in potential games,

    J. Cohen, A. H´ eliou, and P. Mertikopoulos, “Learning with bandit feedback in potential games,” in Proceedings of the 31th International Conference on Neural Information Processing Systems, 2017

  13. [13]

    Multi-user lax communications: a multi-armed bandit approach,

    O. Avner and S. Mannor, “Multi-user lax communications: a multi-armed bandit approach,” in INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, IEEE , pp. 1–9, 2016

  14. [14]

    Distributed learning for channel allocation over a shared spectrum,

    S. Zafaruddin, I. Bistritz, A. Leshem, and D. Niyato, “Distributed learning for channel allocation over a shared spectrum,”IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2337–2349, 2019

  15. [15]

    Multi-Player Bandits: A Trekking Approach

    M. K. Hanawal and S. J. Darak, “Multi-player bandits: A trekking approach,” arXiv preprint arXiv:1809.06040 , 2018

  16. [16]

    Cooperative multi-player bandit optimization,

    I. Bistritz and N. Bambos, “Cooperative multi-player bandit optimization,” Advances in Neural Information Processing Systems , vol. 33, 2020

  17. [17]

    Multi-player bandits–a musical chairs approach,

    J. Rosenski, O. Shamir, and L. Szlak, “Multi-player bandits–a musical chairs approach,” in International Conference on Machine Learning , pp. 155–163, 2016

  18. [18]

    Sic-mmab: Synchronisation involves commu- nication in multiplayer multi-armed bandits,

    E. Boursier and V. Perchet, “Sic-mmab: Synchronisation involves commu- nication in multiplayer multi-armed bandits,” in Adv Neural Inf Process Syst., pp. 12048–12057, 2019

  19. [19]

    Multi-player bandits: The adversarial case,

    P. Alatur, K. Y. Levy, and A. Krause, “Multi-player bandits: The adversarial case,” J. Mach. Learn. Res. , vol. 21, no. 77, pp. 1–23, 2020. 17

  20. [20]

    Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

    S. Bubeck, Y. Li, Y. Peres, and M. Sellke, “Non-stochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without,” arXiv preprint arXiv:1904.12233 , 2019

  21. [21]

    Regret analysis of stochastic and nonstochastic multi-armed bandit problems,

    S. Bubeck, N. Cesa-Bianchi, et al. , “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends® in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012

  22. [22]

    Fully distributed optimal channel assignment for open spectrum access,

    O. Naparstek and A. Leshem, “Fully distributed optimal channel assignment for open spectrum access,” IEEE Trans. Signal Process. , vol. 62, no. 2, pp. 283–294, 2014

  23. [23]

    Opportunistic carrier sensing for energy-efficient information retrieval in sensor networks,

    Q. Zhao and L. Tong, “Opportunistic carrier sensing for energy-efficient information retrieval in sensor networks,” EURASIP J. Wirel. Commun. Netw., vol. 2005, no. 2, pp. 1–11, 2005

  24. [24]

    Distributed learning for opti- mal spectrum access in dense device-to-device ad-hoc networks,

    T. Boyarski, W. Wang, and A. Leshem, “Distributed learning for opti- mal spectrum access in dense device-to-device ad-hoc networks,” IEEE Transactions on Signal Processing, 2023

  25. [25]

    Multi-player bandits revisited,

    L. Besson and E. Kaufmann, “Multi-player bandits revisited,” inAlgorithmic Learning Theory, pp. 56–92, 2018

  26. [26]

    Distributed learning and optimal assignment in multiplayer heterogeneous networks,

    H. Tibrewal, S. Patchala, M. K. Hanawal, and S. J. Darak, “Distributed learning and optimal assignment in multiplayer heterogeneous networks,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications , pp. 1693–1701, IEEE, 2019

  27. [27]

    Game of thrones: Fully distributed learning for multiplayer bandits,

    I. Bistritz and A. Leshem, “Game of thrones: Fully distributed learning for multiplayer bandits,” Mathematics of Operations Research, vol. 46, no. 1, pp. 159–178, 2021

  28. [28]

    Decentralized learning for multiplayer multiarmed bandits,

    D. Kalathil, N. Nayyar, and R. Jain, “Decentralized learning for multiplayer multiarmed bandits,” IEEE Transactions on Information Theory , vol. 60, no. 4, pp. 2331–2345, 2014

  29. [29]

    A practical algorithm for multiplayer bandits when arm means vary among players,

    A. Mehrabian, E. Boursier, E. Kaufmann, and V. Perchet, “A practical algorithm for multiplayer bandits when arm means vary among players,” in 23rd AISTATS, (online), pp. 1211–1221, PMLR, Aug. 2020

  30. [30]

    The bargaining problem,

    J. F. Nash Jr, “The bargaining problem,” Econometrica: Journal of the econometric society, pp. 155–162, 1950

  31. [31]

    Kubiak, Proportional optimization and fairness , vol

    W. Kubiak, Proportional optimization and fairness , vol. 127. Springer Science & Business Media, 2008

  32. [32]

    Other solutions to nash’s bargaining prob- lem,

    E. Kalai and M. Smorodinsky, “Other solutions to nash’s bargaining prob- lem,” Econometrica: Journal of the Econometric Society , pp. 513–518, 1975. 18

  33. [33]

    Properties of pareto optimal allocations of resources to activities,

    K. M. Mjelde, “Properties of pareto optimal allocations of resources to activities,” modeling, identification and control, vol. 4, no. 3, pp. 167–173, 1983

  34. [34]

    Weighted max-min resource allocation for frequency selective channels,

    E. Zehavi, A. Leshem, R. Levanda, and Z. Han, “Weighted max-min resource allocation for frequency selective channels,” IEEE transactions on signal processing, vol. 61, no. 15, pp. 3723–3732, 2013

  35. [35]

    A unified framework for max-min and min-max fairness with applications,

    B. Radunovic and J.-Y. Le Boudec, “A unified framework for max-min and min-max fairness with applications,” IEEE/ACM Transactions on networking, vol. 15, no. 5, pp. 1073–1083, 2007

  36. [36]

    An approximation algorithm for max-min fair allocation of indivisible goods,

    A. Asadpour and A. Saberi, “An approximation algorithm for max-min fair allocation of indivisible goods,” SIAM Journal on Computing , vol. 39, no. 7, pp. 2970–2989, 2010

  37. [37]

    My fair bandit: Distributed learning of max-min fairness with multi-player bandits,

    I. Bistritz, T. Baharav, A. Leshem, and N. Bambos, “My fair bandit: Distributed learning of max-min fairness with multi-player bandits,” in International Conference on Machine Learning , pp. 930–940, PMLR, 2020

  38. [38]

    One for all and all for one: Distributed learning of fair allocations with multi-player bandits,

    I. Bistritz, T. Z. Baharav, A. Leshem, and N. Bambos, “One for all and all for one: Distributed learning of fair allocations with multi-player bandits,” IEEE J. Sel. Areas Inf. Theory , vol. 2, no. 2, pp. 584–598, 2021

  39. [39]

    Multi-player multi-armed bandits for stable allocation in heterogeneous ad-hoc networks,

    S. J. Darak and M. K. Hanawal, “Multi-player multi-armed bandits for stable allocation in heterogeneous ad-hoc networks,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 10, pp. 2350–2363, 2019

  40. [40]

    Individual regret in cooperative nonstochas- tic multi-armed bandits,

    Y. Bar-On and Y. Mansour, “Individual regret in cooperative nonstochas- tic multi-armed bandits,” in Adv Neural Inf Process Syst. (H. Wallach, H. Larochelle, A. Beygelzimer, F. d 'Alch´ e-Buc, E. Fox, and R. Garnett, eds.), vol. 32, pp. 3116–3126, Curran Associates, Inc., 2019

  41. [41]

    Fairness in reinforcement learning,

    S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, and A. Roth, “Fairness in reinforcement learning,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1617–1626, JMLR. org, 2017

  42. [42]

    Fairness in learn- ing: Classic and contextual bandits,

    M. Joseph, M. Kearns, J. H. Morgenstern, and A. Roth, “Fairness in learn- ing: Classic and contextual bandits,” in Advances in Neural Information Processing Systems, pp. 325–333, 2016

  43. [43]

    Group retention when using machine learning in sequential decision making: the interplay between user dynamics and fairness,

    X. Zhang, M. Khaliligarekani, C. Tekin, et al., “Group retention when using machine learning in sequential decision making: the interplay between user dynamics and fairness,” Advances in neural information processing systems , vol. 32, 2019

  44. [44]

    Fair end-to-end window-based congestion control,

    J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Transactions on networking , no. 5, pp. 556–567, 2000. 19

  45. [45]

    Asymptotically optimal allocation of treatments in sequential experiments,

    T. L. Lai and H. Robbins, “Asymptotically optimal allocation of treatments in sequential experiments,” Design of Experiments: Ranking and Selection , pp. 127–142, 1984

  46. [46]

    The true sample complexity of identify- ing good arms,

    J. Katz-Samuels and K. Jamieson, “The true sample complexity of identify- ing good arms,” in International Conference on Artificial Intelligence and Statistics, pp. 1781–1791, 2020

  47. [47]

    Rom and M

    R. Rom and M. Sidi, Multiple access protocols: Performance and analysis . Springer Science & Business Media, 2012

  48. [48]

    A distributed algorithm for the assignment problem,

    D. P. Bertsekas, “A distributed algorithm for the assignment problem,” Lab. for Information and Decision Systems Working Paper, MIT , 1979

  49. [49]

    Expected time complexity of the auction algorithm and the push relabel algorithm for maximum bipartite matching on random graphs,

    O. Naparstek and A. Leshem, “Expected time complexity of the auction algorithm and the push relabel algorithm for maximum bipartite matching on random graphs,” Random Structures & Algorithms, vol. 48, no. 2, pp. 384– 395, 2016

  50. [50]

    An efficient cost scaling algorithm for the assignment problem,

    A. V. Goldberg and R. Kennedy, “An efficient cost scaling algorithm for the assignment problem,” Mathematical Programming, vol. 71, no. 2, pp. 153– 177, 1995

  51. [51]

    A forward/reverse auction algorithm for asymmetric assignment problems,

    D. P. Bertsekas and D. A. Castanon, “A forward/reverse auction algorithm for asymmetric assignment problems,” Computational Optimization and Applications, vol. 1, pp. 277–297, 1992

  52. [52]

    The auction algorithm: A distributed relaxation method for the assignment problem,

    D. P. Bertsekas, “The auction algorithm: A distributed relaxation method for the assignment problem,” Annals of operations research, vol. 14, no. 1, pp. 105–123, 1988. A Proof of Lemma 4.2 First we compute the probability that our overestimate of the support of one of the arm fails. Let γ = maxm,nF (Bm,n/2) (20) ˆB(K) = maxm,n ˆBm,n(K). (21) By definition...