pith. sign in

arxiv: 2604.10744 · v1 · submitted 2026-04-12 · 💻 cs.DC · cs.NI

Bipartite matching under communication constraints

Pith reviewed 2026-05-10 15:23 UTC · model grok-4.3

classification 💻 cs.DC cs.NI
keywords bipartite matchingcommunication constraintsdata center networksrandom thinningdegree-biased samplingnetwork stabilitylocal informationprobabilistic scheduling
0
0 comments X

The pith

Thinning sender reports to degree two with greedy selection creates a parameter-free algorithm that extends the stability region of data center networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Data center scheduling is cast as a bipartite matching problem in which senders can convey only limited local information to receivers. The authors develop single-round probabilistic algorithms that combine degree-biased sampling with random thinning of the reported connections. Random-graph analysis shows that degree-biased sampling improves expected matching size in sparse regimes while thinning can increase it in denser regimes. The practical contribution is an algorithm that thins every sender to exactly two reports, lets receivers pick greedily, and needs no tuning parameters; packet-level simulations driven by production traces demonstrate that this method materially widens the range of stable network loads. The result matters because centralized schedulers become infeasible at the scale of modern data centers, so effective local-information methods could support larger, more efficient operation.

Core claim

In bipartite matching under communication constraints, senders bias their random selections toward higher-degree receivers and then thin their reports to a random subset; when this thinning is fixed at degree two and receivers apply greedy selection, the resulting algorithm requires no parameter tuning and, in packet-level simulations with production traffic traces, significantly extends the network stability region while analytical guarantees hold on random graph models.

What carries the argument

Random thinning of each sender's reported connections to a fixed small degree (specifically two) followed by greedy selection at receivers.

If this is right

  • In sparse random graphs, degree-biased sampling produces a strictly higher expected matching size than earlier communication-constrained algorithms.
  • In denser random-graph regimes, deliberately thinning the set of reported connections can increase the expected number of matches.
  • Fixing the thinning threshold at two and applying greedy selection removes the need for any tunable parameters.
  • Packet-level simulations driven by real production traces show that this parameter-free algorithm enlarges the region of stable network operation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local-information thinning framework could be tested in other resource-allocation settings such as wireless spectrum assignment or distributed task scheduling.
  • If the random-graph predictions generalize, analogous thinning rules might stabilize networks whose topologies change over time rather than remaining static.
  • Direct hardware implementation on a testbed would provide a concrete check on whether the simulated stability gains survive real queueing dynamics and control-loop delays.

Load-bearing premise

Performance guarantees derived on random graph models and trace-driven simulations transfer to the topologies and traffic patterns of actual deployed data center networks.

What would settle it

A measurement campaign on a production data center showing that the thinning-to-degree-two greedy algorithm produces no measurable extension of the stability region, or that its throughput deviates sharply from random-graph predictions, would falsify the central practical claim.

Figures

Figures reproduced from arXiv: 2604.10744 by Ayalvadi Ganesh, Gautham Bolar, Jean-Francois Chamberland, Moonmoon Mohanty, Parimal Parag, Preetam Patil.

Figure 1
Figure 1. Figure 1: Existing matching algorithm on a 6×6 system. Senders notify all feasible receivers; uniform random selection at the GRANT stage causes collisions at receivers 1 and 2, yielding only three matched pairs. with which it can feasibly connect. But in our proposed algorithm, if a sender has an excessive number of notifications to send (node 3 in our example in [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Degree-biased matching on the same 6 × 6 system. Sender 3 thins one edge (dashed) at the NOTIFY stage; degree-biased selection at the GRANT stage spreads grants across receivers, yielding six matched pairs. Definition 2 (DB(α) selection). Consider a bipartite graph G = (U, V, E) with senders U, receivers V , edges E ⊆ U × V , and a parameter α ∈ R ∪ {−∞}. The DB(α) algorithm defines a probability mass func… view at source ↗
Figure 3
Figure 3. Figure 3: Fraction of nodes matched versus mean degree for [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Fraction of nodes matched versus mean degree for [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Mean matching fraction versus exponent α in D-out random graphs. Under degree-biased selection, higher mean sender degree can counterintuitively reduce matching performance. In [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Mean matching fraction versus exponent α in D-out random graphs having ED = d with thinning for k ∈ {2, 3, 4}. Preemptive thinning to low out-degree consistently improves decentralized matching performance, with max(2) yielding the strongest results. We have depicted the effect of thinning on the size of the matching obtained by the DB(α) algorithm in [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Performance comparison of 2CGS, 1r-dcPIM, and iSLIP for IMC10 workload in leaf-spine architecture. throughput and network load are equal. Inspecting Figs. 7b and 8b suggests that the stability region extends to loads of around 0.4–0.5 for iSLIP, around 0.5 for 1r-dcPIM and around 0.6 for 2CGS, but it is hard to pinpoint the precise onset of instability. It is easier to discern in Figs. 7d and 8d, where we … view at source ↗
Figure 8
Figure 8. Figure 8: Performance comparison of 2CGS, 1r-dcPIM, and iSLIP for SGD workload in leaf-spine architecture. On the theoretical side, we established closed-form asymptotic expressions for the mean matching fraction under both uniform selection, DB(0), and greedy selection, DB(−∞), for D-out random graph models. A notable finding is that the mean matching fraction under uniform selection depends on the out-degree distr… view at source ↗
Figure 9
Figure 9. Figure 9: Total number of control messages generated by [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
read the original abstract

In modern data center networks, thousands of hosts contend for shared link capacity; the scale of these systems makes centralized scheduling impractical. This article models such scheduling as a bipartite matching problem under communication constraints: senders express interest in forming connections, and receivers respond using only locally available information. A class of single-round probabilistic matching algorithms is proposed, built on two key ideas: degree-biased sampling, in which senders use receiver degrees to inform their random selection, and random thinning, in which senders report only a random subset of their connections. Analytical performance guarantees are established for random graph models. In sparse regimes, degree-biased sampling yields a higher expected matching size than prior communication-constrained algorithms; in denser settings, a counterintuitive phenomenon emerges where deliberately restricting available connections through thinning increases the expected number of matches. Combining thinning to degree two with greedy selection produces an algorithm that requires no parameter tuning and, in packet-level simulations with production traffic traces, significantly extends the network stability region. Although motivated by data center network scheduling, the underlying framework of bipartite matching under local information constraints is portable to other resource allocation settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript models data center scheduling as bipartite matching under communication constraints, where senders and receivers operate with only local information. It proposes single-round probabilistic algorithms based on degree-biased sampling and random thinning, derives analytical performance guarantees for random bipartite graphs (higher expected matching size in sparse regimes via biased sampling; counterintuitive gains from thinning in denser regimes), and evaluates a parameter-free algorithm (thinning to degree 2 plus greedy selection) in packet-level simulations using production traffic traces, claiming it significantly extends the network stability region. The framework is positioned as portable to other resource allocation problems.

Significance. If the analytical results hold and the simulation gains are attributable to the proven random-graph properties rather than trace-specific artifacts, the work would be significant for distributed systems by offering a simple, tuning-free scheduling method that improves stability without centralization. Credit is due for the use of production traces and the explicit no-parameter-tuning design. However, significance is limited by the unbridged gap between random-graph analysis and real data-center topologies, which often feature degree correlations and locality.

major comments (2)
  1. [Simulation results section] The central claim that thinning to degree 2 with greedy selection extends the stability region rests on the analytical guarantees derived for random bipartite graphs. However, real data-center topologies exhibit correlated degrees and locality that violate the independence and uniformity assumptions used in the proofs (e.g., for expected matching size in sparse/dense regimes). Without an explicit sensitivity analysis or mapping in the simulation section, it is unclear whether the reported gains follow from the random-graph results or are artifacts of the specific traces.
  2. [Analytical guarantees section] The abstract states that analytical performance guarantees are established and that the algorithm 'significantly extends the network stability region,' yet the manuscript provides no equations, theorem statements, or proof sketches for the key claims (higher expected matching size via degree-biased sampling; thinning benefit in dense regimes). This is load-bearing because the algorithm design and the interpretation of simulation outcomes are built directly on these guarantees.
minor comments (2)
  1. [Abstract] The abstract would benefit from including at least one quantitative simulation result (e.g., the factor by which the stability region is extended or the load values tested) to allow readers to assess practical impact without reading the full text.
  2. [Model section] Notation for the thinning parameter and degree-biased probabilities should be introduced with explicit definitions early in the model section to improve readability for readers unfamiliar with the prior communication-constrained matching literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments below, clarifying the presentation of our analytical results and the relationship between the random-graph analysis and our trace-driven simulations. We propose targeted revisions to improve clarity and strengthen the connection between theory and practice.

read point-by-point responses
  1. Referee: [Simulation results section] The central claim that thinning to degree 2 with greedy selection extends the stability region rests on the analytical guarantees derived for random bipartite graphs. However, real data-center topologies exhibit correlated degrees and locality that violate the independence and uniformity assumptions used in the proofs (e.g., for expected matching size in sparse/dense regimes). Without an explicit sensitivity analysis or mapping in the simulation section, it is unclear whether the reported gains follow from the random-graph results or are artifacts of the specific traces.

    Authors: We acknowledge that the random bipartite graph model assumes independence and uniformity that do not fully capture degree correlations or locality present in production data-center topologies. Our simulations employ packet-level traces from operational data centers, which inherently reflect these real-world features rather than purely synthetic random graphs. The observed stability-region extensions are consistent with the sparse- and dense-regime predictions from the analysis. In revision we will add a dedicated paragraph in the simulation section discussing the potential effects of topology correlations, noting that the parameter-free algorithm remains robust across the evaluated traces and that the gains align with the analytically characterized regimes rather than appearing as trace-specific artifacts. revision: partial

  2. Referee: [Analytical guarantees section] The abstract states that analytical performance guarantees are established and that the algorithm 'significantly extends the network stability region,' yet the manuscript provides no equations, theorem statements, or proof sketches for the key claims (higher expected matching size via degree-biased sampling; thinning benefit in dense regimes). This is load-bearing because the algorithm design and the interpretation of simulation outcomes are built directly on these guarantees.

    Authors: We apologize for insufficient visibility of the analytical content. The full manuscript contains Section 3 ('Analytical Performance Guarantees') with Theorem 1 (higher expected matching size under degree-biased sampling in sparse regimes) and Theorem 2 (counterintuitive benefit of random thinning in denser regimes), each accompanied by proof sketches. The stability-region claim for the parameter-free algorithm follows directly from these results. To address the concern, we will revise the abstract to include a concise statement of the two main theorems and add a short summary paragraph at the end of the introduction that explicitly links the theorems to the algorithm design and simulation interpretation. revision: yes

Circularity Check

0 steps flagged

No circularity; analysis and simulations are independent

full rationale

The paper first derives analytical guarantees for degree-biased sampling and thinning on random bipartite graphs via probabilistic methods in sparse and dense regimes. It then evaluates the untuned thinning-to-degree-2 plus greedy algorithm through separate packet-level simulations on production traffic traces. No equation or claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the random-graph theorems and trace-based results remain distinct external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Claims rest on random graph models for analysis and empirical validation via simulations; the thinning-to-degree-two choice is presented as parameter-free but selected for performance.

free parameters (1)
  • thinning degree = 2
    Fixed at two to eliminate tuning while achieving reported stability gains in simulations.
axioms (1)
  • domain assumption Bipartite graphs follow standard random models allowing analytical matching size bounds
    Invoked to establish performance guarantees for the proposed algorithms.

pith-pipeline@v0.9.0 · 5513 in / 1188 out tokens · 85348 ms · 2026-05-10T15:23:43.571248+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    Data Center TCP (DCTCP),

    M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, “Data Center TCP (DCTCP),”ACM SIGCOMM Comp. Commun. Rev., vol. 40, no. 4, pp. 63–74, Aug. 2010

  2. [2]

    Deadline-aware datacenter TCP (D2TCP),

    B. Vamanan, J. Hasan, and T. N. Vijaykumar, “Deadline-aware datacenter TCP (D2TCP),”ACM SIGCOMM Comp. Commun. Rev., vol. 42, no. 4, pp. 115–126, Oct. 2012

  3. [3]

    TCP improvements for data center networks,

    T. Das and K. M. Sivalingam, “TCP improvements for data center networks,” inInter. Conf. COM. Sys. NET. (COMSNETS), vol. 5, Jan. 2013, pp. 1–10. 18 0.3 0.4 0.5 0.6 150 200 250 300 350 Network load Control messages (million) 1r-dcPIM 2CGS iSLIP (a) IMC10 workload 0.3 0.4 0.5 0.6 200 300 400 500 600 Network load Control messages (million) 1r-dcPIM 2CGS iSLI...

  4. [4]

    Fastpass: a centralized “zero-queue

    J. Perry, A. Ousterhout, H. Balakrishnan, D. Shah, and H. Fugal, “Fastpass: a centralized “zero-queue” datacenter network,”ACM SIGCOMM Comp. Commun. Rev., vol. 44, no. 4, pp. 307–318, Oct. 2014

  5. [5]

    It’s time to replace TCP in the datacenter,

    J. Ousterhout, “It’s time to replace TCP in the datacenter,” Jan. 2023

  6. [6]

    The Hungarian method for the assignment problem,

    H. W. Kuhn, “The Hungarian method for the assignment problem,”Naval res. logist. quart., vol. 2, no. 1-2, pp. 83–97, Mar. 1955

  7. [7]

    A simple algorithm for finding maximal network flows and an application to the Hitchcock problem,

    L. R. F. Jr and D. R. Fulkerson, “A simple algorithm for finding maximal network flows and an application to the Hitchcock problem,”Canadian J. Math., vol. 9, pp. 210–218, 1957

  8. [8]

    Algorithms for the assignment and transportation problems,

    J. Munkres, “Algorithms for the assignment and transportation problems,”J. Soc. Indust. Appl. Math., vol. 5, no. 1, pp. 32–38, Mar. 1957

  9. [9]

    Ann 5/2 algorithm for maximum matchings in bipartite graphs,

    J. E. Hopcroft and R. M. Karp, “Ann 5/2 algorithm for maximum matchings in bipartite graphs,”SIAM J. Comput., vol. 2, no. 4, pp. 225–231, Dec. 1973

  10. [10]

    Economic efficiency requires interaction,

    S. Dobzinski, N. Nisan, and S. Oren, “Economic efficiency requires interaction,” inSymposium on Theory of computing (STOC). ACM, 2014, pp. 233–242

  11. [11]

    Tradeoffs between information and ordinal approximation for bipartite matching,

    E. Anshelevich and W. Zhu, “Tradeoffs between information and ordinal approximation for bipartite matching,”Theory of Computing Systems, vol. 63, no. 7, pp. 1499–1530, 2019

  12. [12]

    An auction algorithm for bipartite matching in streaming and massively parallel computation models,

    S. Assadi, S. C. Liu, and R. E. Tarjan, “An auction algorithm for bipartite matching in streaming and massively parallel computation models,” in Symposium on Simplicity in Algorithms (SOSA). SIAM, 2021, pp. 165–171

  13. [13]

    High-speed switch scheduling for local-area networks,

    T. E. Anderson, S. S. Owicki, J. B. Saxe, and C. P. Thacker, “High-speed switch scheduling for local-area networks,”ACM Trans. Comput. Sys., vol. 11, no. 4, pp. 319–352, Nov. 1993

  14. [14]

    dcPIM: near-optimal proactive datacenter transport,

    Q. Cai, M. T. Arashloo, and R. Agarwal, “dcPIM: near-optimal proactive datacenter transport,” inACM Spec. Int. Gr. Data Commun. (SIGCOMM), Aug. 2022, pp. 53–65

  15. [15]

    Rail-only: A low-cost high-performance network for training LLMs with trillion parameters,

    W. Wang, M. Ghobadi, K. Shakeri, Y . Zhang, and N. Hasani, “Rail-only: A low-cost high-performance network for training LLMs with trillion parameters,” inIEEE Symp. High-Perf. Interconn. (HOTI), Aug. 2024, pp. 1–10

  16. [16]

    Mapreduce: Simplified data processing on large clusters,

    J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,”Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008

  17. [17]

    VL2: a scalable and flexible data center network,

    A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, “VL2: a scalable and flexible data center network,”ACM SIGCOMM Comp. Commun. Rev., vol. 39, no. 4, pp. 51–62, Oct. 2009

  18. [18]

    The hadoop distributed file system,

    K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop distributed file system,” inSymposium on Mass Storage Systems and Technologies (MSST). IEEE, 2010, pp. 1–10

  19. [19]

    Apache spark: A unified engine for big data processing,

    M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklinet al., “Apache spark: A unified engine for big data processing,”Communications of the ACM, vol. 59, no. 11, pp. 56–65, 2016

  20. [20]

    The iSLIP scheduling algorithm for input-queued switches,

    N. McKeown, “The iSLIP scheduling algorithm for input-queued switches,”IEEE/ACM Trans. Netw., vol. 7, no. 2, pp. 188–201, Apr. 1999

  21. [21]

    Balanced allocations,

    Y . Azar, A. Z. Broder, A. R. Karlin, and E. Upfal, “Balanced allocations,”SIAM J. Comput., vol. 29, no. 1, pp. 180–200, 1999

  22. [22]

    The power of two choices in randomized load balancing,

    M. Mitzenmacher, “The power of two choices in randomized load balancing,”IEEE Trans. Parallel Distrib. Syst., vol. 12, no. 10, pp. 1094–1104, 2001

  23. [23]

    A fast and simple randomized parallel algorithm for maximal matching,

    A. Israeli and A. Itai, “A fast and simple randomized parallel algorithm for maximal matching,”Inf. Process. Lett., vol. 22, no. 2, pp. 77–80, 1986

  24. [24]

    Locality in distributed graph algorithms,

    N. Linial, “Locality in distributed graph algorithms,”SIAM J. Comput., vol. 21, no. 1, pp. 193–201, 1992

  25. [25]

    An optimal algorithm for on-line bipartite matching,

    R. M. Karp, U. V . Vazirani, and V . V . Vazirani, “An optimal algorithm for on-line bipartite matching,” inACM Symp. Theory Comput. (STOC), vol. 22, May 1990, pp. 352–358

  26. [26]

    pFabric: minimal near-optimal datacenter transport,

    M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker, “pFabric: minimal near-optimal datacenter transport,”ACM SIGCOMM Comp. Commun. Rev., vol. 43, no. 4, pp. 435–446, Aug. 2013

  27. [27]

    pHost: distributed near-optimal datacenter transport over commodity network fabric,

    P. X. Gao, A. Narayan, G. Kumar, R. Agarwal, S. Ratnasamy, and S. Shenker, “pHost: distributed near-optimal datacenter transport over commodity network fabric,” inACM Conf. Emerg. Net. Expts. Tech. (CONEXT)”, vol. 11, Dec. 2015, pp. 1–12

  28. [28]

    Re-architecting datacenter networks and stacks for low latency and high performance,

    M. Handley, C. Raiciu, A. Agache, A. V oinescu, A. W. Moore, G. Antichi, and M. W ´ojcik, “Re-architecting datacenter networks and stacks for low latency and high performance,” inACM Spec. Int. Gr. Data Commun. (SIGCOMM), Aug. 2017, pp. 29–42

  29. [29]

    Homa: a receiver-driven low-latency transport protocol using network priorities,

    B. Montazeri, Y . Li, M. Alizadeh, and J. Ousterhout, “Homa: a receiver-driven low-latency transport protocol using network priorities,” inACM Spec. Int. Gr. Data Commun. (SIGCOMM), Aug. 2018, pp. 221–235

  30. [30]

    Scheduling cells in an input-queued switch,

    N. McKeown, P. Varaiya, and J. Walrand, “Scheduling cells in an input-queued switch,”Electron. Lett., vol. 29, no. 25, pp. 2174–2175, Dec. 1993

  31. [31]

    Optimal scheduling algorithms for input-queued switches,

    D. Shah and D. Wischik, “Optimal scheduling algorithms for input-queued switches,” inIEEE Inter. Conf. Comp. Commun. (INFOCOM), vol. 25, Apr. 2006, pp. 1–11. 19

  32. [32]

    Achieving 100% throughput in an input-queued switch,

    N. McKeown, A. Mekkittikul, V . Anantharam, and J. Walrand, “Achieving 100% throughput in an input-queued switch,”IEEE Trans. Commun., vol. 47, no. 8, pp. 1260–1267, Aug. 1999

  33. [33]

    Online knapsack problem and budgeted truthful bipartite matching,

    R. Vaze, “Online knapsack problem and budgeted truthful bipartite matching,” inIEEE Inter. Conf. Comp. Commun. (INFOCOM), May 2017, pp. 1–9

  34. [34]

    Low-complexity switch scheduling algorithms: delay optimality in heavy traffic,

    P. R. Jhunjhunwala and S. T. Maguluri, “Low-complexity switch scheduling algorithms: delay optimality in heavy traffic,”IEEE/ACM Trans. Netw., vol. 30, no. 1, pp. 464–473, Feb. 2022

  35. [35]

    A scalable, commodity data center network architecture,

    M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data center network architecture,”SIGCOMM Comput. Commun. Rev., vol. 38, no. 4, pp. 63–74, 2008

  36. [36]

    van der Hofstad,Random Graphs and Complex Networks, vol

    R. van der Hofstad,Random Graphs and Complex Networks, vol. 2. Cambridge University Press, 2024

  37. [37]

    (2024, Jul.) Yet another packet simulator

    NetSys/Y APS. (2024, Jul.) Yet another packet simulator. [Online]. Available: https://github.com/NetSys/simulator

  38. [38]

    Is advance knowledge of flow sizes a plausible assumption?

    V . Dukic, S. A. Jyothi, B. Karla ˇs, M. Owaida, C. Zhang, and A. Singla, “Is advance knowledge of flow sizes a plausible assumption?” inUSENIX Conf. Netw. Sys. Des. Impl. (NSDI), Feb. 2019, pp. 565–580

  39. [39]

    Network traffic characteristics of data centers in the wild,

    T. Benson, A. Akella, and D. A. Maltz, “Network traffic characteristics of data centers in the wild,” inACM SIGCOMM Conf. Internet Meas. (IMC), vol. 10, Nov. 2010, pp. 267–280