Space Efficient Approximation to Maximum Matching Size from Uniform Edge Samples
Pith reviewed 2026-05-24 22:26 UTC · model grok-4.3
The pith
Constant-factor approximation to maximum matching size requires nearly linear edge samples but only O(log squared n) bits of space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
While m^{1-o(1)} uniform edge samples are necessary to obtain a constant-factor approximation to the maximum matching size, such an approximation can be computed in O(log^2 n) bits of space by simulating a peeling-type matching algorithm via a recursive sampling process that supplies higher sampling rates to dense regions and preserves precision over logarithmic recursion depth.
What carries the argument
Recursive sampling process that simulates a peeling algorithm for matching by balancing exploration depth against locally adjusted sampling rates.
If this is right
- The algorithm yields a constant-factor approximate local computation algorithm for matching that explores only O(d log n) edges from any vertex.
- Prior local simulations of randomized greedy require Omega(d squared) time in the worst case even for moderate degree d.
- The previous best sample-based result achieved only a polylogarithmic approximation factor.
- Space-efficient processing of the samples is possible even though the sample complexity itself is nearly linear in m.
Where Pith is reading between the lines
- The recursive sampling idea may apply to other local-density graph problems where uniform sampling wastes effort on sparse regions.
- It is unclear whether the same space bound extends to weighted matching or to hypergraph matching.
- The separation between sample complexity and space complexity suggests that streaming algorithms for matching could be designed with similar recursive filtering steps.
Load-bearing premise
The recursive sampling process can be tuned so that local neighborhood information from dense regions is supplied at appropriately higher rates without losing constant-factor precision over a logarithmic number of recursion levels.
What would settle it
A family of graphs in which every algorithm that uses o(m^{1-o(1)}) samples fails to return a constant-factor approximation, or a concrete execution trace of the recursive sampler that loses the constant factor after a few recursion levels on some input.
Figures
read the original abstract
Given a source of iid samples of edges of an input graph $G$ with $n$ vertices and $m$ edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in $G$? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that, on the one hand, this problem cannot be solved using a nontrivially sublinear (in $m$) number of samples: $m^{1-o(1)}$ samples are needed. On the other hand, a surprisingly space efficient algorithm for processing the samples exists: $O(\log^2 n)$ bits of space suffice to compute an estimate. Our main technical tool is a new peeling type algorithm for matching that we simulate using a recursive sampling process that crucially ensures that local neighborhood information from `dense' regions of the graph is provided at appropriately higher sampling rates. We show that a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels of recursion and achieve a constant factor approximation. The previous best result on matching size estimation from random samples was a $\log^{O(1)} n$ approximation [Kapralov et al'14]. Our algorithm also yields a constant factor approximate local computation algorithm (LCA) for matching with $O(d\log n)$ exploration starting from any vertex. Previous approaches were based on local simulations of randomized greedy, which take $O(d)$ time {\em in expectation over the starting vertex or edge} (Yoshida et al'09, Onak et al'12), and could not achieve a better than $d^2$ runtime. Interestingly, we also show that unlike our algorithm, the local simulation of randomized greedy that is the basis of the most efficient prior results does take $\wt{\Omega}(d^2)\gg O(d\log n)$ time for a worst case edge even for $d=\exp(\Theta(\sqrt{\log n}))$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that approximating the maximum matching size to a constant factor from uniform iid edge samples requires m^{1-o(1)} samples (lower bound), yet can be achieved using only O(log² n) bits of space via a new peeling algorithm simulated by a recursive sampling process that supplies higher rates to dense regions. The same technique yields an LCA for matching with O(d log n) exploration from any vertex, improving on prior log^{O(1)} n approximations and O(d²) local simulation runtimes.
Significance. If the central claims hold, the work gives nearly tight sample complexity for this estimation problem and demonstrates that constant-factor matching-size approximation is possible with sub-polylog space, which is surprising. The recursive-simulation technique for peeling and the resulting LCA are technically interesting contributions that could influence streaming and local computation models. The lower and upper bounds appear independent, as noted in the reader report.
major comments (1)
- [main technical tool / recursive sampling analysis] The central technical claim (abstract and main technical tool section) is that the recursive sampling process can be tuned so that 'a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels.' No explicit error-propagation lemma, induction hypothesis on the per-level multiplicative approximation factor, or analysis of accumulation on nested dense subgraphs is visible from the abstract; if the per-level factor is only (1+ε) with ε independent of local density, the guarantee can degrade after Θ(log n) levels. This is load-bearing for the constant-factor upper bound.
minor comments (2)
- The abstract states the previous best was a log^{O(1)} n approximation [Kapralov et al'14]; ensure the full citation appears in the bibliography and that the improvement is quantified in the introduction.
- The LCA runtime claim contrasts O(d log n) with prior O(d) expected or d² worst-case; a brief comparison table or explicit statement of the prior bounds would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the need for explicit clarification on the error analysis of the recursive sampling process. We address the major comment below.
read point-by-point responses
-
Referee: [main technical tool / recursive sampling analysis] The central technical claim (abstract and main technical tool section) is that the recursive sampling process can be tuned so that 'a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels.' No explicit error-propagation lemma, induction hypothesis on the per-level multiplicative approximation factor, or analysis of accumulation on nested dense subgraphs is visible from the abstract; if the per-level factor is only (1+ε) with ε independent of local density, the guarantee can degrade after Θ(log n) levels. This is load-bearing for the constant-factor upper bound.
Authors: We thank the referee for this observation. The full manuscript (Section 3) contains an explicit error-propagation analysis. Lemma 3.5 states an induction hypothesis on the per-level multiplicative factor: at recursion depth i the sampling rate is boosted by a factor polynomial in the estimated local density, yielding a per-level error of (1 + O(ε / log n)) where ε is a global constant. The induction accounts for accumulation over nested dense subgraphs by re-estimating density at each level before recursing, ensuring the product over O(log n) levels remains a fixed constant (independent of n). The abstract summarizes the outcome rather than the full lemma; we are happy to add an explicit pointer to Lemma 3.5 in the abstract or introduction. revision: partial
Circularity Check
No significant circularity; lower and upper bounds independent
full rationale
The paper states an information-theoretic lower bound (m^{1-o(1)} samples required) and a separate constructive upper bound (O(log^2 n) space via new recursive sampling simulation of a peeling algorithm). The abstract describes the sampling-rate tuning for dense regions as a novel technical contribution that preserves constant-factor precision over log n levels, without any quoted reduction of the approximation guarantee to a fitted parameter, self-citation chain, or definitional equivalence. The cited prior result (Kapralov et al'14) is used only for comparison and is not load-bearing for the new constant-factor claim. No equations or self-definitional steps are exhibited that collapse the claimed result to its inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Towards a unified theory of sparsification for matching problems
[AB19] Sepehr Assadi and Aaron Bernstein. Towards a unified theory of sparsification for matching problems. In Fineman and Mitzenmacher [FM18], pages 11:1–11:20. [ABB+19] Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab S. Mirrokni, and Cliff Stein. Coresets meet EDCS: algorithms for matching and vertex cover on massive graphs. In Timothy M. Cha...
work page 2019
-
[2]
[AG11a] Kook Jin Ahn and Sudipto Guha. Laminar families and metric embeddings: Non-bipartite maximum matching problem in the semi-streaming model. arXiv preprint arXiv:1104.4058 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
Randomized composable coresets for matching and vertex cover
[AK17] Sepehr Assadi and Sanjeev Khanna. Randomized composable coresets for matching and vertex cover. In Christian Scheideler and Mohammad Taghi Hajiaghayi, editors, Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2017, Washington DC, USA, July 24-26, 2017 , pages 3–12. ACM,
work page 2017
-
[4]
[BGY18] Paul Beame, Shayan Oveis Gharan, and Xin Yang. Time-space tradeoffs for learning finite functions from random evaluations, with applications to polynomials. In S´ ebastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Conference On Learning Theory, COLT 2018, Stockholm, Sweden, 6-9 July
work page 2018
-
[5]
Sublinear estimation of weighted matchings in dynamic data streams
58 [BS15] Marc Bury and Chris Schwiegelshohn. Sublinear estimation of weighted matchings in dynamic data streams. In Nikhil Bansal and Irene Finocchi, editors, Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings , volume 9294 of Lecture Notes in Computer Science , pages 263–274. Springer,
work page 2015
-
[6]
[CJMM17] Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan. The sparse awakens: Streaming algorithms for matching size estimation in sparse graphs. In Kirk Pruhs and Christian Sohler, editors, 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria , volume 87 of LIPIcs, pages 29:1–29:15. Schlo...
work page 2017
-
[7]
Streaming algorithms for estimating the matching size in planar graphs and beyond
[EHL+15] Hossein Esfandiari, Mohammad Taghi Hajiaghayi, Vahid Liaghat, Morteza Monemizadeh, and Krzysztof Onak. Streaming algorithms for estimating the matching size in planar graphs and beyond. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015 , pages...
work page 2015
-
[8]
Finding large matchings in semi-streaming
[EHM16] Hossein Esfandiari, MohammadTaghi Hajiaghayi, and Morteza Monemizadeh. Finding large matchings in semi-streaming. In Carlotta Domeniconi, Francesco Gullo, Francesco Bonchi, Josep Domingo-Ferrer, Ricardo A. Baeza-Yates, Zhi-Hua Zhou, and Xindong Wu, editors, IEEE International Conference on Data Mining Workshops, ICDM Workshops 2016, December 12-15...
work page 2016
-
[9]
Fineman and Michael Mitzenmacher, editors
[FM18] Jeremy T. Fineman and Michael Mitzenmacher, editors. 2nd Symposium on Simplicity in Algorithms, SOSA@SODA 2019, January 8-9, 2019 - San Diego, CA, USA , volume 69 of OASICS. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik,
work page 2019
-
[10]
On the communication and streaming complexity of maximum bipartite matching
[GKK12] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication and streaming complexity of maximum bipartite matching. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012 , pages 468–485. SIAM,
work page 2012
-
[11]
Weighted Matchings via Unweighted Augmentations
[GKMS18] Buddhima Gamlath, Sagar Kale, Slobodan Mitrovi´ c, and Ola Svensson. Weighted matchings via unweighted augmentations. arXiv preprint arXiv:1811.02760 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
Extractor-based time-space lower bounds for learning
59 [GRT18] Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018 , pages 990–1002. ACM,
work page 2018
-
[13]
[Kle17] Philip N. Klein, editor. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19 . SIAM,
work page 2017
-
[14]
[Kra16] Robert Krauthgamer, editor. Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016 . SIAM,
work page 2016
-
[15]
Time-space hardness of learning sparse parities
[KRT17] Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017 , pages 1067–1080. ACM,
work page 2017
-
[16]
Muthukrishnan, Pan Peng, and Christian Sohler
[MMPS17] Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, and Christian Sohler. Testable bounded degree graph properties are random order streamable. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland , volume ...
work page 2017
-
[17]
Planar matching in streams revisited
[MV16] Andrew McGregor and Sofya Vorotnikova. Planar matching in streams revisited. In Klaus Jansen, Claire Mathieu, Jos´ e D. P. Rolim, and Chris Umans, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2016, September 7-9, 2016, Paris, France , volume 60 of LIPIcs, pages 17:1–17:12. Schloss D...
work page 2016
-
[18]
A simple, space-efficient, streaming algorithm for matchings in low arboricity graphs
[MV18] Andrew McGregor and Sofya Vorotnikova. A simple, space-efficient, streaming algorithm for matchings in low arboricity graphs. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA , volume 61 of OASICS, pages 14:1–14:4. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik,
work page 2018
-
[19]
Pseudorandom generators for space-bounded computation
[Nis90] Noam Nisan. Pseudorandom generators for space-bounded computation. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13-17, 1990, Baltimore, Maryland, USA, pages 204–212,
work page 1990
-
[20]
Constant-time approximation algorithms via local improvements
[NO08] Huy N Nguyen and Krzysztof Onak. Constant-time approximation algorithms via local improvements. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science , pages 327–336. IEEE,
work page 2008
-
[21]
A (2+ ϵ)-approximation for maximum weight matching in the semi-streaming model
[PS17] Ami Paz and Gregory Schwartzman. A (2+ ϵ)-approximation for maximum weight matching in the semi-streaming model. In Klein [Kle17], pages 2153–2161. [PS18] Pan Peng and Christian Sohler. Estimating graph parameters from random order streams. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SO...
work page 2018
-
[22]
Fast learning requires good memory: A time-space lower bound for parity learning
[Raz16] Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA , pages 266–275. IEEE Computer Society,
work page 2016
-
[23]
A time-space lower bound for a large class of learning problems
[Raz17] Ran Raz. A time-space lower bound for a large class of learning problems. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017 , pages 732–742. IEEE Computer Society,
work page 2017
-
[24]
Fast Local Computation Algorithms
[RTVX11] Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. arXiv preprint arXiv:1104.1377 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[25]
(62) implies P [E3] ≤ cp exp ( − c 8 )
This bound together with Eq. (62) implies P [E3] ≤ cp exp ( − c 8 ) . (63) 66 Combining all the bounds. From Eq. (56), Eq. (61) and Eq. (63) we conclude P [ X ≥ δ ] ≤ P [E1] + P [E2] + P [E3] ≤ c2p exp ( − cδ 9 ) + cp exp ( − t − 3 3 ) + cp exp ( − c 8 ) ≤ p/2, when δ and c are set appropriately. Indeed recalling that t = cδ 30 and set c ≥ 2000 log(1/δ)/δ...
work page 2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.