pith. sign in

arxiv: 2604.02611 · v1 · submitted 2026-04-03 · 💻 cs.DS

Stochastic Function Certification with Correlations

Pith reviewed 2026-05-13 18:52 UTC · model grok-4.3

classification 💻 cs.DS
keywords stochastic certificationmatroid probingapproximation algorithmscorrelated distributionsnon-adaptive probinggraph probingboolean function certification
0
0 comments X

The pith

A non-adaptive O(log n) approximation certifies stochastic matroid bases under arbitrary correlations and is tight unless P equals NP.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to probe correlated random elements one by one until their active set can be certified to span a given matroid, while keeping the expected number of probes small. It supplies a non-adaptive algorithm that achieves an O(log n) approximation ratio for any joint distribution on the elements. The same ratio cannot be improved by more than a constant factor in polynomial time, even when the matroid is a partition matroid. For uniform matroids the authors obtain constant-factor guarantees, with a further improvement to 2 under negative correlation, and they give an adaptive O(log k) algorithm for a graph-based model of k-uniform matroids that beats prior information-theoretic lower bounds.

Core claim

For the stochastic Boolean function certification problem where f is the indicator of a matroid basis, a non-adaptive O(log n)-approximation algorithm works for arbitrary joint distributions p. This ratio is tight up to constants even for partition matroids unless P equals NP. Constant-factor approximations exist for uniform matroids, improving to a 2-approximation when the variables are negatively correlated for the 1-uniform case. For k-uniform matroids on graphs whose edge correlations arise from independent vertex variables, an adaptive O(log k) approximation is possible.

What carries the argument

A non-adaptive ordering of elements computed from the known joint distribution p that minimizes the expected number of probes needed to discover an active basis.

Load-bearing premise

The joint distribution p over the random variables is known in advance or can be sampled from to design the probing order.

What would settle it

A polynomial-time algorithm that achieves o(log n) approximation for certifying a basis of a partition matroid under arbitrary correlations would show the claimed tightness is false.

read the original abstract

We study the Stochastic Boolean Function Certification (SBFC) problem, where we are given $n$ Bernoulli random variables $\{X_e: e \in U\}$ on a ground set $U$ of $n$ elements with joint distribution $p$, a Boolean function $f: 2^U \to \{0, 1\}$, and an (unknown) scenario $S = \{e \in U: X_e = 1\}$ of active elements sampled from $p$. We seek to probe the elements one-at-a-time to reveal if they are active until we can certify $f(S) = 1$, while minimizing the expected number of probes. Unlike most previous results that assume independence, we study correlated distributions $p$ and give approximation algorithms for several classes of functions $f$. When $f(S)$ is the indicator function for whether $S$ is the spanning set of a given matroid, our problem reduces to finding a basis of active elements of a matroid by probing elements. We give a non-adaptive $O(\log n)$-approximation algorithm for arbitrary distributions $p$, and show that this is tight up to constants unless P $=$ NP, even for partition matroids. For uniform matroids, we give constant factor $4.642$-approximation ([BBFT20]) that can be further improved to a $2$-approximation if additionally the random variables are negatively correlated for the case of $1$-uniform matroid. We also give an adaptive $O(\log k)$-approximation algorithm for SBFC for $k$-uniform matroids for the Graph Probing problem, where we seek to probe the edges of a graph one-at-a-time until we find $k$ active edges. The underlying distribution on edges arises from (hidden) independent vertex random variables, with an edge being active if at least one of its endpoints is active. This significantly improves over the information-theoretic lower bound on $\Omega(\mathrm{poly}(n))$ ([JGM19]) for adaptive algorithms for $k$-uniform matroids with arbitrary distributions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript studies the Stochastic Boolean Function Certification (SBFC) problem with correlated joint distributions p on n Bernoulli variables. For the special case where f certifies that a set spans a given matroid, it gives a non-adaptive O(log n)-approximation algorithm for arbitrary p together with a matching hardness result (up to constants) that holds even for partition matroids. For uniform matroids it supplies a 4.642-approximation (improving to 2 under negative correlation for the 1-uniform case) and, for the graph-probing variant with vertex-induced edge distributions, an adaptive O(log k)-approximation for k-uniform matroids.

Significance. If the stated approximation ratios and hardness results hold, the paper meaningfully extends the stochastic probing literature beyond the independent case by handling arbitrary correlations while retaining tight guarantees for matroids. The non-adaptive O(log n) result and its tightness for partition matroids, the constant-factor improvements for uniform matroids, and the adaptive O(log k) bound that beats the known poly(n) information-theoretic barrier for the vertex-induced model are all substantive contributions. The work credits standard set-cover-style analysis and matroid properties without introducing free parameters or circular definitions.

minor comments (3)
  1. Abstract: the numerical constant 4.642 is attributed to [BBFT20] but receives no further explanation or citation expansion in the abstract; a one-sentence parenthetical on its origin would help readers.
  2. Section 1 (Introduction): the modeling assumption that p is known or samplable for policy design is stated but could be highlighted earlier as the natural information model for non-adaptive design.
  3. Theorem statements: several approximation ratios are given without an accompanying table that lists all results side-by-side with their respective matroid classes and correlation assumptions; such a table would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its contributions to stochastic probing under correlations, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's O(log n)-approximation for arbitrary joint distributions p on matroids, the matching hardness for partition matroids, the constant-factor results for uniform matroids, and the O(log k) adaptive result for graph probing all derive from standard matroid probing and set-cover style analysis. No equations or steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the explicit assumption that p is known or samplable is the standard model and does not create circularity. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The results rest on standard matroid axioms and the definition of stochastic probing; no new entities or fitted parameters are introduced in the abstract.

axioms (1)
  • standard math Matroids are defined by their independent sets and spanning sets
    Invoked when reducing SBFC to finding an active basis.

pith-pipeline@v0.9.0 · 5686 in / 1134 out tokens · 32603 ms · 2026-05-13T18:52:23.179930+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1]

    Allen, L

    S. Allen, L. Hellerstein, D. Kletenik, and T. \" U nl\" u yurt. Evaluation of monotone DNF formulas. Algorithmica , 77:661--685, 2017

  2. [2]

    Alon and J.H

    N. Alon and J.H. Spencer. The Probabilistic Method . 2016

  3. [3]

    Bansal, J

    N. Bansal, J. Batra, M. Farhadi, and P. Tetali. Improved approximations for min sum vertex cover and generalized min sum set cover. arXiv:2007.09172, 2020

  4. [4]

    Y. Ben-Dov. Optimal testing procedures for special structures of coherent systems. Management Science , 27(12):1410--1420, 1981

  5. [5]

    Behnezhad, M

    S. Behnezhad, M. Derakhshan, and M.T. Hajiaghayi. Stochastic matching with few queries: (1- ) approximation. Symposium on Theory of Computing ( STOC ), 2020

  6. [6]

    Bansal, A

    N. Bansal, A. Gupta, and R. Krishnaswamy. A constant factor approximation algorithm for generalized min-sum set cover. Symposium on Discrete Algorithms ( SODA ), 2010

  7. [7]

    Bhalgat, A

    A. Bhalgat, A. Goel, and S. Khanna. Improved approximation results for stochastic knapsack problems. Symposium on Discrete Algorithms ( SODA ), 2011

  8. [8]

    Bansal, A

    N. Bansal, A. Gupta, J. Li, J. Mestre, V. Nagarajan, and A. Rudra. When LP is the cure for your matching woes: Improved bounds for stochastic matchings. Algorithmica , 63(4):733--762, 2012

  9. [9]

    S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning , 8(3-4):231--357, 2015

  10. [10]

    Butterworth

    R. Butterworth. Some reliability fault-testing models. Operations Research , 20(2):335--343, 1972

  11. [11]

    Calinescu, C

    G. Calinescu, C. Chekuri, M. P\' a l, and J. Vondr\' a k. Maximizing a submodular set function subject to a matroid constraint. Integer Programming and Combinatorial Optimization ( IPCO ), 2007

  12. [12]

    Chawla, E

    S. Chawla, E. Gergatsouli, J. McMahan, and C. Tzamos. Approximating P andora's box with correlations. arXiv:2108.12976, 2021

  13. [13]

    Chernoff

    H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics , pages 493--507, 1952

  14. [14]

    Dean, M.X

    B.C. Dean, M.X. Goemans, and J. Vondr\' a k. Approximating the stochastic knapsack problem: The benefit of adaptivity. Mathematics of Operations Research , 33(4):945--964, 2008

  15. [15]

    Deshpande, L

    A. Deshpande, L. Hellerstein, and D. Kletenik. Approximation algorithms for stochastic submodular set cover with applications to boolean function evaluation and min-knapsack. ACM Transactions on Algorithms , 12(3):1--28, 2016

  16. [16]

    U. Feige. A threshold of n for approximating set cover. Journal of the ACM , 45(4):634--652, 1998

  17. [17]

    Feige, L

    U. Feige, L. Lov\' a sz, and P. Tetali. Approximating min sum set cover. Algorithmica , 40:219--234, 2004

  18. [18]

    Gkenosis, N

    D. Gkenosis, N. Grammel, L. Hellerstein, and D. Kletenik. The stochastic score classification problem. European Symposium on Algorithms ( ESA ), 2018

  19. [19]

    Gkenosis, N

    D. Gkenosis, N. Grammel, L. Hellerstein, and D. Kletenik. The stochastic boolean function evaluation problem for symmetric boolean functions. Discrete Applied Mathematics , 309:269--277, 2022

  20. [20]

    Ghuge, A

    R. Ghuge, A. Gupta, and V. Nagarajan. Nonadaptive stochastic score classification and explainable half-space evaluation. Operations Research , 2024

  21. [21]

    Grammel, L

    N. Grammel, L. Hellerstein, D. Kletenik, and N. Liu. Algorithms for the unit-cost stochastic score classification problem. Algorithmica , 84(10):3054--3074, 2022

  22. [22]

    Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization

    D. Golovin and A. Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. arXiv:1003.3967, 2017

  23. [23]

    Garnett, Y

    R. Garnett, Y. Krishnamurthy, X. Xiong, J. Schneider, and R. Mann. Bayesian optimal active search and surveying. International Conference on Machine Learning ( ICML ), 2012

  24. [24]

    Gupta and V

    A. Gupta and V. Nagarajan. A stochastic probing problem with applications. Integer Programming and Combinatorial Optimization ( IPCO ), 2013

  25. [25]

    Gupta, V

    A. Gupta, V. Nagarajan, and S. Singla. Adaptivity gaps for stochastic probing: Submodular and XOS functions. Symposium on Discrete Algorithms ( SODA ), 2017

  26. [26]

    Hellerstein, B

    L. Hellerstein, B. Plank, and K. Schewior. Approximating matroid basis testing on partition matroids using budget-in-expectation. Symposium on Discrete Algorithms ( SODA ), 2026

  27. [27]

    Iwata, L

    S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM , 48(4):761--777, 2001

  28. [28]

    S. Im, V. Nagarajan, and R. van der Zwaan. Minimum latency submodular cover. ACM Transactions on Algorithms , 13(1):13:1--13:28, 2016

  29. [29]

    Jiang, R

    S. Jiang, R. Garnett, and B. Moseley. Cost effective active search. Advances in Neural Information Processing Systems , 32, 2019

  30. [30]

    Jiang, J

    H. Jiang, J. Li, D. Liu, and S. Singla. Algorithms and adaptivity gaps for stochastic k - TSP . Innovations in Theoretical Computer Science ( ITCS ), 2020

  31. [31]

    Jiang, G

    S. Jiang, G. Malkomes, G. Converse, A. Shofner, B. Moseley, and R. Garnett. Efficient nonmyopic active search. International Conference on Machine Learning ( ICML ), 2017

  32. [32]

    Kaplan, E

    H. Kaplan, E. Kushilevitz, and Y. Mansour. Learning with attribute costs. Symposium on Theory of Computing ( STOC ), 2005

  33. [33]

    Kainkaryam and P.J

    R.M. Kainkaryam and P.J. Woolf. Pooling in high-throughput drug screening. Current Opinion in Drug Discovery & Development , 12(3):339, 2009

  34. [34]

    N. Liu. Two 6-approximation algorithms for the stochastic score classification problem. arXiv:2212.02370, 2022

  35. [35]

    B.M.E. Moret. Decision trees and diagrams. ACM Computing Surveys , 14(4):593--623, 1982

  36. [36]

    Nielsen, L

    M.A. Nielsen, L. Rohwedder, and K. Schewior. Non-adaptive evaluation of k -of- n functions: Tight gap and a unit-cost PTAS . arXiv:2507.05877, 2025

  37. [37]

    Plank and K

    B.M. Plank and K. Schewior. Simple algorithms for stochastic score classification with small approximation ratios. SIAM Journal on Discrete Mathematics , 38(3):2069--2088, 2024

  38. [38]

    Paley and A

    R.E.A.C. Paley and A. Zygmund. On some series of functions (1). Mathematical Proceedings of the Cambridge Philosophical Society , 26:337--357, 1930

  39. [39]

    Swamy and D.B

    C. Swamy and D.B. Shmoys. Sampling-based approximation algorithms for multistage stochastic optimization. SIAM Journal on Computing , 41(4):975--1004, 2012

  40. [40]

    \" U nl\" u yurt

    T. \" U nl\" u yurt. Sequential testing problem: A follow-up review. Discrete Applied Mathematics , 377:356--369, 2025