Recognition: unknown
Replicable Composition
Pith reviewed 2026-05-10 16:28 UTC · model grok-4.3
The pith
Replicable algorithms for multiple problems can be composed using, up to logarithmic factors, only the sum of their individual sample needs while preserving constant replicability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Problems with sample complexities n_1, …, n_k can be jointly solved with Õ(∑ n_i) samples while preserving constant replicability. The approach converts each replicable algorithm into a perfectly generalizing algorithm, composes the results via a privacy-style analysis, and maps back to a replicable algorithm via correlated sampling. This yields the first advanced composition theorem for replicability and new bounds for the composition of perfectly generalizing algorithms with heterogeneous parameters.
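In symbols, the three regimes the abstract contrasts, for the homogeneous case where each of the k problems needs n samples (the tilde hides logarithmic factors):

$$\underbrace{\widetilde{O}(nk^{2})}_{\text{naive analysis}} \quad\longrightarrow\quad \underbrace{\widetilde{O}(n^{2}k)}_{\text{via differential privacy}} \quad\longrightarrow\quad \underbrace{\widetilde{O}(nk)}_{\text{this paper}}, \qquad \text{and } \widetilde{O}\Big(\textstyle\sum_{i=1}^{k} n_i\Big) \text{ for heterogeneous } n_i.$$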
What carries the argument
Conversion of each replicable algorithm into a perfectly generalizing algorithm, followed by privacy-style composition and recovery of replicability through correlated sampling.
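Schematically, with all parameters suppressed (a reading of the stated pipeline, not the paper's exact theorem statements):

$$\rho\text{-replicable } A_{1},\ldots,A_{k} \;\xrightarrow{\ \text{conversion}\ }\; \text{perfectly generalizing } \widetilde{A}_{1},\ldots,\widetilde{A}_{k} \;\xrightarrow{\ \text{privacy-style composition}\ }\; \widetilde{A} \;\xrightarrow{\ \text{correlated sampling}\ }\; O(1)\text{-replicable joint algorithm}.$$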
If this is right
- The total sample cost for solving k replicable problems jointly scales linearly with the sum of individual costs rather than quadratically with k.
- A boosting theorem separates the failure probability from the replicability parameter, immediately tightening sample bounds for several existing replicable algorithms.
- Adaptive composition of replicable algorithms requires Ω(nk²) samples, establishing a quadratic separation from the non-adaptive setting (a worked numeric comparison follows this list).
- The composition of perfectly generalizing algorithms with heterogeneous parameters admits new sample-efficient bounds.
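A worked numeric reading of these gaps, assuming the homogeneous setting with n = 10^4 samples per problem and k = 100 problems, and treating the bounds as orders of magnitude (log factors ignored):

$$\widetilde{O}(nk)\approx 10^{6} \;\ll\; \widetilde{O}(nk^{2})\approx 10^{8} \;\ll\; \widetilde{O}(n^{2}k)\approx 10^{10},$$

so non-adaptive composition saves two orders of magnitude over the naive analysis in this instance, while the adaptive lower bound Ω(nk²) ≈ 10^8 says that saving is unavailable once the problems are chosen adaptively.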
Where Pith is reading between the lines
- The same conversion-plus-composition technique may extend to other consistency properties such as stability or monotonicity in learning algorithms.
- Correlated sampling as a recovery step could apply to settings where multiple independent runs must produce identical outputs without extra data (a minimal sketch follows this list).
- The phantom-run technique used for the adaptive lower bound may yield structural results for other sequential composition problems.
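A minimal, self-contained sketch of correlated sampling over a finite domain, in the spirit of the recovery step described above. The rejection-style rule and the toy distributions below are illustrative assumptions, not the paper's exact procedure:

```python
import random

def correlated_sample(dist, shared_seed, max_rounds=10_000):
    """Sample from `dist` (a dict of outcome -> probability) using shared
    randomness, so that two runs holding statistically close distributions
    and the same seed return the same outcome with high probability.

    Rejection-style rule: draw a uniform candidate x and a uniform threshold
    u from the shared stream, and accept the first pair with u <= dist[x].
    Conditioned on acceptance, the output is distributed exactly as `dist`.
    """
    rng = random.Random(shared_seed)      # shared randomness across reruns
    domain = sorted(dist)                 # fixed ordering so every run agrees
    for _ in range(max_rounds):
        x = rng.choice(domain)            # uniform candidate from the domain
        u = rng.random()                  # uniform threshold in [0, 1]
        if u <= dist.get(x, 0.0):
            return x
    return domain[0]                      # fallback, reached with negligible probability

# Two runs whose output distributions differ slightly (e.g. a replicable
# algorithm rerun on fresh data) still agree on most shared seeds.
p = {"h0": 0.72, "h1": 0.28}
q = {"h0": 0.69, "h1": 0.31}
agree = sum(correlated_sample(p, s) == correlated_sample(q, s) for s in range(1000))
print(f"agreement rate: {agree / 1000:.3f}")  # typically well above 0.9
```

On a shared seed, the two runs diverge only when a threshold lands between the two acceptance probabilities of the same candidate, which is why agreement degrades gracefully with the total variation distance between the output distributions.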
Load-bearing premise
Each replicable algorithm can be converted into a perfectly generalizing algorithm with only logarithmic overhead in samples, and the privacy-style composition analysis carries over without hidden dependence on the replicability parameter.
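As a reference point, one standard formalization of perfect generalization in the style of Cummings et al. [CLN+16]; the paper's exact parameterization may differ:

$$A \text{ is } (\beta,\varepsilon,\delta)\text{-perfectly generalizing if for every distribution } \mathcal{D} \text{ there is a simulator } \mathrm{Sim}_{\mathcal{D}} \text{ such that, with probability at least } 1-\beta \text{ over } S\sim\mathcal{D}^{n}, \text{ for every event } O,$$

$$\Pr[A(S)\in O]\le e^{\varepsilon}\,\Pr[\mathrm{Sim}_{\mathcal{D}}\in O]+\delta \qquad\text{and}\qquad \Pr[\mathrm{Sim}_{\mathcal{D}}\in O]\le e^{\varepsilon}\,\Pr[A(S)\in O]+\delta.$$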
What would settle it
An explicit pair of replicable algorithms for two simple statistical tasks whose joint replicable solution provably requires asymptotically more than Õ(n_1 + n_2) samples would falsify the linear composition claim.
Original abstract
Replicability requires that algorithmic conclusions remain consistent when rerun on independently drawn data. A central structural question is composition: given $k$ problems each admitting a $\rho$-replicable algorithm with sample complexity $n$, how many samples are needed to solve all jointly while preserving replicability? The naive analysis yields $\widetilde{O}(nk^2)$ samples, and Bun et al. (STOC'23) observed that reductions through differential privacy give an alternative $\widetilde{O}(n^2k)$ bound, leaving open whether the optimal $\widetilde{O}(nk)$ scaling is achievable. We resolve this open problem and, more generally, show that problems with sample complexities $n_1,\ldots,n_k$ can be jointly solved with $\widetilde{O}(\sum_i n_i)$ samples while preserving constant replicability. Our approach converts each replicable algorithm into a perfectly generalizing one, composes them via a privacy-style analysis, and maps back via correlated sampling. This yields the first advanced composition theorem for replicability. En route, we obtain new bounds for the composition of perfectly generalizing algorithms with heterogeneous parameters. As part of our results, we provide a boosting theorem for the success probability of replicable algorithms. For a broad class of problems, the failure probability appears as a separate additive term independent of $\rho$, immediately yielding improved sample complexity bounds for several problems. Finally, we prove an $\Omega(nk^2)$ lower bound for adaptive composition, establishing a quadratic separation from the non-adaptive setting. The key technique, which we call the phantom run, yields structural results of independent interest.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to resolve an open problem on replicable algorithm composition: given k problems each admitting a ρ-replicable algorithm with sample complexity n_i, the joint problem can be solved with Õ(∑ n_i) samples while preserving constant replicability (independent of k). The approach converts each replicable algorithm to a perfectly generalizing one (with logarithmic overhead), composes the resulting algorithms via a new privacy-style advanced composition theorem for heterogeneous parameters, and maps the output back to a replicable algorithm via correlated sampling. Additional contributions include a boosting theorem that separates failure probability from ρ for a broad class of problems and an Ω(n k²) lower bound for adaptive composition, proved via a new 'phantom run' technique.
Significance. If the central claims hold, this supplies the first advanced composition theorem for replicability and closes the gap between the naïve Õ(n k²) and DP-based Õ(n² k) bounds to the optimal Õ(n k) scaling. The heterogeneous perfectly-generalizing composition bounds and the phantom-run technique for structural lower bounds are of independent interest. The boosting result immediately improves sample-complexity expressions for several concrete problems by isolating the failure-probability term.
Minor comments (4)
- The introduction should explicitly restate the precise definition of ρ-replicability (including the failure probability δ) before invoking the conversion lemma, so that the logarithmic-overhead claim is immediately verifiable (the standard definition is sketched after this list).
- In the statement of the heterogeneous composition theorem, the dependence of the final replicability parameter on the individual ρ_i values should be written out explicitly rather than left implicit in the Õ notation.
- The boosting theorem (Section 5) applies only to a 'broad class of problems'; the manuscript should list the precise conditions (e.g., on the loss function or hypothesis class) under which the failure probability decouples from ρ.
- Figure 1 (or the corresponding diagram of the conversion–composition–correlated-sampling pipeline) would benefit from labeling the sample-complexity multipliers at each arrow so that the Õ(∑ n_i) claim is visually traceable.
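For reference, the standard notion from Impagliazzo, Lei, Pitassi, and Sorrell [ILPS22], which minor comment 1 asks to have restated (the paper may use a refinement that also tracks a failure probability δ):

$$A \text{ is } \rho\text{-replicable if}\quad \Pr_{S,S'\sim\mathcal{D}^{n},\;r}\big[A(S;r)=A(S';r)\big]\ \ge\ 1-\rho,$$

where S and S' are independent n-sample draws from the same distribution and r is the internal randomness shared between the two runs.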
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript, accurate summary of the contributions, and recommendation for minor revision. The referee's description correctly reflects the resolution of the open problem on replicable composition, the heterogeneous composition bounds, the boosting theorem, and the phantom-run lower bound technique.
Point-by-point responses
-
Referee: The manuscript claims to resolve an open problem on replicable algorithm composition: given k problems each admitting a ρ-replicable algorithm with sample complexity n_i, the joint problem can be solved with Õ(∑ n_i) samples while preserving constant replicability (independent of k). The approach converts each replicable algorithm to a perfectly generalizing one (with logarithmic overhead), composes the resulting algorithms via a new privacy-style advanced composition theorem for heterogeneous parameters, and maps the output back to a replicable algorithm via correlated sampling. Additional contributions include a boosting theorem that separates failure probability from ρ for a broad class of problems and an Ω(n k²) lower bound for adaptive composition, proved via a new 'phantom run' technique.
Authors: We thank the referee for this accurate and concise summary. It correctly captures the main results, including the conversion to perfectly generalizing algorithms, the privacy-style advanced composition for heterogeneous parameters, the correlated sampling step, the boosting theorem, and the Ω(nk²) adaptive lower bound via the phantom-run technique. No changes are required in response to this summary. Revision: no
Circularity Check
No circularity: derivation relies on independent conversion and composition theorems
Full rationale
The paper claims a new Õ(∑ n_i) bound via conversion of replicable algorithms to perfectly generalizing ones, privacy-style composition, and correlated sampling back to replicability. No quoted equations or steps reduce this bound to a fitted input, a self-definition, or a load-bearing self-citation chain. The boosting theorem for success probability and the Ω(nk²) adaptive lower bound (via the phantom run) are presented as separate contributions. The abstract and the description of the approach treat the composition analysis as a novel extension rather than a re-derivation of prior parameters. The argument is self-contained rather than calibrated against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Standard math: standard concentration and union bounds over independent samples hold for the underlying distribution class (the usual forms are written out below).
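The usual forms presumably intended (Hoeffding's inequality for i.i.d. bounded variables, and the union bound over k events):

$$\Pr\Big[\Big|\tfrac{1}{n}\sum_{j=1}^{n}X_{j}-\mathbb{E}[X_{1}]\Big|\ge t\Big]\le 2e^{-2nt^{2}}\ \text{ for i.i.d. } X_{j}\in[0,1], \qquad \Pr\Big[\bigcup_{i=1}^{k}E_{i}\Big]\le\sum_{i=1}^{k}\Pr[E_{i}].$$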
Reference graph
Works this paper leans on
- [1] [BF16] Raef Bassily and Yoav Freund. Typical stability. arXiv preprint arXiv:1604.03336, 2016.
- [2] [BGH+23] Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, Satchit Sivakumar, and Jessica Sorrell. Stability is stable: Connections between replicability, privacy, and adaptive generalization. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC 2023). ACM, 2023.
- [3] [BHH+25] Ari Blondal, Hamed Hatami, Pooya Hatami, Chavdar Lalov, and Sivan Tretiak. Borsuk-Ulam and replicable learning of large-margin halfspaces. arXiv preprint arXiv:2503.15294, 2025.
- [4] [Bro97] Andrei Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of SEQUENCES 1997, pages 21–29. IEEE, 1997.
- [5] [CCMY24] Zachary Chase, Bogdan Chornomaz, Shay Moran, and Amir Yehudayoff. Local Borsuk-Ulam, stability, and replicability. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC 2024), pages 1769–1780. ACM, 2024.
- [6] [CLN+16] Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, and Zhiwei Steven Wu. Adaptive learning with robust generalization guarantees. In Proceedings of the 29th Conference on Learning Theory (COLT 2016), volume 49 of JMLR Workshop and Conference Proceedings, 2016.
- [7] [CMY23] Zachary Chase, Shay Moran, and Amir Yehudayoff. Stability and replicability in learning. In 64th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2023), pages 2430–2439. IEEE, 2023.
- [8] [DPWV23] Peter Dixon, Aduri Pavan, Jason Vander Woude, and N. V. Vinodchandran. List and certificate complexities in replicable learning. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
- [9] [EHK+25] Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, and Jessica Sorrell. Replicable reinforcement learning with linear function approximation. arXiv preprint arXiv:2509.08660, 2025.
- [10] [EKK+23] Hossein Esfandiari, Alkis Kalavasis, Amin Karbasi, Andreas Krause, Vahab Mirrokni, and Grigoris Velegkas. Replicable bandits. In The Eleventh International Conference on Learning Representations (ICLR 2023), 2023.
- [11] [EKM+23] Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni, Grigoris Velegkas, and Felix Zhou. Replicable clustering. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
- [12] [GLM+10] Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private combinatorial optimization. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), pages 1106–1125. SIAM, 2010.
- [13] [HIK+24] Max Hopkins, Russell Impagliazzo, Daniel Kane, Sihan Liu, and Christopher Ye. Replicability in high dimensional statistics. CoRR, abs/2406.02628, 2024.
- [14] [HLYY25] Max Hopkins, Sihan Liu, Christopher Ye, and Yuichi Yoshida. From generative to episodic: Sample-efficient replicable reinforcement learning. arXiv preprint arXiv:2507.11926, 2025.
- [15] [ILPS22] Russell Impagliazzo, Rex Lei, Toniann Pitassi, and Jessica Sorrell. Reproducibility in learning. In STOC '22: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 818–831. ACM, 2022.
- [16] [KKL+24] Alkis Kalavasis, Amin Karbasi, Kasper Green Larsen, Grigoris Velegkas, and Felix Zhou. Replicable learning of large-margin halfspaces. In Forty-first International Conference on Machine Learning (ICML 2024), 2024.
- [17] [KKMV23] Alkis Kalavasis, Amin Karbasi, Shay Moran, and Grigoris Velegkas. Statistical indistinguishability of learning algorithms. In International Conference on Machine Learning (ICML 2023), 2023.
- [18] [KVYZ23] Amin Karbasi, Grigoris Velegkas, Lin Yang, and Felix Zhou. Replicability in reinforcement learning. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
- [19] [LMPS26] Kasper Green Larsen, Markus Engelund Mathiasen, Chirag Pabbaraju, and Clement Svendsen. The sample complexity of replicable realizable PAC learning. arXiv preprint arXiv:2602.19552, 2026.
- [20] [LMS25] Kasper Green Larsen, Markus Engelund Mathiasen, and Clement Svendsen. Improved replicable boosting with majority-of-majorities. arXiv preprint arXiv:2501.18388, 2025.
- [21] [LT19] Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC 2019), pages 298–309. ACM, 2019.
- [22] [LY24] Sihan Liu and Christopher Ye. Replicable uniformity testing. In Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024.
- [23] [RRST16] Ryan M. Rogers, Aaron Roth, Adam D. Smith, and Om Thakkar. Max-information, differential privacy, and post-selection hypothesis testing. CoRR, abs/1604.03924, 2016.
- [24] [WDP+24] Jason Vander Woude, Peter Dixon, Aduri Pavan, Jamie Radcliffe, and N. V. Vinodchandran. Replicability in learning: Geometric partitions and KKM-Sperner lemma. In Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024.
- [25] [ZYKH23] Liang Zhang, Junchi Yang, Amin Karbasi, and Niao He. Optimal guarantees for algorithmic reproducibility and gradient complexity in convex optimization. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.