Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels

Binyamin Perets; Shie Mannor

arxiv: 2605.17559 · v1 · pith:Z53VJMEVnew · submitted 2026-05-17 · 📊 stat.ME · cs.AI · q-bio.QM · stat.ML


Pith reviewed 2026-05-19 22:32 UTC · model grok-4.3

classification 📊 stat.ME · cs.AI · q-bio.QM · stat.ML

keywords false discovery rate · reproducing kernel Hilbert space · structured multiple testing · kernel methods · hypothesis testing · FDR control · statistical learning


The pith

Optimizing in a reproducing kernel Hilbert space controls false discoveries for structured hypotheses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large-scale testing benefits from controlling false discoveries while using the natural structure among hypotheses. The authors reframe this as regularized optimization inside a reproducing kernel Hilbert space. Different structures are handled simply by picking the matching kernel. This yields smooth fits, automatic parameter choice via likelihood, and the ability to predict at new points. Two decision rules built on the estimator are shown to control the false discovery rate at the desired level.

Core claim

By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR.

What carries the argument

Regularized estimation inside a reproducing kernel Hilbert space, where the kernel encodes the structure among hypotheses. This yields a single unified estimator, on which two FDR-controlling decision rules are built.

Load-bearing premise

The structure among hypotheses admits a positive-definite kernel representation such that the regularized estimator plus the two decision rules provably control FDR at the target level.
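To make the premise concrete, here is a minimal sketch of one kernel family the page mentions (graph diffusion kernels), together with a numerical positive-definiteness check. The function names, the toy path graph, and the value of `beta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Graph diffusion kernel K = exp(-beta * L), with L = D - A.

    Hypothetical helper for illustration; `beta` is an assumed
    diffusion-time parameter, not a value from the paper.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # L is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Matrix exponential through the eigendecomposition: V diag(exp(-beta*lambda)) V^T.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def is_positive_definite(kernel, tol=1e-10):
    """Check that the smallest eigenvalue of the symmetrized matrix exceeds tol."""
    sym = (kernel + kernel.T) / 2.0
    return bool(np.linalg.eigvalsh(sym).min() > tol)


# Toy path graph 0 - 1 - 2 - 3 (assumed example, not a dataset from the paper).
A = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
K = diffusion_kernel(A, beta=0.5)
print(is_positive_definite(K))
```

Every eigenvalue of this kernel is the exponential of a real number, so it is strictly positive by construction. Whether positive definiteness alone is enough for the FDR guarantees is exactly what the referee report on this page questions.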

What would settle it

A simulation or real dataset where the kernel captures the structure but the observed false discovery proportion still exceeds the target level after applying the two decision rules.
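A check of this kind needs the realized false discovery proportion (FDP) per run and its average over runs as an empirical estimate of the FDR. The sketch below assumes ground-truth null labels are available, as they would be in a simulation; all names are hypothetical.

```python
import numpy as np


def false_discovery_proportion(rejected, is_null):
    """Fraction of rejected hypotheses that are true nulls (0 if nothing is rejected)."""
    rejected = np.asarray(rejected, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    n_rejected = int(rejected.sum())
    if n_rejected == 0:
        return 0.0
    return float((rejected & is_null).sum() / n_rejected)


def empirical_fdr(rejected_runs, is_null_runs):
    """Average FDP over repeated simulation runs, an estimate of the FDR."""
    fdps = [
        false_discovery_proportion(r, h)
        for r, h in zip(rejected_runs, is_null_runs)
    ]
    return float(np.mean(fdps))


# Toy example: three rejections, one of which is a true null.
fdp = false_discovery_proportion([1, 1, 1, 0], [1, 0, 0, 0])
```

FDR control is a statement about the expectation of the FDP, so a single run above the target level is not by itself a counterexample. The comparison should use the average over many replications.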

Figures

Figures reproduced from arXiv: 2605.17559 by Binyamin Perets, Shie Mannor.

**Figure 2.** (a) Inferred spatial prior (1 − α) versus geometric isolation score across semi-synthetic datasets. (b) Power vs. FDR on the 10 semi-synthetic datasets across all baselines at α = 0.10. (c) Predicted spatial prior at held-out locations versus ground truth computed from the full dataset. Error bars show 95% confidence intervals.

Original abstract

Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task utilizing protein-protein interaction graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The RKHS reframing unifies structured FDR control under one estimator via kernel choice, but the FDR proofs likely need kernel-specific conditions beyond positive definiteness to hold up.


The paper's core move is to treat structured hypothesis testing as regularized estimation inside an RKHS, so that spatial, graph, or hierarchical dependence is handled just by picking the right kernel. This replaces the usual collection of ad-hoc adjustments with a single smooth estimator plus two decision rules that are claimed to control FDR. The experiments on real spatial data and protein-interaction graphs give a concrete sense of where the power gains might appear, and the likelihood-based hyperparameter step is a clear improvement over manual tuning in earlier work. That part is useful and cleanly motivated. The proofs are the part that needs checking. Positive definiteness alone does not automatically deliver the super-uniformity or exchangeability properties that standard FDR arguments rely on, so the unification claim depends on whether the two decision rules invoke extra conditions that only some kernels satisfy. If those conditions are stated explicitly and verified for the graph and hierarchy cases, the result stands; otherwise the practical scope narrows. The paper is aimed at statisticians and data analysts who already deal with dependent tests and want a more flexible alternative to existing structured procedures. It is worth sending to referees because the framing is distinct enough and the empirical checks are on real data, even if the theoretical guarantees will probably require some tightening in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript reframes structured FDR control as a regularized estimation problem in a reproducing kernel Hilbert space (RKHS). By selecting an appropriate positive-definite kernel, the approach unifies hypothesis testing over continuous domains, graphs, and hierarchies under a single algorithm. The authors derive a regularized estimator, introduce two decision rules, and claim to prove that both rules control the FDR at a target level. They further assert that the formulation permits likelihood-based hyperparameter selection, smooth solutions, and inference at unobserved locations. Empirical validation is reported on spatial data from high-dimensional real-world datasets and on differential gene expression using protein-protein interaction graphs.

Significance. If the FDR proofs hold under the stated conditions, the work would provide a flexible, kernel-driven alternative to existing structured multiple-testing procedures. The ability to obtain smooth estimates, perform principled hyperparameter tuning, and extrapolate to unobserved points could improve power and enable sample-efficient designs in settings where structure is naturally encoded by kernels. The unification across disparate structures is a notable conceptual contribution provided the guarantees do not tacitly rely on kernel-specific regularity beyond positive-definiteness.

major comments (2)

[Proofs of FDR control for the two decision rules] The abstract asserts that two decision rules are proved to control the FDR after RKHS regularization. However, positive-definiteness alone does not automatically preserve the super-uniformity or exchangeability properties required by standard FDR arguments. The proof must therefore be examined to determine whether additional kernel-dependent conditions (e.g., eigenvalue decay, smoothness, or boundedness of the regularized solution) are implicitly used. Please provide the key steps of the proof (or the relevant theorem statement) that establish FDR control for arbitrary positive-definite kernels encoding continuous, graph, or hierarchical structure.
[Hyperparameter selection and FDR guarantee] Hyperparameter selection is described as likelihood-based. Because this step is data-dependent, it is necessary to show that the subsequent FDR guarantees remain valid after the fitted quantities are obtained. The manuscript should clarify whether the proof treats the selected hyperparameters as fixed or accounts for the selection step, and whether any additional uniformity or independence assumptions are required.

minor comments (1)

[Empirical validation] The abstract mentions validation on 'two sources' but does not specify the exact datasets, sample sizes, or quantitative metrics (e.g., realized FDR, power, or comparison to baselines). Adding a concise table or paragraph summarizing these quantities would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the scope and assumptions of our framework. We address each major comment below, providing the requested clarifications on the proofs and hyperparameter selection while indicating planned revisions to strengthen the manuscript.


Referee: [Proofs of FDR control for the two decision rules] The abstract asserts that two decision rules are proved to control the FDR after RKHS regularization. However, positive-definiteness alone does not automatically preserve the super-uniformity or exchangeability properties required by standard FDR arguments. The proof must therefore be examined to determine whether additional kernel-dependent conditions (e.g., eigenvalue decay, smoothness, or boundedness of the regularized solution) are implicitly used. Please provide the key steps of the proof (or the relevant theorem statement) that establish FDR control for arbitrary positive-definite kernels encoding continuous, graph, or hierarchical structure.

Authors: We appreciate the referee's careful scrutiny of the FDR guarantees. FDR control for the two decision rules is established in Theorems 4.1 and 4.2 (Section 4). The key steps are: (i) the RKHS-regularized estimator is shown to be unbiased for the true mean function under the null, leveraging the reproducing property so that the induced p-values remain super-uniform marginally; (ii) the decision rules apply a threshold to the regularized estimates that yields a conservative bound on the false discovery proportion, with the kernel-induced dependence controlled via a union-bound argument that holds for any positive-definite kernel; (iii) no eigenvalue decay or specific smoothness beyond positive-definiteness and continuity of the kernel (for continuous domains) is required, as the regularization ensures the solution remains in the RKHS and bounded. We will insert an expanded proof sketch and a remark on minimal assumptions in the revised manuscript. revision: partial
Referee: [Hyperparameter selection and FDR guarantee] Hyperparameter selection is described as likelihood-based. Because this step is data-dependent, it is necessary to show that the subsequent FDR guarantees remain valid after the fitted quantities are obtained. The manuscript should clarify whether the proof treats the selected hyperparameters as fixed or accounts for the selection step, and whether any additional uniformity or independence assumptions are required.

Authors: We thank the referee for raising this important point on data-dependent tuning. The likelihood-based hyperparameter selection is performed via cross-validation on a held-out subset of the data, independent of the primary estimation and testing sets. Theorems 4.1 and 4.2 establish FDR control conditionally on the selected hyperparameters (treated as fixed after tuning). This conditioning is justified by the data-splitting procedure, which ensures independence between the tuning and inference stages. We will add an explicit statement in Section 4 clarifying the conditional nature of the guarantees and the role of data splitting, along with a brief discussion of the required independence assumption. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The paper reframes structured FDR control as an RKHS-regularized estimation problem and then states that two decision rules are proved to control FDR at the target level. From the abstract and description, the estimator is obtained by optimization in the RKHS (with kernel chosen to encode structure), hyperparameters are selected via likelihood, and the FDR proofs are presented as separate results that apply to the resulting scores. No quoted step reduces a claimed prediction or uniqueness result to a fitted quantity by construction, no self-citation is invoked as the sole justification for a load-bearing theorem, and the unification claim is achieved by varying the kernel rather than by redefining the target quantity in terms of itself. The derivation chain therefore remains self-contained with independent content in the FDR-control arguments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard RKHS theory and the existence of a kernel that faithfully encodes hypothesis dependence; no new free parameters or invented entities are declared in the abstract.

axioms (1)

domain assumption: Hypothesis dependence structure can be represented by a positive definite kernel.
Invoked when the method states that kernel choice alone unifies continuous, graph, and hierarchical domains.

pith-pipeline@v0.9.0 · 5739 in / 1232 out tokens · 42847 ms · 2026-05-19T22:32:20.481009+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

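$\min_{\alpha \in \mathcal{H}_K} J(\alpha) = -\sum_i \log\big(\alpha(\mathrm{loc}_i)\, f_0(p_i) + (1-\alpha(\mathrm{loc}_i))\, f_1(p_i)\big) + \lambda_{\mathrm{reg}} \lVert \alpha - \bar{\alpha} \rVert^2_{\mathcal{H}_K}$ (Eq. 3), with natural-gradient cancellation $\tilde{\nabla}_\alpha L = w + 2\lambda_{\mathrm{reg}}(c - c_{\bar{\alpha}}) + \dots$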
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Matérn / diffusion / hyperbolic kernels chosen for Sobolev regularity and graph topology; no mention of golden-ratio fixed points or 8-tick clocks.
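The objective quoted above as Eq. 3 can be evaluated numerically once a parametrization is fixed. The sketch below assumes a representer-style form $\alpha - \bar{\alpha} = \sum_j c_j K(\cdot, \mathrm{loc}_j)$, so that the RKHS penalty becomes $c^\top K c$. It also assumes a uniform null density and a Beta(0.5, 1) alternative, and it clips $\alpha$ to $[10^{-8}, 1 - 10^{-8}]$ as the numerical-stability fragment in the extracted references suggests. The parametrization, densities, and all names are illustrative assumptions; the page does not specify the authors' implementation.

```python
import numpy as np


def f0_uniform(p):
    """Null density of a p-value: uniform on [0, 1] (assumed)."""
    return np.ones_like(p, dtype=float)


def f1_beta(p):
    """Assumed alternative density Beta(0.5, 1): 0.5 * p^(-0.5)."""
    return 0.5 * np.asarray(p, dtype=float) ** (-0.5)


def objective(c, K, p_values, f0, f1, alpha_bar=0.9, lam_reg=1.0, eps=1e-8):
    """J = -sum_i log(alpha_i f0(p_i) + (1 - alpha_i) f1(p_i)) + lam_reg * c^T K c.

    alpha is evaluated at the observed locations as alpha_bar + K @ c
    (assumed representer form) and clipped away from 0 and 1.
    """
    c = np.asarray(c, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    alpha = np.clip(alpha_bar + K @ c, eps, 1.0 - eps)
    mixture = alpha * f0(p_values) + (1.0 - alpha) * f1(p_values)
    neg_log_lik = -np.sum(np.log(mixture))
    penalty = lam_reg * float(c @ K @ c)  # squared RKHS norm of alpha - alpha_bar
    return float(neg_log_lik + penalty)


# Toy inputs (assumed): three hypotheses with a small positive-definite kernel.
K_toy = np.array([[1.0, 0.5, 0.1],
                  [0.5, 1.0, 0.5],
                  [0.1, 0.5, 1.0]])
p_toy = np.array([0.001, 0.4, 0.03])
J_at_prior = objective(np.zeros(3), K_toy, p_toy, f0_uniform, f1_beta)
```

At `c = 0` the penalty vanishes and the value reduces to the negative log-likelihood of a two-group mixture with constant null proportion `alpha_bar`, which gives a convenient sanity check before any optimizer is attached.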

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

[1] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950. ISSN 1088-6850. doi: 10.1090/s0002-9947-1950-0051437-7. URL http://dx.doi.org/10.1090/S0002-9947-1950-0051437-7
[2] R. F. Barber and E. J. Candès. Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5), Oct. 2015. ISSN 0090-5364. doi: 10.1214/15-aos1337. URL http://dx.doi.org/10.1214/15-AOS1337
[3] R. F. Barber, E. J. Candès, and R. J. Samworth. Robust inference with knockoffs, 2019. URL https://arxiv.org/abs/1801.03896
[4] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1995.tb02031.x. URL https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
[5] Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4), Aug. 2001. ISSN 0090-5364. doi: 10.1214/aos/1013699998. URL http://dx.doi.org/10.1214/aos/1013699998
[6] G. Blanchard, S. Delattre, and E. Roquain. Testing over a continuum of null hypotheses with false discovery rate control. Bernoulli, 20(1), Feb. 2014. ISSN 1350-7265. doi: 10.3150/12-bej488. URL http://dx.doi.org/10.3150/12-BEJ488
[7] O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002
[8] T. T. Cai, H. Li, J. Maris, and J. Xie. Optimal false discovery rate control for dependent data. Statistics and Its Interface, 4(4):417–430, 2011. ISSN 1938-7997. doi: 10.4310/sii.2011.v4.n4.a1. URL http://dx.doi.org/10.4310/SII.2011.v4.n4.a1
[9] T. T. Cai, W. Sun, and Y. Xia. Laws: A locally adaptive weighting and screening approach to spatial multiple testing. Journal of the American Statistical Association, 117(539):1370–1383, Jan. 2021. ISSN 1537-274X. doi: 10.1080/01621459.2020.1859379. URL http://dx.doi.org/10.1080/01621459.2020.1859379
[10] Y. Chen, E. A. Huerta, J. Duarte, P. Harris, D. S. Katz, M. S. Neubauer, D. Diaz, F. Mokhtar, R. Kansal, S. E. Park, V. V. Kindratenko, Z. Zhao, and R. Rusack. A fair and ai-ready higgs boson decay dataset. Scientific Data, 9(1), Feb. 2022. ISSN 2052-4463. doi: 10.1038/s41597-021-01109-0. URL http://dx.doi.org/10.1038/s41597-021-01109-0
[11] B. Efron. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. Journal of the American Statistical Association, 99(465):96–104, Mar. 2004. ISSN 1537-274X. doi: 10.1198/016214504000000089. URL http://dx.doi.org/10.1198/016214504000000089
[12] T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, Mar. 2007. ISSN 1537-274X. doi: 10.1198/016214506000001437. URL http://dx.doi.org/10.1198/016214506000001437
[13] R. Heller and S. Rosset. Optimal control of false discovery criteria in the two-group model. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(1):133–155, Dec. ISSN 1467-9868. doi: 10.1111/rssb.12403. URL http://dx.doi.org/10.1111/rssb.12403
[15] D. Henrion, J. B. Lasserre, and J. Lofberg. Gloptipoly 3: moments, optimization and semidefinite programming, 2007. URL https://arxiv.org/abs/0709.2559
[16] R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods, and applications. SIAM Review, 35(3):380–429, Sept. 1993. ISSN 1095-7200. doi: 10.1137/1035089. URL http://dx.doi.org/10.1137/1035089
[17] A. Khanfer. Theory of Sobolev Spaces, page 133–237. Springer Nature Singapore, 2024. ISBN 9789819937882. doi: 10.1007/978-981-99-3788-2_3. URL http://dx.doi.org/10.1007/978-981-99-3788-2_3
[18] R. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. Proceedings of the Nineteenth International Conference on Machine Learning, 11, 04 2002
[19] M. Kozdoba, B. Perets, and S. Mannor. Sobolev space regularised pre density models. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 25494–25533. PMLR, 21–27 Jul 2024. UR...
[20] L. Lei and W. Fithian. Adapt: An interactive procedure for multiple testing with side information. URL https://arxiv.org/abs/1609.06035
[22] M. López and G. Still. Semi-infinite programming. European Journal of Operational Research, 180(2):491–518, July 2007. ISSN 0377-2217. doi: 10.1016/j.ejor.2006.08.045. URL http://dx.doi.org/10.1016/j.ejor.2006.08.045
[23] J. R. Munkres. Analysis on Manifolds. CRC Press, Feb. 2018. ISBN 9780429494147. doi: 10.1201/9780429494147. URL http://dx.doi.org/10.1201/9780429494147
[24] J. O. Royset, E. Polak, and A. Kiureghian. Adaptive approximations and exact penalization for the solution of generalized semi-infinite min-max problems. SIAM Journal on Optimization, 14(1):1–34, Jan. 2003. ISSN 1095-7189. doi: 10.1137/s1052623402406777. URL http://dx.doi.org/10.1137/S1052623402406777
[25] R. Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane, page 355–366. Springer Berlin Heidelberg, 2012. ISBN 9783642258787. doi: 10.1007/978-3-642-25878-7_34. URL http://dx.doi.org/10.1007/978-3-642-25878-7_34
[26] B. Schölkopf, R. Herbrich, and A. J. Smola. A Generalized Representer Theorem, page 416–426. Springer Berlin Heidelberg, 2001. ISBN 9783540445814. doi: 10.1007/3-540-44581-1_27. URL http://dx.doi.org/10.1007/3-540-44581-1_27
[27] A. Schwartzman and X. Lin. The effect of correlation in false discovery rate estimation. Biometrika, 98(1):199–214, Feb. 2011. ISSN 1464-3510. doi: 10.1093/biomet/asq075. URL http://dx.doi.org/10.1093/biomet/asq075
[28] A. J. Smola and R. Kondor. Kernels and Regularization on Graphs, page 144–158. Springer Berlin Heidelberg, 2003. ISBN 9783540451679. doi: 10.1007/978-3-540-45167-9_12. URL http://dx.doi.org/10.1007/978-3-540-45167-9_12
[29] G. Still. Generalized semi-infinite programming: numerical aspects. Optimization, 49(3):223–242, Jan. 2001. ISSN 1029-4945. doi: 10.1080/02331930108844531. URL http://dx.doi.org/10.1080/02331930108844531
[30] W. Sun and T. T. Cai. Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102(479):901–912, 2007. ISSN 01621459. URL http://www.jstor.org/stable/27639933
[31] D. Szklarczyk, R. Kirsch, M. Koutrouli, K. Nastou, F. Mehryary, R. Hachilif, A. L. Gable, T. Fang, N. Doncheva, S. Pyysalo, P. Bork, L. Jensen, and C. von Mering. The string database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research, 51(D1):D638–D646, Nov. 2022. ISS... doi: 10.1093/nar/gkac1000
[32] W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott. False discovery rate smoothing. 2016. URL https://arxiv.org/abs/1411.6144
[33] R. J. Tibshirani and J. Taylor. The solution path of the generalized lasso. The Annals of Statistics, 39(3), June 2011. ISSN 0090-5364. doi: 10.1214/11-aos878. URL http://dx.doi.org/10.1214/11-AOS878
[34] S. Tripathi, S. Moutari, M. Dehmer, and F. Emmert-Streib. Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules. BMC Bioinformatics, 17(1), Mar. 2016. ISSN 1471-2105. doi: 10.1186/s12859-016-0979-8. URL http://dx.doi.org/10.1186/s12859-016-0979-8
[35] P. Wang, P. Yan, and C. Li. Straw: Structure-adaptive weighting procedure for large-scale spatial multiple testing. 2023. URL https://arxiv.org/abs/2309.15699
[36] F. Xia, M. J. Zhang, J. Zou, and D. Tse. Neuralfdr: Learning discovery thresholds from hypothesis features, 2017. URL https://arxiv.org/abs/1711.01312
[37] M. J. Zhang, F. Xia, and J. Zou. Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing. Nature Communications, 10(1), July 2019. ISSN 2041-1723. doi: 10.1038/s41467-019-11247-0. URL http://dx.doi.org/10.1038/s41467-019-11247-0
select the largest $R \subseteq T$ such that $\frac{1}{|R|} \sum_{i \in R} \mathrm{lfdr}_{\mathrm{marg}}(p_i) \le q$
A Proof of Marginal Density Independence. Proposition (Restated. Marginal Dens...
[38] regarding hyperbolic spaces applies. Briefly, Sarkar’s Theorem states that any tree with $n$ nodes can be embedded into the 2-dimensional hyperbolic space (Poincaré disk $\mathbb{H}^2$) with arbitrarily low distortion $(1+\epsilon)$ for any $\epsilon > 0$. This dimension efficiency is crucial for practical application, but more fundamentally, it addresses a geometric incompatibility. Cons...
[39] Sampling Sensitivity: The mixing weights $(\rho_1, \rho_2, \rho_3)$ require problem-specific tuning. Too much tail allocation ($\rho_2 > 0.6$) introduces excessive gradient noise; too little ($\rho_2 < 0.3$) misses violations
[40] Bandwidth Selection: KDE bandwidth $h$ critically affects tail sampling quality. Scott’s rule often oversmooths for $d \ge 5$; cross-validation is expensive
[41] Barrier Schedule: The growth rate $\beta$ and starting value $\nu_0$ require careful tuning. Aggressive schedules ($\beta > 1.5$) cause convergence failure; conservative schedules ($\beta < 1.1$) waste computation
[42] Gradient Variance: Even with $M = 10^4$ samples, Monte Carlo variance in the barrier gradient necessitates small learning rates ($\eta \sim 10^{-3}$), requiring hundreds of iterations per barrier level
[43] Non-Convexity Persists: The method still optimizes the non-convex mixture likelihood, inheriting all local minima issues. Multiple random initializations are required, multiplying computational cost
[44] Numerical Instability: Near constraint boundaries ($\alpha \approx 0$ or $\alpha \approx 1$), the terms $1/\alpha$ and $1/(1-\alpha)$ become numerically unstable, requiring careful regularization ($\alpha \leftarrow \mathrm{clip}(\alpha, 10^{-8}, 1 - 10^{-8})$). J Practical Cross-Validation Procedure. In our experiments (Section 8 in main text), we employ the following procedure: Algorithm 2 Hyperparameter Selection via Cross-Validation...
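One of the extracted fragments above quotes a selection rule: select the largest R ⊆ T such that the average of lfdr_marg(p_i) over R is at most q. The sketch below shows one way such a rule could be computed, assuming the local false discovery rate values are already available as an array. The page does not say which of the paper's two decision rules this fragment belongs to, and the function name and toy values are hypothetical.

```python
import numpy as np


def select_by_average_lfdr(lfdr, q=0.10):
    """Return indices of the largest set whose mean lfdr is at most q.

    Sorting ascending makes the running mean non-decreasing, so the
    largest feasible set is the longest prefix with running mean <= q.
    """
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr, kind="stable")
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    feasible = np.nonzero(running_mean <= q)[0]
    if feasible.size == 0:
        return np.array([], dtype=int)
    k = int(feasible[-1]) + 1  # number of hypotheses to reject
    return np.sort(order[:k])


# Toy lfdr values (assumed): the three smallest average to about 0.087.
selected = select_by_average_lfdr([0.01, 0.5, 0.05, 0.2, 0.9], q=0.10)
```

Any set of a given size has an average at least as large as that of the smallest values of the same count, which is why scanning prefixes of the sorted values suffices.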

[1] [1]

Aronszajn

N. Aronszajn. Theory of reproducing kernels.Transactions of the American Mathematical Society, 68(3):337–404, 1950. ISSN 1088-6850. doi: 10.1090/s0002-9947-1950-0051437-7. URLhttp://dx.doi.org/10.1090/S0002-9947-1950-0051437-7

work page doi:10.1090/s0002-9947-1950-0051437-7 1950

[2] [2]

R. F. Barber and E. J. Candès. Controlling the false discovery rate via knockoffs.The Annals of Statistics, 43(5), Oct. 2015. ISSN 0090-5364. doi: 10.1214/15-aos1337. URL http://dx.doi.org/10.1214/15-AOS1337

work page doi:10.1214/15-aos1337 2015

[3] [3]

R. F. Barber, E. J. Candès, and R. J. Samworth. Robust inference with knockoffs, 2019. URL https://arxiv.org/abs/1801.03896

work page internal anchor Pith review Pith/arXiv arXiv 2019

[4] [4]

Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing , volume =

Y . Benjamini and Y . Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing.Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 12 2018. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1995.tb02031.x. URL https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

work page doi:10.1111/j.2517-6161.1995.tb02031.x 2018

[5] [5]

Benjamini and D

Y . Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency.The Annals of Statistics, 29(4), Aug. 2001. ISSN 0090-5364. doi: 10.1214/aos/ 1013699998. URLhttp://dx.doi.org/10.1214/aos/1013699998

work page doi:10.1214/aos/ 2001

[6] [6]

Blanchard, S

G. Blanchard, S. Delattre, and E. Roquain. Testing over a continuum of null hypotheses with false discovery rate control.Bernoulli, 20(1), Feb. 2014. ISSN 1350-7265. doi: 10.3150/ 12-bej488. URLhttp://dx.doi.org/10.3150/12-BEJ488

work page doi:10.3150/12-bej488 2014

[7] [7]

Bousquet and A

O. Bousquet and A. Elisseeff. Stability and generalization.Journal of Machine Learning Research, 2:499–526, 2002

work page 2002

[8] [8]

T. T. Cai, H. Li, J. Maris, and J. Xie. Optimal false discovery rate control for dependent data. Statistics and Its Interface, 4(4):417–430, 2011. ISSN 1938-7997. doi: 10.4310/sii.2011.v4.n4. a1. URLhttp://dx.doi.org/10.4310/SII.2011.v4.n4.a1

work page doi:10.4310/sii.2011.v4.n4 2011

[9] [9]

T. T. Cai, W. Sun, and Y . Xia. Laws: A locally adaptive weighting and screening approach to spatial multiple testing.Journal of the American Statistical Association, 117(539):1370–1383, Jan. 2021. ISSN 1537-274X. doi: 10.1080/01621459.2020.1859379. URL http://dx.doi. org/10.1080/01621459.2020.1859379

work page doi:10.1080/01621459.2020.1859379 2021

[10] [10]

Y . Chen, E. A. Huerta, J. Duarte, P. Harris, D. S. Katz, M. S. Neubauer, D. Diaz, F. Mokhtar, R. Kansal, S. E. Park, V . V . Kindratenko, Z. Zhao, and R. Rusack. A fair and ai-ready higgs boson decay dataset.Scientific Data, 9(1), Feb. 2022. ISSN 2052-4463. doi: 10.1038/ s41597-021-01109-0. URLhttp://dx.doi.org/10.1038/s41597-021-01109-0

work page doi:10.1038/s41597-021-01109-0 2022

[11] [11]

B. Efron. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis.Journal of the American Statistical Association, 99(465):96–104, Mar. 2004. ISSN 1537-274X. doi: 10. 1198/016214504000000089. URLhttp://dx.doi.org/10.1198/016214504000000089

work page doi:10.1198/016214504000000089 2004

[12] T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, Mar. 2007. ISSN 1537-274X. doi: 10.1198/016214506000001437. URL http://dx.doi.org/10.1198/016214506000001437

[13] R. Heller and S. Rosset. Optimal control of false discovery criteria in the two-group model. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(1):133–155, Dec. ISSN 1467-9868. doi: 10.1111/rssb.12403. URL http://dx.doi.org/10.1111/rssb.12403

[15] D. Henrion, J. B. Lasserre, and J. Lofberg. GloptiPoly 3: moments, optimization and semidefinite programming, 2007. URL https://arxiv.org/abs/0709.2559

[16] R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods, and applications. SIAM Review, 35(3):380–429, Sept. 1993. ISSN 1095-7200. doi: 10.1137/1035089. URL http://dx.doi.org/10.1137/1035089

[17] A. Khanfer. Theory of Sobolev Spaces, pages 133–237. Springer Nature Singapore, 2024. ISBN 9789819937882. doi: 10.1007/978-981-99-3788-2_3. URL http://dx.doi.org/10.1007/978-981-99-3788-2_3

[18] R. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. Proceedings of the Nineteenth International Conference on Machine Learning, 11, Apr. 2002

[19] M. Kozdoba, B. Perets, and S. Mannor. Sobolev space regularised pre density models. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 25494–25533. PMLR, 21–27 Jul 2024. UR...

[20] L. Lei and W. Fithian. Adapt: An interactive procedure for multiple testing with side information, URL https://arxiv.org/abs/1609.06035

[22] M. López and G. Still. Semi-infinite programming. European Journal of Operational Research, 180(2):491–518, July 2007. ISSN 0377-2217. doi: 10.1016/j.ejor.2006.08.045. URL http://dx.doi.org/10.1016/j.ejor.2006.08.045

[23] J. R. Munkres. Analysis on Manifolds. CRC Press, Feb. 2018. ISBN 9780429494147. doi: 10.1201/9780429494147. URL http://dx.doi.org/10.1201/9780429494147

[24] J. O. Royset, E. Polak, and A. Kiureghian. Adaptive approximations and exact penalization for the solution of generalized semi-infinite min-max problems. SIAM Journal on Optimization, 14(1):1–34, Jan. 2003. ISSN 1095-7189. doi: 10.1137/s1052623402406777. URL http://dx.doi.org/10.1137/S1052623402406777

[25] R. Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane, pages 355–366. Springer Berlin Heidelberg, 2012. ISBN 9783642258787. doi: 10.1007/978-3-642-25878-7_34. URL http://dx.doi.org/10.1007/978-3-642-25878-7_34

[26] B. Schölkopf, R. Herbrich, and A. J. Smola. A Generalized Representer Theorem, pages 416–426. Springer Berlin Heidelberg, 2001. ISBN 9783540445814. doi: 10.1007/3-540-44581-1_27. URL http://dx.doi.org/10.1007/3-540-44581-1_27

[27] A. Schwartzman and X. Lin. The effect of correlation in false discovery rate estimation. Biometrika, 98(1):199–214, Feb. 2011. ISSN 1464-3510. doi: 10.1093/biomet/asq075. URL http://dx.doi.org/10.1093/biomet/asq075

[28] A. J. Smola and R. Kondor. Kernels and Regularization on Graphs, pages 144–158. Springer Berlin Heidelberg, 2003. ISBN 9783540451679. doi: 10.1007/978-3-540-45167-9_12. URL http://dx.doi.org/10.1007/978-3-540-45167-9_12

[29] G. Still. Generalized semi-infinite programming: numerical aspects. Optimization, 49(3):223–242, Jan. 2001. ISSN 1029-4945. doi: 10.1080/02331930108844531. URL http://dx.doi.org/10.1080/02331930108844531

[30] W. Sun and T. T. Cai. Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102(479):901–912, 2007. ISSN 01621459. URL http://www.jstor.org/stable/27639933

[31] D. Szklarczyk, R. Kirsch, M. Koutrouli, K. Nastou, F. Mehryary, R. Hachilif, A. L. Gable, T. Fang, N. Doncheva, S. Pyysalo, P. Bork, L. Jensen, and C. von Mering. The string database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research, 51(D1):D638–D646, Nov. 2022. ISS... doi: 10.1093/nar/gkac1000

[32] W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott. False discovery rate smoothing. 2016. URL https://arxiv.org/abs/1411.6144

[33] R. J. Tibshirani and J. Taylor. The solution path of the generalized lasso. The Annals of Statistics, 39(3), June 2011. ISSN 0090-5364. doi: 10.1214/11-aos878. URL http://dx.doi.org/10.1214/11-AOS878

[34] S. Tripathi, S. Moutari, M. Dehmer, and F. Emmert-Streib. Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules. BMC Bioinformatics, 17(1), Mar. 2016. ISSN 1471-2105. doi: 10.1186/s12859-016-0979-8. URL http://dx.doi.org/10.1186/s12859-016-0979-8

[35] P. Wang, P. Yan, and C. Li. Straw: Structure-adaptive weighting procedure for large-scale spatial multiple testing. 2023. URL https://arxiv.org/abs/2309.15699

[36] F. Xia, M. J. Zhang, J. Zou, and D. Tse. Neuralfdr: Learning discovery thresholds from hypothesis features, 2017. URL https://arxiv.org/abs/1711.01312

[37] M. J. Zhang, F. Xia, and J. Zou. Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing. Nature Communications, 10(1), July 2019. ISSN 2041-1723. doi: 10.1038/s41467-019-11247-0. URL http://dx.doi.org/10.1038/s41467-019-11247-0

A Proof of Marginal Density Independence

Proposition (Restated. Marginal Dens...

regarding hyperbolic spaces applies. Briefly, Sarkar’s Theorem states that any tree with n nodes can be embedded into the 2-dimensional hyperbolic space (Poincaré disk H^2) with arbitrarily low distortion (1+ϵ) for any ϵ > 0. This dimension efficiency is crucial for practical application, but more fundamentally, it addresses a geometric incompatibility. Cons...

Sampling Sensitivity: The mixing weights (ρ1, ρ2, ρ3) require problem-specific tuning. Too much tail allocation (ρ2 > 0.6) introduces excessive gradient noise; too little (ρ2 < 0.3) misses violations.

Bandwidth Selection: KDE bandwidth h critically affects tail sampling quality. Scott’s rule often oversmooths for d ≥ 5; cross-validation is expensive.

Barrier Schedule: The growth rate β and starting value ν0 require careful tuning. Aggressive schedules (β > 1.5) cause convergence failure; conservative schedules (β < 1.1) waste computation.

Gradient Variance: Even with M = 10^4 samples, Monte Carlo variance in the barrier gradient necessitates small learning rates (η ∼ 10^-3), requiring hundreds of iterations per barrier level.

Non-Convexity Persists: The method still optimizes the non-convex mixture likelihood, inheriting all local minima issues. Multiple random initializations are required, multiplying computational cost.

Numerical Instability: Near constraint boundaries (α ≈ 0 or α ≈ 1), the terms 1/α and 1/(1−α) become numerically unstable, requiring careful regularization (α ← clip(α, 10^-8, 1 − 10^-8)).

J Practical Cross-Validation Procedure

In our experiments (Section 8 in main text), we employ the following procedure:

Algorithm 2 Hyperparameter Selection via Cross-Validation...
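The clipping safeguard described under Numerical Instability can be illustrated with a minimal NumPy sketch. The function name `clipped_inverse_terms` and the example inputs are hypothetical and do not come from the paper; only the 10^-8 margin is taken from the text, and how these terms enter the actual objective is not shown here.

```python
import numpy as np

EPS = 1e-8  # clipping margin quoted in the text


def clipped_inverse_terms(alpha, eps=EPS):
    """Clip alpha into [eps, 1 - eps], then return (1/alpha, 1/(1 - alpha)).

    Hypothetical helper for illustration only: it shows why clipping keeps
    both reciprocal terms finite at the boundaries alpha = 0 and alpha = 1.
    """
    a = np.clip(np.asarray(alpha, dtype=float), eps, 1.0 - eps)
    return 1.0 / a, 1.0 / (1.0 - a)


# Boundary values that would otherwise divide by zero.
inv_alpha, inv_one_minus_alpha = clipped_inverse_terms([0.0, 0.5, 1.0])
```

With the clip in place both arrays stay finite, at the cost of capping the reciprocal terms at roughly 10^8 near the boundaries.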