Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels
Pith reviewed 2026-05-19 22:32 UTC · model grok-4.3
The pith
Optimizing in a reproducing kernel Hilbert space controls false discoveries for structured hypotheses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR.
What carries the argument
Regularized estimation inside a Reproducing Kernel Hilbert Space where the kernel encodes the structure among hypotheses to produce a unified estimator and two FDR-controlling decision rules.
Load-bearing premise
The structure among hypotheses admits a positive-definite kernel representation such that the regularized estimator plus the two decision rules provably control FDR at the target level.
What would settle it
A simulation or real dataset where the kernel captures the structure but the observed false discovery proportion still exceeds the target level after applying the two decision rules.
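A check of this kind can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's procedure: it applies an average-lfdr selection rule of the form quoted in this page's reference-graph excerpts (the largest set whose mean lfdr is at most q) and then computes the false discovery proportion against known ground truth. The function names and the toy numbers are hypothetical.

```python
def select_by_average_lfdr(lfdr, q):
    """Return indices of the largest set whose mean lfdr is <= q.

    Sorting ascending makes the running mean nondecreasing, so the
    longest qualifying prefix is also the largest qualifying set.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / rank > q:
            break
        selected.append(i)
    return selected


def false_discovery_proportion(selected, is_null):
    """FDP = (# selected true nulls) / (# selected), or 0 if nothing is selected."""
    if not selected:
        return 0.0
    false_hits = sum(1 for i in selected if is_null[i])
    return false_hits / len(selected)


# Toy data (hypothetical): estimated lfdr values and ground-truth null labels.
lfdr = [0.01, 0.02, 0.30, 0.90, 0.05, 0.95]
is_null = [False, False, True, True, False, True]

selected = select_by_average_lfdr(lfdr, q=0.10)
fdp = false_discovery_proportion(selected, is_null)
print(selected, fdp)
```

A single realized FDP above q does not by itself contradict FDR control, because the FDR is the expectation of the FDP. A convincing counterexample would average the FDP over many independent replicates.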
Original abstract
Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task utilizing protein-protein interaction graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reframes structured FDR control as a regularized estimation problem in a reproducing kernel Hilbert space (RKHS). By selecting an appropriate positive-definite kernel, the approach unifies hypothesis testing over continuous domains, graphs, and hierarchies under a single algorithm. The authors derive a regularized estimator, introduce two decision rules, and claim to prove that both rules control the FDR at a target level. They further assert that the formulation permits likelihood-based hyperparameter selection, smooth solutions, and inference at unobserved locations. Empirical validation is reported on spatial data from high-dimensional real-world datasets and on differential gene expression using protein-protein interaction graphs.
Significance. If the FDR proofs hold under the stated conditions, the work would provide a flexible, kernel-driven alternative to existing structured multiple-testing procedures. The ability to obtain smooth estimates, perform principled hyperparameter tuning, and extrapolate to unobserved points could improve power and enable sample-efficient designs in settings where structure is naturally encoded by kernels. The unification across disparate structures is a notable conceptual contribution provided the guarantees do not tacitly rely on kernel-specific regularity beyond positive-definiteness.
major comments (2)
- [Proofs of FDR control for the two decision rules] The abstract asserts that two decision rules are proved to control the FDR after RKHS regularization. However, positive-definiteness alone does not automatically preserve the super-uniformity or exchangeability properties required by standard FDR arguments. The proof must therefore be examined to determine whether additional kernel-dependent conditions (e.g., eigenvalue decay, smoothness, or boundedness of the regularized solution) are implicitly used. Please provide the key steps of the proof (or the relevant theorem statement) that establish FDR control for arbitrary positive-definite kernels encoding continuous, graph, or hierarchical structure.
- [Hyperparameter selection and FDR guarantee] Hyperparameter selection is described as likelihood-based. Because this step is data-dependent, it is necessary to show that the subsequent FDR guarantees remain valid after the fitted quantities are obtained. The manuscript should clarify whether the proof treats the selected hyperparameters as fixed or accounts for the selection step, and whether any additional uniformity or independence assumptions are required.
minor comments (1)
- [Empirical validation] The abstract mentions validation on 'two sources' but does not specify the exact datasets, sample sizes, or quantitative metrics (e.g., realized FDR, power, or comparison to baselines). Adding a concise table or paragraph summarizing these quantities would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify the scope and assumptions of our framework. We address each major comment below, providing the requested clarifications on the proofs and hyperparameter selection while indicating planned revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [Proofs of FDR control for the two decision rules] The abstract asserts that two decision rules are proved to control the FDR after RKHS regularization. However, positive-definiteness alone does not automatically preserve the super-uniformity or exchangeability properties required by standard FDR arguments. The proof must therefore be examined to determine whether additional kernel-dependent conditions (e.g., eigenvalue decay, smoothness, or boundedness of the regularized solution) are implicitly used. Please provide the key steps of the proof (or the relevant theorem statement) that establish FDR control for arbitrary positive-definite kernels encoding continuous, graph, or hierarchical structure.
Authors: We appreciate the referee's careful scrutiny of the FDR guarantees. FDR control for the two decision rules is established in Theorems 4.1 and 4.2 (Section 4). The key steps are: (i) the RKHS-regularized estimator is shown to be unbiased for the true mean function under the null, leveraging the reproducing property so that the induced p-values remain super-uniform marginally; (ii) the decision rules apply a threshold to the regularized estimates that yields a conservative bound on the false discovery proportion, with the kernel-induced dependence controlled via a union-bound argument that holds for any positive-definite kernel; (iii) no eigenvalue decay or specific smoothness beyond positive-definiteness and continuity of the kernel (for continuous domains) is required, as the regularization ensures the solution remains in the RKHS and bounded. We will insert an expanded proof sketch and a remark on minimal assumptions in the revised manuscript.
revision: partial
-
Referee: [Hyperparameter selection and FDR guarantee] Hyperparameter selection is described as likelihood-based. Because this step is data-dependent, it is necessary to show that the subsequent FDR guarantees remain valid after the fitted quantities are obtained. The manuscript should clarify whether the proof treats the selected hyperparameters as fixed or accounts for the selection step, and whether any additional uniformity or independence assumptions are required.
Authors: We thank the referee for raising this important point on data-dependent tuning. The likelihood-based hyperparameter selection is performed via cross-validation on a held-out subset of the data, independent of the primary estimation and testing sets. Theorems 4.1 and 4.2 establish FDR control conditionally on the selected hyperparameters (treated as fixed after tuning). This conditioning is justified by the data-splitting procedure, which ensures independence between the tuning and inference stages. We will add an explicit statement in Section 4 clarifying the conditional nature of the guarantees and the role of data splitting, along with a brief discussion of the required independence assumption.
revision: yes
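The data-splitting argument in the last response can be made concrete with a small sketch. It assumes hypotheses are indexed 0..n-1 and that a user-supplied scoring function (hypothetical here) evaluates a candidate hyperparameter on the tuning indices only. The split fraction, seed, and function names are illustrative choices, not taken from the paper.

```python
import random


def split_indices(n, tune_fraction=0.3, seed=0):
    """Randomly partition range(n) into disjoint tuning and inference index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_tune = int(round(n * tune_fraction))
    return sorted(idx[:n_tune]), sorted(idx[n_tune:])


def pick_hyperparameter(candidates, score_fn, tune_idx):
    """Return the candidate with the highest score, looking only at tune_idx."""
    return max(candidates, key=lambda lam: score_fn(lam, tune_idx))


def toy_score(lam, idx):
    # Hypothetical stand-in for a held-out log-likelihood; peaks at lam = 0.1.
    return -(lam - 0.1) ** 2


tune_idx, infer_idx = split_indices(10, tune_fraction=0.3, seed=0)
best = pick_hyperparameter([0.01, 0.1, 1.0], toy_score, tune_idx)
print(tune_idx, infer_idx, best)
```

Because the inference indices never enter `pick_hyperparameter`, the selected value can be treated as fixed when the decision rules are applied to the inference set, provided the two subsets are statistically independent.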
Circularity Check
No significant circularity detected
The paper reframes structured FDR control as an RKHS-regularized estimation problem and then states that two decision rules are proved to control FDR at the target level. From the abstract and description, the estimator is obtained by optimization in the RKHS (with kernel chosen to encode structure), hyperparameters are selected via likelihood, and the FDR proofs are presented as separate results that apply to the resulting scores. No quoted step reduces a claimed prediction or uniqueness result to a fitted quantity by construction, no self-citation is invoked as the sole justification for a load-bearing theorem, and the unification claim is achieved by varying the kernel rather than by redefining the target quantity in terms of itself. The derivation chain therefore remains self-contained with independent content in the FDR-control arguments.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Hypothesis dependence structure can be represented by a positive definite kernel.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
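$\min_{\alpha \in \mathcal{H}_K} J(\alpha) = -\sum_i \log\bigl(\alpha(\mathrm{loc}_i)\, f_0(p_i) + (1 - \alpha(\mathrm{loc}_i))\, f_1(p_i)\bigr) + \lambda_{\mathrm{reg}} \lVert \alpha - \bar{\alpha} \rVert^2_{\mathcal{H}_K}$ (Eq. 3), with natural-gradient cancellation $\tilde{\nabla}_\alpha L = w + 2\lambda_{\mathrm{reg}}(c - c_{\bar{\alpha}}) + …$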
-
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
Matérn / diffusion / hyperbolic kernels chosen for Sobolev regularity and graph topology; no mention of golden-ratio fixed points or 8-tick clocks.
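The objective quoted above as Eq. 3 can be evaluated numerically once the deviation from the baseline is written in the finite representer form, α − ᾱ = Σ_j c_j k(·, loc_j), so that the squared RKHS norm becomes cᵀKc. The sketch below makes several assumptions that do not come from the paper: one-dimensional locations, a Matérn kernel with smoothness 3/2, a uniform null density f0, and a Beta(0.5, 1) alternative density f1. All function names are hypothetical.

```python
import math


def matern32(x, y, length_scale=1.0):
    """Matérn kernel with smoothness 3/2 on the real line."""
    s = math.sqrt(3.0) * abs(x - y) / length_scale
    return (1.0 + s) * math.exp(-s)


def gram_matrix(locs, kernel):
    """Gram matrix K[i][j] = kernel(locs[i], locs[j])."""
    return [[kernel(a, b) for b in locs] for a in locs]


def objective(c, K, p, alpha_bar, f0, f1, lam_reg):
    """J = -sum_i log(alpha_i f0(p_i) + (1 - alpha_i) f1(p_i)) + lam_reg * c^T K c,
    where alpha_i = alpha_bar + (K c)_i."""
    n = len(p)
    Kc = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    neg_log_lik = 0.0
    for i in range(n):
        alpha_i = alpha_bar + Kc[i]
        if not 0.0 < alpha_i < 1.0:
            raise ValueError("alpha must stay strictly inside (0, 1)")
        neg_log_lik -= math.log(alpha_i * f0(p[i]) + (1.0 - alpha_i) * f1(p[i]))
    penalty = lam_reg * sum(c[i] * Kc[i] for i in range(n))
    return neg_log_lik + penalty


def f0(p):
    return 1.0  # assumed uniform null density


def f1(p):
    return 0.5 * p ** -0.5  # assumed Beta(0.5, 1) alternative density


locs = [0.0, 0.5, 2.0]
p_values = [0.01, 0.5, 0.9]
K = gram_matrix(locs, matern32)
J_flat = objective([0.0, 0.0, 0.0], K, p_values, 0.8, f0, f1, lam_reg=1.0)
J_bump = objective([0.1, 0.0, 0.0], K, p_values, 0.8, f0, f1, lam_reg=1.0)
print(J_flat, J_bump)
```

Swapping `matern32` for a graph diffusion or hyperbolic kernel changes only the Gram matrix, which is the sense in which the structure enters through kernel choice alone.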
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950. ISSN 1088-6850. doi: 10.1090/s0002-9947-1950-0051437-7. URL http://dx.doi.org/10.1090/S0002-9947-1950-0051437-7
-
[2]
R. F. Barber and E. J. Candès. Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5), Oct. 2015. ISSN 0090-5364. doi: 10.1214/15-aos1337. URL http://dx.doi.org/10.1214/15-AOS1337
-
[3]
R. F. Barber, E. J. Candès, and R. J. Samworth. Robust inference with knockoffs, 2019. URL https://arxiv.org/abs/1801.03896
-
[4]
Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1995.tb02031.x. URL https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
-
[5]
Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4), Aug. 2001. ISSN 0090-5364. doi: 10.1214/aos/1013699998. URL http://dx.doi.org/10.1214/aos/1013699998
-
[6]
G. Blanchard, S. Delattre, and E. Roquain. Testing over a continuum of null hypotheses with false discovery rate control. Bernoulli, 20(1), Feb. 2014. ISSN 1350-7265. doi: 10.3150/12-bej488. URL http://dx.doi.org/10.3150/12-BEJ488
-
[7]
O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002
-
[8]
T. T. Cai, H. Li, J. Maris, and J. Xie. Optimal false discovery rate control for dependent data. Statistics and Its Interface, 4(4):417–430, 2011. ISSN 1938-7997. doi: 10.4310/sii.2011.v4.n4.a1. URL http://dx.doi.org/10.4310/SII.2011.v4.n4.a1
-
[9]
T. T. Cai, W. Sun, and Y. Xia. LAWS: A locally adaptive weighting and screening approach to spatial multiple testing. Journal of the American Statistical Association, 117(539):1370–1383, Jan. 2021. ISSN 1537-274X. doi: 10.1080/01621459.2020.1859379. URL http://dx.doi.org/10.1080/01621459.2020.1859379
-
[10]
Y. Chen, E. A. Huerta, J. Duarte, P. Harris, D. S. Katz, M. S. Neubauer, D. Diaz, F. Mokhtar, R. Kansal, S. E. Park, V. V. Kindratenko, Z. Zhao, and R. Rusack. A FAIR and AI-ready Higgs boson decay dataset. Scientific Data, 9(1), Feb. 2022. ISSN 2052-4463. doi: 10.1038/s41597-021-01109-0. URL http://dx.doi.org/10.1038/s41597-021-01109-0
-
[11]
B. Efron. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. Journal of the American Statistical Association, 99(465):96–104, Mar. 2004. ISSN 1537-274X. doi: 10.1198/016214504000000089. URL http://dx.doi.org/10.1198/016214504000000089
-
[12]
T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, Mar. 2007. ISSN 1537-274X. doi: 10.1198/016214506000001437. URL http://dx.doi.org/10.1198/016214506000001437
-
[13]
R. Heller and S. Rosset. Optimal control of false discovery criteria in the two-group model. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(1):133–155, Dec. ISSN 1467-9868. doi: 10.1111/rssb.12403. URL http://dx.doi.org/10.1111/rssb.12403
-
[15]
D. Henrion, J. B. Lasserre, and J. Lofberg. GloptiPoly 3: moments, optimization and semidefinite programming, 2007. URL https://arxiv.org/abs/0709.2559
-
[16]
R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods, and applications. SIAM Review, 35(3):380–429, Sept. 1993. ISSN 1095-7200. doi: 10.1137/1035089. URL http://dx.doi.org/10.1137/1035089
-
[17]
A. Khanfer. Theory of Sobolev Spaces, pages 133–237. Springer Nature Singapore, 2024. ISBN 9789819937882. doi: 10.1007/978-981-99-3788-2_3. URL http://dx.doi.org/10.1007/978-981-99-3788-2_3
-
[18]
R. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. Proceedings of the Nineteenth International Conference on Machine Learning, 11, 04 2002
-
[19]
M. Kozdoba, B. Perets, and S. Mannor. Sobolev space regularised pre density models. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 25494–25533. PMLR, 21–27 Jul 2024. UR...
- [20]
-
[21]
URL https://arxiv.org/abs/1609.06035
-
[22]
M. López and G. Still. Semi-infinite programming. European Journal of Operational Research, 180(2):491–518, July 2007. ISSN 0377-2217. doi: 10.1016/j.ejor.2006.08.045. URL http://dx.doi.org/10.1016/j.ejor.2006.08.045
-
[23]
J. R. Munkres. Analysis on Manifolds. CRC Press, Feb. 2018. ISBN 9780429494147. doi: 10.1201/9780429494147. URL http://dx.doi.org/10.1201/9780429494147
-
[24]
J. O. Royset, E. Polak, and A. Kiureghian. Adaptive approximations and exact penalization for the solution of generalized semi-infinite min-max problems. SIAM Journal on Optimization, 14(1):1–34, Jan. 2003. ISSN 1095-7189. doi: 10.1137/s1052623402406777. URL http://dx.doi.org/10.1137/S1052623402406777
-
[25]
R. Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane, pages 355–366. Springer Berlin Heidelberg, 2012. ISBN 9783642258787. doi: 10.1007/978-3-642-25878-7_34. URL http://dx.doi.org/10.1007/978-3-642-25878-7_34
-
[26]
B. Schölkopf, R. Herbrich, and A. J. Smola. A Generalized Representer Theorem, pages 416–426. Springer Berlin Heidelberg, 2001. ISBN 9783540445814. doi: 10.1007/3-540-44581-1_27. URL http://dx.doi.org/10.1007/3-540-44581-1_27
-
[27]
A. Schwartzman and X. Lin. The effect of correlation in false discovery rate estimation. Biometrika, 98(1):199–214, Feb. 2011. ISSN 1464-3510. doi: 10.1093/biomet/asq075. URL http://dx.doi.org/10.1093/biomet/asq075
-
[28]
A. J. Smola and R. Kondor. Kernels and Regularization on Graphs, pages 144–158. Springer Berlin Heidelberg, 2003. ISBN 9783540451679. doi: 10.1007/978-3-540-45167-9_12. URL http://dx.doi.org/10.1007/978-3-540-45167-9_12
-
[29]
G. Still. Generalized semi-infinite programming: numerical aspects. Optimization, 49(3):223–242, Jan. 2001. ISSN 1029-4945. doi: 10.1080/02331930108844531. URL http://dx.doi.org/10.1080/02331930108844531
- [30]
-
[31]
D. Szklarczyk, R. Kirsch, M. Koutrouli, K. Nastou, F. Mehryary, R. Hachilif, A. L. Gable, T. Fang, N. Doncheva, S. Pyysalo, P. Bork, L. Jensen, and C. von Mering. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research, 51(D1):D638–D646, Nov. 2022. ISS...
-
[32]
W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott. False discovery rate smoothing. 2016. URL https://arxiv.org/abs/1411.6144
-
[33]
R. J. Tibshirani and J. Taylor. The solution path of the generalized lasso. The Annals of Statistics, 39(3), June 2011. ISSN 0090-5364. doi: 10.1214/11-aos878. URL http://dx.doi.org/10.1214/11-AOS878
-
[34]
S. Tripathi, S. Moutari, M. Dehmer, and F. Emmert-Streib. Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules. BMC Bioinformatics, 17(1), Mar. 2016. ISSN 1471-2105. doi: 10.1186/s12859-016-0979-8. URL http://dx.doi.org/10.1186/s12859-016-0979-8
- [35]
-
[36]
F. Xia, M. J. Zhang, J. Zou, and D. Tse. NeuralFDR: Learning discovery thresholds from hypothesis features, 2017. URL https://arxiv.org/abs/1711.01312
-
[37]
select the largest $R \subseteq T$ such that $\frac{1}{|R|} \sum_{i \in R} \mathrm{lfdr}_{\mathrm{marg}}(p_i) \le q$
M. J. Zhang, F. Xia, and J. Zou. Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing. Nature Communications, 10(1), July 2019. ISSN 2041-1723. doi: 10.1038/s41467-019-11247-0. URL http://dx.doi.org/10.1038/s41467-019-11247-0.
A Proof of Marginal Density Independence
Proposition (Restated. Marginal Dens...
-
[38]
regarding hyperbolic spaces applies. Briefly, Sarkar’s Theorem states that any tree with n nodes can be embedded into the 2-dimensional hyperbolic space (Poincaré disk H²) with arbitrarily low distortion (1+ϵ) for any ϵ > 0. This dimension efficiency is crucial for practical application, but more fundamentally, it addresses a geometric incompatibility. Cons...
-
[39]
Sampling Sensitivity: The mixing weights (ρ1, ρ2, ρ3) require problem-specific tuning. Too much tail allocation (ρ2 > 0.6) introduces excessive gradient noise; too little (ρ2 < 0.3) misses violations.
-
[40]
Bandwidth Selection: KDE bandwidth h critically affects tail sampling quality. Scott’s rule often oversmooths for d ≥ 5; cross-validation is expensive.
-
[41]
Barrier Schedule: The growth rate β and starting value ν0 require careful tuning. Aggressive schedules (β > 1.5) cause convergence failure; conservative schedules (β < 1.1) waste computation.
-
[42]
Gradient Variance: Even with M = 10^4 samples, Monte Carlo variance in the barrier gradient necessitates small learning rates (η ∼ 10^−3), requiring hundreds of iterations per barrier level.
-
[43]
Non-Convexity Persists: The method still optimizes the non-convex mixture likelihood, inheriting all local minima issues. Multiple random initializations are required, multiplying computational cost.
-
[44]
Numerical Instability: Near constraint boundaries (α ≈ 0 or α ≈ 1), the terms 1/α and 1/(1−α) become numerically unstable, requiring careful regularization (α ← clip(α, 10^−8, 1 − 10^−8)).
J Practical Cross-Validation Procedure
In our experiments (Section 8 in main text), we employ the following procedure:
Algorithm 2 Hyperparameter Selection via Cross-Validation...
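The clipping step named in the Numerical Instability item above can be illustrated with a short sketch. The bound 10^−8 is the one quoted in the excerpt; the function names are hypothetical, and the snippet only shows how clipping keeps the reciprocal terms finite, not how the paper's optimizer uses them.

```python
import math


def clip_unit(alpha, eps=1e-8):
    """Clamp alpha into [eps, 1 - eps]."""
    return min(max(alpha, eps), 1.0 - eps)


def reciprocal_terms(alpha, eps=1e-8):
    """Return (1/alpha, 1/(1 - alpha)) evaluated at the clipped alpha."""
    a = clip_unit(alpha, eps)
    return 1.0 / a, 1.0 / (1.0 - a)


# Without clipping, alpha = 0.0 or alpha = 1.0 would raise ZeroDivisionError.
for raw in (0.0, 0.3, 1.0):
    inv_a, inv_one_minus_a = reciprocal_terms(raw)
    print(raw, math.isfinite(inv_a), math.isfinite(inv_one_minus_a))
```

Clipping trades a small bias at the boundary for bounded gradients; a smaller `eps` reduces the bias but lets the reciprocal terms grow larger.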