arxiv: 2605.01775 · v1 · submitted 2026-05-03 · 📊 stat.ML · cs.LG· stat.ME

Recognition: unknown

A Semi-Supervised Kernel Two-Sample Test

Gyumin Lee , Shubhanshu Shekhar , Ilmun Kim

Authors on Pith no claims yet

Pith reviewed 2026-05-09 17:07 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME

keywords semi-supervisedtwo-sample testkernel methodsasymptotic normalitycovariateshypothesis testingpower analysis

0 comments

The pith

A semi-supervised kernel two-sample test uses abundant unlabeled covariates to produce an asymptotically normal statistic that is easy to calibrate and often more powerful than standard kernel tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses two-sample testing when extra unlabeled data on covariates is available but the main samples are limited. Standard kernel tests ignore this covariate information and rely on exchangeability for calibration, which adding covariates can break. The proposed method builds a test statistic that remains asymptotically normal under the null hypothesis of equal distributions, allowing calibration via normal quantiles. This integration of covariates yields higher asymptotic power against alternatives while maintaining consistency for both fixed and local alternatives. Simulations support that the approach improves performance in practice without complicating the calibration step.

Core claim

By incorporating semi-supervised covariate data into a kernel framework, the method constructs a test statistic whose distribution under the null converges to a normal limit, enabling direct use of standard normal critical values, while the resulting procedure attains greater asymptotic power than covariate-ignoring kernel tests and is consistent against fixed and local alternatives.

What carries the argument

A kernel-based test statistic that folds in unlabeled covariate information while preserving asymptotic normality under the null hypothesis of identical distributions.

Load-bearing premise

Adding covariate information to the test statistic still yields asymptotic normality under the null even though it breaks the exchangeability that permutation tests rely on.

What would settle it

A Monte Carlo study under the null with informative covariates in which the empirical distribution of the proposed statistic deviates substantially from normality at sample sizes where the theory predicts convergence.

Figures

Figures reproduced from arXiv: 2605.01775 by Gyumin Lee, Ilmun Kim, Shubhanshu Shekhar.

**Figure 1.** Figure 1: Experimental results for the distribution view at source ↗

**Figure 2.** Figure 2: We consider the case of PV = N(0d, ΣV ) and PW = N(aϵ,j , ΣW ), where aϵ,j ∈ R d has its first j entries equal to ϵ and the rest zero. We let ΣV = ΣW = ρ1d1 ⊤ d + (1 − ρ)Id and obtain {Vi} n1+m1 i=1 by sampling n1 + m1 independent samples from PV . We then construct V = (V ⊤ 1 , . . . , V ⊤ n1 ) ⊤ ∈ R n1×d and obtain a set of n1 labeled samples, X = V · b, where b = (bi) d i=1 ∈ R d with bi = 1 if i belon… view at source ↗

**Figure 2.** Figure 2: Power comparisons across different dependence scenarios. The xssMMD tests, employing various view at source ↗

**Figure 3.** Figure 3: An illustration of the construction of the xssMMD statistic based on the same principles as the general view at source ↗

**Figure 4.** Figure 4: Experimental results for the distribution of view at source ↗

**Figure 5.** Figure 5: Experimental results for the distribution of view at source ↗

**Figure 6.** Figure 6: Experimental results for the distribution of view at source ↗

**Figure 7.** Figure 7: Power analysis of the xssMMD test in various settings. The first two subfigures depict scenarios in view at source ↗

**Figure 8.** Figure 8: An example of data construction when testing coastal birds against grassland birds. Labeled data view at source ↗

**Figure 9.** Figure 9: An example of data construction of images with Gaussian noise of view at source ↗

read the original abstract

We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However, incorporating covariates potentially breaks the exchangeability assumption under the null, which further complicates a calibration procedure. To address these issues, we propose a semi-supervised method that produces a test statistic with asymptotic normality, while effectively integrating additional information from covariates. Our test is straightforward to calibrate due to the asymptotic normality under the null and achieves asymptotic power that is often much higher than existing kernel tests without covariates. Furthermore, we formally show that the proposed method is consistent in power against fixed and local alternatives. Simulations confirm the practical and theoretical strengths of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a kernel two-sample test that folds in unlabeled covariates while keeping the statistic asymptotically normal under the null for simple calibration.

read the letter

The main thing here is a semi-supervised extension of kernel two-sample tests that uses extra unlabeled covariate data to increase power. Standard kernel methods ignore those covariates, and the authors adjust the statistic so it remains asymptotically normal even though the covariates break exchangeability. They prove consistency against fixed and local alternatives and report simulations showing higher power than plain kernel tests without covariates. That combination of theory and practical calibration is the useful part if the derivations check out.

Referee Report

1 major / 2 minor

Summary. The paper proposes a semi-supervised kernel two-sample test that incorporates abundant unlabeled covariate data to improve performance over standard kernel tests. It claims the resulting test statistic is asymptotically normal under the null (enabling calibration via normal critical values), achieves higher asymptotic power than existing kernel methods without covariates, and is consistent against both fixed and local alternatives. These properties are supported by theoretical analysis and simulation experiments.

Significance. If the asymptotic normality result holds, the work offers a practical advance for two-sample testing in semi-supervised regimes by allowing covariate information to boost power without requiring complex resampling-based calibration. The consistency proofs and power comparisons would strengthen the case for adopting such methods when unlabeled covariates are available.

major comments (1)

[Abstract and theoretical derivation of the test statistic] The central claim of asymptotic normality under the null (stated in the abstract and presumably derived in the theoretical section) is load-bearing for the entire calibration procedure, power analysis, and consistency results. The manuscript must explicitly verify that the semi-supervised construction preserves the conditions for the CLT (e.g., appropriate centering and variance terms) even though covariates break exchangeability; without this step-by-step check, the normality assertion cannot be confirmed and the claimed advantages over standard kernel tests do not follow.

minor comments (2)

Clarify the precise form of the semi-supervised kernel estimator and any additional assumptions (e.g., on the covariate distribution or kernel bandwidth) needed for the asymptotic results.
In the simulation section, report the exact sample sizes, covariate dimensions, and number of Monte Carlo replications to allow direct replication of the power comparisons.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for emphasizing the need for explicit verification of the asymptotic normality result. This is central to our claims, and we will revise the manuscript to strengthen the theoretical presentation as requested.

read point-by-point responses

Referee: The central claim of asymptotic normality under the null (stated in the abstract and presumably derived in the theoretical section) is load-bearing for the entire calibration procedure, power analysis, and consistency results. The manuscript must explicitly verify that the semi-supervised construction preserves the conditions for the CLT (e.g., appropriate centering and variance terms) even though covariates break exchangeability; without this step-by-step check, the normality assertion cannot be confirmed and the claimed advantages over standard kernel tests do not follow.

Authors: We agree that an explicit, step-by-step verification of the CLT conditions is required, especially since the unlabeled covariates break exchangeability of the labeled samples under the null. In the original derivation (Section 3), the test statistic is constructed by first estimating the conditional kernel mean embeddings from the abundant unlabeled covariates and then centering the labeled kernel terms accordingly; this yields a sum of conditionally mean-zero terms whose variance converges in probability to a positive constant. To address the referee's concern directly, we will add a new subsection that (i) states the null hypothesis in terms of the covariate-conditional distributions, (ii) verifies the centering removes the bias induced by the covariates, (iii) shows the variance estimator is consistent by law of large numbers on the unlabeled data, and (iv) confirms the Lindeberg condition holds for the triangular array of semi-supervised terms. These additions will make transparent that asymptotic normality is preserved and that the power gains relative to the standard kernel test follow from the reduced variance. We will also ensure the abstract and introduction reference this expanded derivation. revision: yes

Circularity Check

0 steps flagged

No significant circularity: asymptotic normality derived from standard CLT arguments on the semi-supervised statistic

full rationale

The paper proposes a new semi-supervised kernel two-sample statistic that incorporates unlabeled covariates while claiming to retain asymptotic normality under the null (despite broken exchangeability). This normality is established via direct analysis of the statistic's mean and variance under the null, not by redefining the target quantity in terms of itself or by fitting parameters on the same data and relabeling the fit as a prediction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the consistency and power results follow from the same limiting distribution without reducing to the input data by construction. The approach is therefore self-contained against external benchmarks such as standard kernel MMD tests.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard asymptotic theory for kernel tests and the assumption that covariates can be integrated without destroying normality under the null.

axioms (2)

domain assumption Kernel functions are positive definite and appropriate for two-sample testing
Implied by the use of kernel methods in the proposal.
domain assumption The constructed test statistic has asymptotic normality under the null hypothesis
This is the key property claimed for calibration and is central to the method.

pith-pipeline@v0.9.0 · 5422 in / 1293 out tokens · 42842 ms · 2026-05-09T17:07:05.898451+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

177 extracted references · 38 canonical work pages · 2 internal anchors

[1]

Semi-supervised inference: General theory and estimation of means

Zhang, Anru and Brown, Lawrence D and Cai, T Tony , journal=. Semi-supervised inference: General theory and estimation of means. 2019 , publisher=

2019
[2]

Journal of Machine Learning Research , volume=

On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives , author=. Journal of Machine Learning Research , volume=
[3]

Statistical Papers , volume=

Hellinger distances and -entropy in a one-parameter class of density functions , author=. Statistical Papers , volume=. 1989 , publisher=

1989
[4]

Two-sample smooth tests for the equality of distributions , volume =

Zhou, Wen-Xin and Zheng, Chao and Zhang, Zhen , year =. Two-sample smooth tests for the equality of distributions , volume =. Bernoulli , publisher =
[5]

Sutherland and Hsiao-Yu Tung and Heiko Strathmann and Soumyajit De and Aaditya Ramdas and Alex Smola and Arthur Gretton , booktitle=

Danica J. Sutherland and Hsiao-Yu Tung and Heiko Strathmann and Soumyajit De and Aaditya Ramdas and Alex Smola and Arthur Gretton , booktitle=. Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy
[6]

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Choi, Ikjun and Kim, Ilmun , journal=. Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features
[7]

B-test: A non-parametric, low variance kernel two-sample test

Zaremba, Wojciech and Gretton, Arthur and Blaschko, Matthew , journal=. B-test: A non-parametric, low variance kernel two-sample test
[8]

Advances in Neural Information Processing Systems , volume=

Optimal kernel choice for large-scale two-sample tests , author=. Advances in Neural Information Processing Systems , volume=
[9]

The Annals of Statistics , volume=

The projected covariance measure for assumption-lean variable significance testing , author=. The Annals of Statistics , volume=. 2024 , publisher=

2024
[10]

Statistic Surveys , volume=

Methods for quantifying dataset similarity: a review, taxonomy and comparison , author=. Statistic Surveys , volume=. 2024 , publisher=

2024
[11]

Advances in Neural Information Processing Systems , volume=

An efficient doubly-robust test for the kernel treatment effect , author=. Advances in Neural Information Processing Systems , volume=
[12]

R \'e nyi divergence and Kullback-Leibler divergence

Van Erven, Tim and Harremos, Peter , journal=. R \'e nyi divergence and Kullback-Leibler divergence. 2014 , publisher=

2014
[13]

Bobkov, Sergey G and Chistyakov, GP and G. R. The Annals of Probability , volume=. 2019 , publisher=

2019
[14]

2012 , publisher=

Asymptotic methods in statistical decision theory , author=. 2012 , publisher=

2012
[15]

IEEE Transactions on Information Theory , volume=

Minimax rates of entropy estimation on large alphabets via best polynomial approximation , author=. IEEE Transactions on Information Theory , volume=. 2016 , publisher=

2016
[16]

The Berry-Esseen bound for Student's statistic

Bentkus, Vidmantas and G. The Berry-Esseen bound for Student's statistic. The Annals of Probability , volume=. 1996 , publisher=

1996
[17]

On normal approximations to U-statistics

Bentkus, Vidmantas and Jing, Bing-Yi and Zhou, Wang , journal=. On normal approximations to U-statistics. 2009 , publisher=

2009
[18]

, author=

Lower bounds on the smallest eigenvalue of a sample covariance matrix. , author=. Electronic Communications in Probability , volume=. 2014 , publisher=

2014
[19]

p-Norm bounds on the expectation of the maximum of a possibly dependent sample

Arnold, Barry C , journal=. p-Norm bounds on the expectation of the maximum of a possibly dependent sample. 1985 , publisher=

1985
[20]

Biometrika , volume=

High-dimensional semi-supervised learning: in search of optimal inference of the mean , author=. Biometrika , volume=. 2022 , publisher=

2022
[21]

Information Theory: From Coding to Learning

Polyanskiy, Yury and Wu, Yihong , year=. Information Theory: From Coding to Learning
[22]

, author=

The correlation-assisted missing data estimator. , author=. Journal of Machine Learning Research , volume=
[23]

The Annals of Statistics , volume=

A constrained risk inequality with applications to nonparametric functional estimation , author=. The Annals of Statistics , volume=. 1996 , publisher=

1996
[24]

International Conference on Artificial Intelligence and Statistics , pages=

A constrained risk inequality for general losses , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

2021
[25]

Double robust semi-supervised inference for the mean: selection bias under MAR labeling with decaying overlap

Zhang, Yuqian and Chakrabortty, Abhishek and Bradic, Jelena , journal=. Double robust semi-supervised inference for the mean: selection bias under MAR labeling with decaying overlap. 2023 , publisher=

2023
[26]

arXiv preprint arXiv:2306.00265 , year=

Doubly Robust Self-Training , author=. arXiv preprint arXiv:2306.00265 , year=

work page arXiv
[27]

Angelopoulos and Stephen Bates and Clara Fannjiang and Michael I

Anastasios N. Angelopoulos and Stephen Bates and Clara Fannjiang and Michael I. Jordan and Tijana Zrnic , title =. Science , volume =
[28]

High dimensional M-estimation with missing outcomes: A semi-parametric framework

Chakrabortty, Abhishek and Lu, Jiarui and Cai, T Tony and Li, Hongzhe , journal=. High dimensional M-estimation with missing outcomes: A semi-parametric framework
[29]

The Annals of Statistics , number =

Efficient and adaptive linear regression in semi-supervised settings , author =. The Annals of Statistics , number =
[30]

Journal of the American Statistical Association, 1–24 (2024) https://doi.org/10.1080/01621459.2023.2300522

Deng, Siyi and Ning, Yang and Zhao, Jiwei and Zhang, Heping , year =. Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning. doi:10.1080/01621459.2023.2277409 , journal =

work page doi:10.1080/01621459.2023.2277409 2023
[31]

Journal of the American Statistical Association , volume=

Semi-supervised linear regression , author=. Journal of the American Statistical Association , volume=. 2022 , publisher=

2022
[32]

Chakrabortty, Abhishek and Dai, Guorong and Tchetgen, Eric Tchetgen , journal=
[33]

A General M-estimation Theory in Semi-Supervised Framework

Song, Shanshan and Lin, Yuanyuan and Zhou, Yong , journal=. A General M-estimation Theory in Semi-Supervised Framework. 2023 , publisher=

2023
[34]

2018 , institution=

A few notes on contiguity, asymptotics, and local asymptotic normality , author=. 2018 , institution=

2018
[35]

Asymptotic Statistics

van der Vaart, Aad W , volume=. Asymptotic Statistics. 2000 , publisher=

2000
[36]

Semiparametric statistics

Van der Vaart, Aad W , pages=. Semiparametric statistics. 2002 , journal=

2002
[37]

Semiparametric Doubly Robust Targeted Double Machine Learning: A Review

Semiparametric doubly robust targeted double machine learning: a review , author=. arXiv preprint arXiv:2203.06469 , year=

work page arXiv
[38]

Lee, A. J. , year=. U-statistics: Theory and Practice
[39]

Wassily Hoeffding , journal =
[40]

CLT for U-statistics with growing dimension

DiCiccio, Cyrus and Romano, Joseph , journal=. CLT for U-statistics with growing dimension
[41]

The Econometrics Journal , volume=

Double/debiased machine learning foratment and structural parameters , author=. The Econometrics Journal , volume=. 2018 , page=

2018
[42]

Proceedings of the National Academy of Sciences , volume=

Universal inference , author=. Proceedings of the National Academy of Sciences , volume=. 2020 , publisher=

2020
[43]

, year =

Kennedy, Edward H. , year =. Electronic Journal of Statistics , publisher =
[44]

V. S. Koroljuk and Yu. V. Borovskich , title = ". 1994 , publisher =. doi:10.1007/978-94-017-3515-5 , url =

work page doi:10.1007/978-94-017-3515-5 1994
[45]

2004 , publisher =

Larry Wasserman , title =. 2004 , publisher =. doi:10.1007/978-0-387-21736-9 , url =

work page doi:10.1007/978-0-387-21736-9 2004
[46]

Applications of the van Trees inequality: a Bayesian Cram \'e r-Rao bound

Gill, Richard D and Levit, Boris Y , journal=. Applications of the van Trees inequality: a Bayesian Cram \'e r-Rao bound. 1995 , publisher=

1995
[47]

2010 , publisher=

Asymptotic theory for cross-validated targeted maximum likelihood estimation , author=. 2010 , publisher=

2010
[48]

On a family of distributions attaining the Bhattacharyya bound

Tanaka, Hidekazu and Akahira, Masafumi , journal=. On a family of distributions attaining the Bhattacharyya bound. 2003 , publisher=

2003
[49]

On some analogues of the amount of information and their use in statistical estimation , author=. Sankhy. 1946 , publisher=

1946
[50]

2006 , publisher=

Theory of point estimation , author=. 2006 , publisher=

2006
[51]

The Journal of Machine Learning Research , volume=

Quantifying uncertainty in random forests via confidence intervals and hypothesis tests , author=. The Journal of Machine Learning Research , volume=. 2016 , publisher=

2016
[52]

V-statistics and variance estimation

Zhou, Zhengze and Mentch, Lucas and Hooker, Giles , journal=. V-statistics and variance estimation. 2021 , publisher=

2021
[53]

Jackknifing U-statistics

Arvesen, James N , journal=. Jackknifing U-statistics. 1969 , publisher=

1969
[54]

A Distribution-Free Theory of Nonparametric Regression

L. A Distribution-Free Theory of Nonparametric Regression. 2002 , publisher =. doi:10.1007/b97848 , url =

work page doi:10.1007/b97848 2002
[55]

Nonparametric regression using deep neural networks with ReLU activation function , volume=

Johannes Schmidt-Hieber , title = ". 2020 , month = aug, publisher =. doi:10.1214/19-aos1875 , url =

work page doi:10.1214/19-aos1875 2020
[56]

arXiv preprint arXiv:2102.12034 (accepted to Biometrika) , year=

Semiparametric counterfactual density estimation , author=. arXiv preprint arXiv:2102.12034 (accepted to Biometrika) , year=

work page arXiv
[57]

The Journal of Machine Learning Research , volume=

Analysis of a random forests model , author=. The Journal of Machine Learning Research , volume=. 2012 , publisher=

2012
[58]

Semi-supervised regression: A recent review

Kostopoulos, Georgios and Karlos, Stamatis and Kotsiantis, Sotiris and Ragos, Omiros , journal=. Semi-supervised regression: A recent review. 2018 , publisher=

2018
[59]

Downey , title =

Peter J. Downey , title =. 1990 , month = may, publisher =. doi:10.1016/0167-6377(90)90018-z , url =

work page doi:10.1016/0167-6377(90)90018-z 1990
[60]

Rigollet and A

Ph. Rigollet and A. B. Tsybakov , title =. 2007 , month = sep, publisher =. doi:10.3103/s1066530707030052 , url =

work page doi:10.3103/s1066530707030052 2007
[61]

Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003

Optimal rates of aggregation , author=. Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings , pages=. 2003 , organization=

2003
[62]

van der Laan and Eric C Polley and Alan E

Mark J. van der Laan and Eric C Polley and Alan E. Hubbard , title = ". 2007 , month = jan, publisher =. doi:10.2202/1544-6115.1309 , url =

work page doi:10.2202/1544-6115.1309 2007
[63]

Estimating the D imension of a M odel

Herman Callaert and Paul Janssen , title = ". 1978 , month = mar, publisher =. doi:10.1214/aos/1176344132 , url =

work page doi:10.1214/aos/1176344132 1978
[64]

Berry , title = "

Andrew C. Berry , title = ". 1941 , publisher =

1941
[65]

, series=

Esseen, C.G. , series=. On the Liapounoff Limit of Error in the Theory of Probability. 1942 , publisher=

1942
[66]

, year =

Zrnic, Tijana and Candès, Emmanuel J. , year =. Cross-prediction-powered inference , volume =. Proceedings of the National Academy of Sciences , publisher =
[67]

Normal approximation by Stein's method

Chen, Louis HY and Goldstein, Larry and Shao, Qi-Man , volume=. Normal approximation by Stein's method. 2011 , publisher=

2011
[68]

arXiv preprint arXiv:2111.15546 , year=

Black box tests for algorithmic stability , author=. arXiv preprint arXiv:2111.15546 , year=

work page arXiv
[69]

Bagging Provides Assumption-free Stability

Soloff, Jake A and Barber, Rina Foygel and Willett, Rebecca , journal=. Bagging Provides Assumption-free Stability
[70]

Advances in Neural Information Processing Systems , volume=

Debiased machine learning without sample-splitting for stable estimators , author=. Advances in Neural Information Processing Systems , volume=
[71]

Double/de-biased machine learning of global and local parameters using regularized Riesz representers

Chernozhukov, Victor and Newey, W and Robins, James and Singh, Rahul , journal=. Double/de-biased machine learning of global and local parameters using regularized Riesz representers
[72]

2021 , publisher=

Hirshberg, David A and Wager, Stefan , journal=. 2021 , publisher=

2021
[73]

Journal of the American Statistical Association , volume=

A general framework for inference on algorithm-agnostic variable importance , author=. Journal of the American Statistical Association , volume=. 2023 , publisher=

2023
[74]

van der Laan and Daniel Rubin

Mark J. van der Laan and Daniel Rubin , title =". 2006 , month = jan, publisher =. doi:10.2202/1557-4679.1043 , url =

work page doi:10.2202/1557-4679.1043 2006
[75]

Luedtke and Mark J

Alexander R. Luedtke and Mark J. van der Laan , title = ". 2016 , month = apr, publisher =. doi:10.1214/15-aos1384 , url =

work page doi:10.1214/15-aos1384 2016
[76]

Adversarial Estimation of Riesz Representers

Chernozhukov, Victor and Newey, Whitney and Singh, Rahul and Syrgkanis, Vasilis , journal=. Adversarial Estimation of Riesz Representers
[77]

NATO science series sub series iii computer and systems sciences , volume=

Leave-one-out error and stability of learning algorithms with applications , author=. NATO science series sub series iii computer and systems sciences , volume=. 2003 , publisher=

2003
[78]

Cross-Validation and Mean-Square Stability

Kale, Satyen and Kumar, Ravi and Vassilvitskii, Sergei , booktitle=. Cross-Validation and Mean-Square Stability
[79]

Stability and Generalization

Bousquet, Olivier and Elisseeff, Andr. Stability and Generalization. The Journal of Machine Learning Research , volume=. 2002 , publisher=

2002
[80]

Train faster, generalize better: Stability of stochastic gradient descent

Hardt, Moritz and Recht, Ben and Singer, Yoram , booktitle=. Train faster, generalize better: Stability of stochastic gradient descent. 2016 , organization=

2016

Showing first 80 references.