Coupled Training with Privileged Information and Unlabeled Data

Jason M. Klusowski; Jiahao Shi; Omar Hagrass

arxiv: 2605.23268 · v1 · pith:HOYPQXJUnew · submitted 2026-05-22 · 📊 stat.ML · cs.LG


Pith reviewed 2026-05-25 03:44 UTC · model grok-4.3


keywords privileged information · joint training · two-stage training · unlabeled data · alternating optimization · generalization guarantees · deployment model


The pith

Joint training of privileged and deployment models prevents inheriting errors from weak extra data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In many prediction settings, extra information available only at training time can mislead a deployment model if used naively. The common two-stage method first trains a model on the privileged data and then transfers its predictions to the deployment model, which can hurt accuracy when that data is weak or noisy. The paper instead couples the two models in a single joint objective so that the deployment model incorporates the extra signal only when it improves its own predictions. Theoretical guarantees identify conditions under which this joint approach yields better accuracy than two-stage training. A simple alternating algorithm is analyzed for optimizing the coupled objective with large, high-dimensional models, and experiments on both synthetic and real tasks indicate that it avoids the failures of sequential baselines.

Core claim

By optimizing a joint objective over a privileged-information model and a deployment model simultaneously, the method ensures the deployment model benefits from the extra training data only when it genuinely reduces its own error, rather than always inheriting predictions from the first-stage model as occurs in two-stage training.

What carries the argument

The joint objective that couples the two models during training, optimized via a simple alternating algorithm that updates one model while holding the other fixed.
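The alternating scheme can be illustrated with a minimal sketch. The paper's exact objective is not reproduced on this page, so the code below assumes a squared-loss, linear-model version: the g-update mirrors the structure of the rich-view update quoted in the Figure 6 caption (agreement with f on unlabeled data plus a λ-weighted labeled fit plus a ridge penalty), and the f-update is assumed to be symmetric. The names `coupled_fit`, `joint_objective`, and `ridge_solve` are hypothetical, and there is no intercept term.

```python
import numpy as np


def ridge_solve(A, b, alpha):
    """Closed-form minimizer of ||A t - b||^2 + alpha * ||t||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(d), A.T @ b)


def joint_objective(beta, gamma, XL, ZL, yL, XU, ZU, lam, alpha):
    """Assumed coupled objective (an illustration, not the paper's formula)."""
    agree = np.sum((XU @ beta - ZU @ gamma) ** 2)  # f and g agree on unlabeled data
    fit_f = np.sum((yL - XL @ beta) ** 2)          # deployment model fits labels
    fit_g = np.sum((yL - ZL @ gamma) ** 2)         # privileged model fits labels
    reg = alpha * (beta @ beta + gamma @ gamma)
    return agree + lam * (fit_f + fit_g) + reg


def coupled_fit(XL, WL, yL, XU, WU, lam=1.0, alpha=1e-2, n_outer=20):
    """Alternate exact ridge solves for f (uses X) and g (uses X and W)."""
    ZL, ZU = np.hstack([XL, WL]), np.hstack([XU, WU])
    beta = np.zeros(XL.shape[1])
    gamma = np.zeros(ZL.shape[1])
    s = np.sqrt(lam)  # scaling rows by sqrt(lam) weights the labeled term by lam
    history = []
    for _ in range(n_outer):
        # f-update: hold g fixed, match g on unlabeled data and labels on labeled data
        A = np.vstack([XU, s * XL])
        b = np.concatenate([ZU @ gamma, s * yL])
        beta = ridge_solve(A, b, alpha)
        # g-update: hold f fixed, match f on unlabeled data and labels on labeled data
        A = np.vstack([ZU, s * ZL])
        b = np.concatenate([XU @ beta, s * yL])
        gamma = ridge_solve(A, b, alpha)
        history.append(joint_objective(beta, gamma, XL, ZL, yL, XU, ZU, lam, alpha))
    return beta, gamma, history
```

Because each update exactly minimizes the assumed objective in one block of parameters, the recorded objective values cannot increase from one outer iteration to the next. Only `beta` is needed at deployment, since it uses the deployment features alone.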

If this is right

Joint training improves deployment accuracy precisely when the privileged information is weak or noisy, avoiding the degradation that two-stage training can cause.
The alternating algorithm provides a scalable way to train the coupled models even when the feature dimension is large.
On synthetic data and real prediction tasks the method consistently outperforms standard two-stage baselines.
Guarantees describe explicit conditions on the strength of the privileged information under which the accuracy gain occurs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same coupling idea could be tested in settings where the privileged signal is a noisy label rather than an extra feature.
One could derive concrete improvement rates by specializing the guarantees to particular noise models on the privileged data.
The alternating procedure might be compared directly to end-to-end gradient methods on the joint objective for very deep networks.

Load-bearing premise

Conditions exist under which the joint objective yields strictly better generalization than two-stage training, and the alternating algorithm reaches a useful point for high-dimensional models.

What would settle it

An experiment on data with deliberately noisy privileged features where the joint method produces lower accuracy than the two-stage baseline would show the claimed improvement does not hold.

Figures

Figures reproduced from arXiv: 2605.23268 by Jason M. Klusowski, Jiahao Shi, Omar Hagrass.

**Figure 1.** Linear Gaussian signal strength. Performance of the proposed method under varying levels of privileged signal strength. … were constructed via kernel smoothing and imputation in linear models (Chakrabortty & Cai, 2018), extended to kernel ridge regression (Wang, 2023), and studied in general settings by Xia & Wainwright (2024). From an inferential perspective, mean estimation with SSL data was investigated …

**Figure 2.** Synthetic controls. Test error is $\mathbb{E}[(\hat{f}(X) - \mu(X))^2]$. Coupled training adapts to useful privileged signal, is more stable than Two-Stage under nuisance privileged dimensions, and improves with additional unlabeled data. … 4-fold cross-validation protocol on the labeled set only. We compare against the $X$-only Baseline, Two-Stage pseudolabeling, a squared-loss generalized distillation baseline, and SVM+ (V…

**Figure 3.** Parkinson's dataset. Test MSE versus $\lambda$. … each subject may contribute multiple recordings. Since recording conditions at deployment are less controlled, we treat some acoustic descriptors as privileged features available only during training. Feature split. We partition covariates into deployment features $X$ and privileged features $W$ to model the fact that some high-fidelity acoustic descriptors are only reli…

**Figure 4.** Bank Marketing dataset. Holdout Brier score versus $\lambda$. … Cross-validation for $\lambda$. We select $\hat{\lambda}$ using 5-fold cross-validation on the labeled set only, splitting by subject to prevent leakage (GroupKFold). In each fold, we train Coupled Training using the fixed unlabeled pool $\mathcal{D}_U$ and evaluate the validation MSE of the deployment model $\hat{f}_\lambda$ on the held-out labeled fold. We take $\hat{\lambda} \in \arg\min_{\lambda \in \Lambda} \mathrm{MSE}_{\mathrm{val}}(\hat{f}_\lambda)$, where $\Lambda$…

**Figure 5.** PneumoniaMNIST. Test AUROC versus $\lambda$ for Algorithm 1. … 30 epochs. Evaluation. Models output real-valued scores; we apply a sigmoid to obtain probabilities and report test AUROC (primary), along with accuracy at threshold 0.5 and probability MSE against {0, 1} targets. Results …

**Figure 6.** Synthetic binary classification diagnostic. Test 0–1 error for the cross-entropy analogue of Coupled Training as a function of $\lambda$, averaged over seeds {0, 1, 2, 3, 4}. … where the subscript $-0$ excludes the intercept. Given $f$, the rich-view update minimizes $\sum_{j \in U} \mathrm{CE}\big(p_f(X_j),\, p_g(X_j, W_j)\big) + \lambda \sum_{i \in L} \mathrm{CE}\big(Y_i,\, p_g(X_i, W_i)\big) + \frac{\alpha_g}{2} \lVert \gamma_{-0} \rVert_2^2$. We run 5 outer coupled iterations. Each $f$-update and each $g$-upda…
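The Figure 4 caption text describes choosing λ by 5-fold cross-validation on the labeled set, split by subject so that no subject appears on both sides of a fold. A dependency-light sketch of that selection loop follows. The source names scikit-learn's GroupKFold; the `group_folds` helper here is a hypothetical stand-in for it, and `fit_fn` is an assumed callable that trains on a labeled fold for a given λ (it can close over the fixed unlabeled pool) and returns a prediction function.

```python
import numpy as np


def group_folds(groups, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which no group is split across folds."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    if len(unique) < n_splits:
        raise ValueError("need at least as many groups as folds")
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)  # shuffle group ids, then cut them into n_splits chunks
    for part in np.array_split(unique, n_splits):
        val_mask = np.isin(groups, part)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)


def select_lambda(fit_fn, X, y, groups, lambdas, n_splits=5):
    """Return the grid value with the lowest mean validation MSE, plus all scores."""
    scores = []
    for lam in lambdas:
        fold_mse = []
        for tr, va in group_folds(groups, n_splits):
            predict = fit_fn(X[tr], y[tr], lam)  # train on the labeled training folds
            fold_mse.append(np.mean((predict(X[va]) - y[va]) ** 2))
        scores.append(float(np.mean(fold_mse)))
    return lambdas[int(np.argmin(scores))], scores
```

Splitting by group rather than by row matters when one subject contributes several recordings, since row-level splits would let near-duplicate recordings leak into the validation fold.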

read the original abstract

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that uses all training information, then use its predictions on unlabeled examples to train a second model that only uses the inputs available at test time. However, when the extra training-only information is weak or noisy, this Two-Stage approach can mislead the deployment model and even hurt accuracy. We propose a joint training method that learns the two models together, so the deployment model can benefit from the extra information only when it actually helps, instead of inheriting its mistakes. We provide guarantees that describe when joint training improves prediction accuracy and analyze a simple alternating training algorithm for large, high-dimensional models. Experiments on synthetic data and real-world prediction tasks show that our approach avoids these failures and robustly outperforms standard Two-Stage baselines.
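The Two-Stage strategy described in the abstract can be sketched for comparison. This is a minimal illustration that assumes linear ridge models for both stages; the function names `ridge_fit` and `two_stage_fit` are hypothetical and the paper's baselines may differ in model class and tuning.

```python
import numpy as np


def ridge_fit(A, b, alpha=1e-2):
    """Closed-form ridge regression coefficients (no intercept)."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(d), A.T @ b)


def two_stage_fit(XL, WL, yL, XU, WU, alpha=1e-2):
    """Two-Stage pseudolabeling with linear models (illustrative assumption)."""
    # Stage 1: fit the privileged model g on labeled data using both X and W.
    gamma = ridge_fit(np.hstack([XL, WL]), yL, alpha)
    # Pseudo-label the unlabeled pool with g's predictions.
    pseudo = np.hstack([XU, WU]) @ gamma
    # Stage 2: fit the deployment model f on X only, treating pseudo-labels as truth.
    beta = ridge_fit(XU, pseudo, alpha)
    return beta, gamma
```

The second stage never sees the true labels, so any error in `gamma` is passed straight into `beta`. That one-way dependence is the failure mode the abstract attributes to weak or noisy privileged information.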

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Joint training for privileged info avoids two-stage error propagation with some guarantees and experiments, but the conditions look narrow.


The paper's core move is to train the privileged-information model and the deployment model together instead of the usual two-stage pipeline. This lets the deployment model use the extra signal only when it actually helps, rather than copying mistakes from a weak privileged model. That framing is the main novelty relative to standard privileged learning or distillation setups.

The authors supply conditions under which the joint objective improves accuracy over two-stage training and analyze a simple alternating optimization scheme that they claim scales to high-dimensional cases. Experiments on synthetic data and real tasks are said to show the method avoids the failures that two-stage runs into when the privileged information is noisy. Those pieces are the parts that actually move the needle. The guarantees and the alternating algorithm are the elements that give the work some formal grounding beyond pure heuristics. The experiments are presented as evidence that the approach is robust where the baseline is not.

The soft spots are that the improvement conditions are invoked without being spelled out in enough detail to judge their restrictiveness, and the alternating procedure's convergence behavior in high dimensions is only sketched. Without seeing the full proofs or the exact experimental numbers and baselines, it is hard to tell how often the claimed gains materialize in practice. The citation pattern is not visible here, but the central claim does not appear to rest on circular reasoning.

This is for people working on supervised learning with extra training-only features, such as sensor data or expensive labels. A reader who needs a practical alternative to two-stage privileged learning would find the comparison and the optimization analysis useful. The work is coherent enough on its own terms to deserve a serious referee, even if the theory section may need tightening.

Referee Report

0 major / 1 minor

Summary. The paper proposes a joint (coupled) training method for prediction problems where privileged information is available only during training. It contrasts this with the standard two-stage approach of first training a privileged model and then using its predictions to supervise a deployment model that uses only test-time inputs. The joint method is designed so the deployment model benefits from the privileged information only when it is helpful, avoiding error propagation from weak or noisy privileged signals. The manuscript claims to provide theoretical guarantees describing when joint training improves accuracy over two-stage training, analyzes a simple alternating optimization algorithm suitable for large high-dimensional models, and reports experiments on synthetic data and real-world tasks showing robust outperformance over two-stage baselines.

Significance. If the claimed guarantees are non-vacuous and the alternating algorithm is shown to converge usefully, the work could offer a principled and practical alternative to two-stage privileged-information methods in settings such as medical imaging or sensor fusion where extra training signals are costly at deployment. The emphasis on conditional use of privileged information and the scaling analysis for high-dimensional models address a recognized limitation of existing distillation-style pipelines.

minor comments (1)

[Abstract] The abstract asserts the existence of guarantees and an analysis of the alternating algorithm but supplies no equations, proof sketches, or even high-level statements of the conditions under which improvement holds; the full manuscript should make these explicit early in the theoretical development so readers can assess the scope of the claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of the manuscript and for noting its potential relevance to applications such as medical imaging and sensor fusion. The recommendation is listed as uncertain, yet the report contains no enumerated major comments or specific criticisms. We therefore provide no point-by-point responses and stand ready to address any additional questions the referee may raise.

Circularity Check

0 steps flagged

No significant circularity detected


The abstract and context provide no equations, derivations, or self-citations that could reduce any claimed guarantee or prediction to its inputs by construction. Claims about joint training improving accuracy and analysis of alternating optimization are stated at a high level without visible load-bearing steps that match the enumerated circularity patterns. This matches the default expectation for papers lacking extractable derivation chains; the result is self-contained against external benchmarks with no evidence of fitted inputs renamed as predictions or ansatzes smuggled via citation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.0 · 5692 in / 1071 out tokens · 23959 ms · 2026-05-25T03:44:08.243344+00:00 · methodology


Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 1 internal anchor

[1] Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., and Tandon, R. Learning Sparsely Used Overcomplete Dictionaries. In Proceedings of The 27th Conference on Learning Theory, volume 35, pp. 123–137. PMLR, 2014.

[2] Barron, A. R., Cohen, A., Dahmen, W., and DeVore, R. A. Approximation and Learning by Greedy Algorithms. The Annals of Statistics, 36(1): 64–94, 2008.

[3] Chakrabortty, A. and Cai, T. Efficient and Adaptive Linear Regression in Semi-Supervised Settings. The Annals of Statistics, 46(4): 1541–1572, 2018.

[4] Chapelle, O., Schölkopf, B., and Zien, A. (eds.). Semi-Supervised Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2006. ISBN 978-0-262-03358-9.

[5] Chatterji, N. S. and Bartlett, P. L. Alternating minimization for dictionary learning: Local Convergence Guarantees. In Conference on Learning Theory, 2017.

[6] DeVore, R. A. and Temlyakov, V. N. Some remarks on greedy algorithms. Advances in Computational Mathematics, 5(1): 173–187, 1996. doi:10.1007/BF02124742

[7] Ding, D. Y., Li, S., Narasimhan, B., and Tibshirani, R. Cooperative learning for multiview analysis. Proceedings of the National Academy of Sciences, 119(38): e2202113119, 2022.

[8] Fries, J. A., Varma, P., Chen, V. S., et al. Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences. Nature Communications, 10(1): 3111, 2019.

[10] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.

[11] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition, 2009.

[12] Hogan, J. W., Roy, J., and Korkontzelou, C. Handling drop-out in longitudinal studies. Statistics in Medicine, 23(9): 1455–1497, 2004.

[13] Hou, J., Guo, Z., and Cai, T. Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction. Journal of Machine Learning Research, 24(265): 1–58, 2023.

[14] Joachims, T. Transductive Inference for Text Classification Using Support Vector Machines. In Proceedings of the 16th International Conference on Machine Learning, volume 99, pp. 200–209, 1999.

[15] Koh, P. W., Sagawa, S., Marklund, H., et al. WILDS: A Benchmark of In-the-Wild Distribution Shifts. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139, pp. 5637–5664. PMLR, 2021.

[16] Laine, S. and Aila, T. Temporal Ensembling for Semi-Supervised Learning. In International Conference on Learning Representations, 2017.

[17] Lee, D.-H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In Workshop: Challenges in Representation Learning, volume 3, pp. 896, 2013.

[18] Natarajan, N., Dhillon, I. S., Ravikumar, P. K., and Tewari, A. Learning with Noisy Labels. In Advances in Neural Information Processing Systems, volume 26, 2013.

[19] Rajkomar, A., Dean, J., and Kohane, I. Machine Learning in Medicine. New England Journal of Medicine, 380(14): 1347–1358, 2019.

[20] Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S., and Ré, C. Snorkel: Rapid Training Data Creation with Weak Supervision. In Proceedings of the VLDB Endowment, volume 11, pp. 269, 2017.

[21] Ratner, A. J. et al. Data Programming: Creating Large Training Sets, Quickly. In Advances in Neural Information Processing Systems, volume 29, 2016.

[22] Robinson, J., Jegelka, S., and Sra, S. Strength from Weakness: Fast Learning Using Weak Supervision. In Proceedings of the 37th International Conference on Machine Learning, pp. 8127–8136, 2020.

[24] Sindhwani, V., Niyogi, P., and Belkin, M. A Co-Regularization Approach to Semi-Supervised Learning with Multiple Views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML, 2005.

[25] Song, H., Kim, M., Park, D., Shin, Y., and Lee, J.-G. Learning from Noisy Labels with Deep Neural Networks: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 34(11): 8135–8153, 2022.

[27] Vapnik, V., Izmailov, R., et al. Learning Using Privileged Information: Similarity Control and Knowledge Transfer. Journal of Machine Learning Research, 16(1): 2023–2049, 2015.

[29] Xia, E. and Wainwright, M. J. Prediction Aided by Surrogate Training. arXiv preprint arXiv:2412.09364, 2024.

[30] Zhang, A., Brown, L. D., and Cai, T. T. Semi-Supervised Inference: General Theory and Estimation of Means. The Annals of Statistics, 47(5): 2538–2566, 2019.

[31] Zhang, Y. and Bradic, J. High-dimensional semi-supervised learning: in search of optimal inference of the mean. Biometrika, 109(2): 387–403, 2021.

[32] Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109(1): 43–76, 2020.
[33] Hogan, Joseph W., Roy, Jason, and Korkontzelou, Christina. Statistics in Medicine.

[34] Rajkomar, Alvin, Dean, Jeffrey, and Kohane, Isaac. New England Journal of Medicine.

[35] Zhuang, Fuzhen, Qi, Zhiyuan, Duan, Keyu, Xi, Dongbo, Zhu, Yongchun, Zhu, Hengshu, Xiong, Hui, and He, Qing. Proceedings of the IEEE.

[36] Koh, Pang Wei, Sagawa, Shiori, Marklund, Henrik, and others. Proceedings of the 38th International Conference on Machine Learning.

[37] Xia, Eric and Wainwright, Martin J.

[38] Jue Hou, Zijian Guo, and Tianxi Cai. Journal of Machine Learning Research.

[39] Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. The Annals of Statistics.

[40] Vapnik, Vladimir, Izmailov, Rauf, and others.

[41] Gy. 2002.

[42] Semi-Supervised Learning. 2006.

[43] O. Delalleau and others. International Workshop on Artificial Intelligence and Statistics.

[44] Semi-supervised graph clustering: a kernel approach. Proceedings of the 22nd International Conference on Machine Learning.

[45] Joachims, Thorsten. Proceedings of the 16th International Conference on Machine Learning, 1999.

[46] S. Laine and T. Aila. International Conference on Learning Representations.

[47] Lee, Dong-Hyun. Workshop: Challenges in Representation Learning, 2013.

[48] A. Chakrabortty and T. Cai. The Annals of Statistics.

[49] Anru Zhang, Lawrence D. Brown, and T. Tony Cai. The Annals of Statistics.

[50] Y. Zhang and J. Bradic. Biometrika.

[51] K. Wang. arXiv preprint arXiv:2302.10160.

[52] G. J. McLachlan and T. Krishnan. 2008.

[53] C. J. Wu. The Annals of Statistics.

[54] S. Balakrishnan and others. The Annals of Statistics.

[55] J. M. Robins and A. Rotnitzky. Journal of the American Statistical Association.

[56] J. M. Robins and others. Journal of the American Statistical Association.

[57] A. J. Ratner and others. Advances in Neural Information Processing Systems.

[58] Ratner, A, Bach, SH, Ehrenberg, H, Fries, J, Wu, S, and Ré, C. Proceedings of the VLDB Endowment, 2017.

[59] Robinson, Joshua, Jegelka, Stefanie, and Sra, Suvrit. Proceedings of the 37th International Conference on Machine Learning.

[60] Natarajan, Nagarajan, Dhillon, Inderjit S, Ravikumar, Pradeep K, and Tewari, Ambuj. Advances in Neural Information Processing Systems.

[61] Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, and Jae-Gil Lee. IEEE Transactions on Neural Networks and Learning Systems.

[62] Whitehill, Jacob, Wu, Ting-fan, Bergsma, Jacob, Movellan, Javier, and Ruvolo, Paul. Advances in Neural Information Processing Systems.

[63] Fries, J. A., Varma, P., Chen, V. S., and others. Nature Communications.

[64] Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, and others. Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

[65] Peng, Y., Wang, X., Lu, L., Bagheri, M., Summers, R., and Lu, Z. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. arXiv preprint arXiv:1712.05898.

[66] Mathematical Analysis, Probability and Applications: Plenary Lectures. 2016.

[67] Kaito Fujii and Tasuku Soma. 2018.

[68] Abhimanyu Das and David Kempe. 2011.

[69] Vikas Sindhwani, Partha Niyogi, and Mikhail Belkin. Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML.

[70] Tong Zhang. Advances in Neural Information Processing Systems.

[71] Niladri S. Chatterji and Peter L. Bartlett. Conference on Learning Theory.

[72] Agarwal, Alekh, Anandkumar, Animashree, Jain, Prateek, Netrapalli, Praneeth, and Tandon, Rashish. 2014.

[73] Simon Ruetz and Karin Schnass. arXiv preprint arXiv:2304.01768.

[74] Daisy Yi Ding, Shuangning Li, Balasubramanian Narasimhan, and Robert Tibshirani. Proceedings of the National Academy of Sciences.

[75] A new learning paradigm: Learning using privileged information. 2009. doi:10.1016/j.neunet.2009.06.042

[76] DeVore, Ronald A. and Temlyakov, Vladimir N. Advances in Computational Mathematics, 1996.

[77] Operations Research Letters, 2000. doi:10.1016/S0167-6377(99)00074-7

[78] The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2009.

[1] [1]

Learning Sparsely Used Overcomplete Dictionaries

Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., and Tandon, R. Learning Sparsely Used Overcomplete Dictionaries . In Proceedings of The 27th Conference on Learning Theory, volume 35, pp.\ 123--137. PMLR, 2014

work page 2014

[2] [2]

R., Cohen, A., Dahmen, W., and DeVore, R

Barron, A. R., Cohen, A., Dahmen, W., and DeVore, R. A. Approximation and Learning by Greedy Algorithms . The Annals of Statistics, 36 0 (1): 0 64 -- 94, 2008

work page 2008

[3] [3]

and Cai, T

Chakrabortty, A. and Cai, T. Efficient and Adaptive Linear Regression in Semi-Supervised Settings . The Annals of Statistics, 46 0 (4): 0 1541--1572, 2018

work page 2018

[4] [4]

Chapelle, O., Sch \"o lkopf, B., and Zien, A. (eds.). Semi-Supervised Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2006. ISBN 978-0-262-03358-9

work page 2006

[5] [5]

Chatterji, N. S. and Bartlett, P. L. Alternating minimization for dictionary learning: Local Convergence Guarantees . In Conference on Learning Theory, 2017

work page 2017

[6] [6]

DeVore, R. A. and Temlyakov, V. N. Some remarks on greedy algorithms. Advances in Computational Mathematics, 5 0 (1): 0 173--187, 1996. doi:10.1007/BF02124742

work page doi:10.1007/bf02124742 1996

[7] [7]

Y., Li, S., Narasimhan, B., and Tibshirani, R

Ding, D. Y., Li, S., Narasimhan, B., and Tibshirani, R. Cooperative learning for multiview analysis . Proceedings of the National Academy of Sciences, 119 0 (38): 0 e2202113119, 2022

work page 2022

[8] [8]

A., Varma, P., Chen, V

Fries, J. A., Varma, P., Chen, V. S., et al. Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences . Nature Communications, 10 0 (1): 0 3111, 2019

work page 2019

[9] [10]

A Distribution-Free Theory of Nonparametric Regression

Gy \"o rfi, L., Kohler, M., Krzy \.z ak, A., and Walk, H. A Distribution-Free Theory of Nonparametric Regression . Springer, 2002

work page 2002

[10] [11]

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition, 2009

work page 2009

[11] [12]

W., Roy, J., and Korkontzelou, C

Hogan, J. W., Roy, J., and Korkontzelou, C. Handling drop-out in longitudinal studies. Statistics in Medicine, 23 0 (9): 0 1455--1497, 2004

work page 2004

[12] [13]

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

Hou, J., Guo, Z., and Cai, T. Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction . Journal of Machine Learning Research, 24 0 (265): 0 1--58, 2023

work page 2023

[13] [14]

Transductive Inference for Text Classification Using Support Vector Machines

Joachims, T. Transductive Inference for Text Classification Using Support Vector Machines . In Proceedings of the 16th International Conference on Machine Learning, volume 99, pp.\ 200--209, 1999

work page 1999

[14] [15]

W., Sagawa, S., Marklund, H., et al

Koh, P. W., Sagawa, S., Marklund, H., et al. WILDS: A Benchmark of In-the-Wild Distribution Shifts . In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139, pp.\ 5637--5664. PMLR, 2021

work page 2021

[15] [16]

and Aila, T

Laine, S. and Aila, T. Temporal Ensembling for Semi-Supervised Learning . In International Conference on Learning Representations, 2017

work page 2017

[16] [17]

Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

Lee, D.-H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks . In Workshop : Challenges in Representation Learning, volume 3, pp.\ 896, 2013

work page 2013

[17] [18]

S., Ravikumar, P

Natarajan, N., Dhillon, I. S., Ravikumar, P. K., and Tewari, A. Learning with Noisy Labels . In Advances in Neural Information Processing Systems, volume 26, 2013

work page 2013

[18] [19]

Machine Learning in Medicine

Rajkomar, A., Dean, J., and Kohane, I. Machine Learning in Medicine . New England Journal of Medicine, 380 0 (14): 0 1347--1358, 2019

work page 2019

[19] [20]

Snorkel: Rapid Training Data Creation with Weak Supervision

Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S., and Ré, C. Snorkel: Rapid Training Data Creation with Weak Supervision . In Proceedings of the VLDB Endowment, volume 11, pp.\ 269, 2017

work page 2017

[20] [21]

Ratner, A. J. et al. Data Programming: Creating Large Training Sets, Quickly . In Advances in Neural Information Processing Systems, volume 29, 2016

work page 2016

[21] [22]

Robinson, J., Jegelka, S., and Sra, S. Strength from Weakness: Fast Learning Using Weak Supervision. In Proceedings of the 37th International Conference on Machine Learning, pp. 8127–8136, 2020.

[22] [24]

Sindhwani, V., Niyogi, P., and Belkin, M. A Co-Regularization Approach to Semi-Supervised Learning with Multiple Views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML, 2005.

[23] [25]

Song, H., Kim, M., Park, D., Shin, Y., and Lee, J.-G. Learning from Noisy Labels with Deep Neural Networks: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 34(11):8135–8153, 2022.

[24] [27]

Vapnik, V., Izmailov, R., et al. Learning Using Privileged Information: Similarity Control and Knowledge Transfer. Journal of Machine Learning Research, 16(1):2023–2049, 2015.

[25] [29]

Xia, E. and Wainwright, M. J. Prediction Aided by Surrogate Training. arXiv preprint arXiv:2412.09364, 2024.

[26] [30]

Zhang, A., Brown, L. D., and Cai, T. T. Semi-Supervised Inference: General Theory and Estimation of Means. The Annals of Statistics, 47(5):2538–2566, 2019.

[27] [31]

Zhang, Y. and Bradic, J. High-dimensional semi-supervised learning: in search of optimal inference of the mean. Biometrika, 109(2):387–403, 2021.

[28] [32]

Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109(1):43–76, 2020.

[29] [33]

Hogan, J. W., Roy, J., and Korkontzelou, C. Statistics in Medicine.

[30] [34]

Rajkomar, A., Dean, J., and Kohane, I. New England Journal of Medicine.

[31] [35]

Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. Proceedings of the IEEE.

[32] [36]

Koh, P. W., Sagawa, S., Marklund, H., et al. In Proceedings of the 38th International Conference on Machine Learning.

[33] [37]

Xia, E. and Wainwright, M. J.

[34] [38]

Hou, J., Guo, Z., and Cai, T. Journal of Machine Learning Research.

[35] [39]

Barron, A. R., Cohen, A., Dahmen, W., and DeVore, R. A. The Annals of Statistics.

[36] [40]

Vapnik, V., Izmailov, R., et al.

[37] [41]

Gy. 2002.

[38] [42]

Semi-Supervised Learning. 2006.

[39] [43]

Delalleau, O., et al. In International Workshop on Artificial Intelligence and Statistics.

[40] [44]

Semi-supervised graph clustering: a kernel approach. In Proceedings of the 22nd International Conference on Machine Learning.

[41] [45]

Joachims, T. In Proceedings of the 16th International Conference on Machine Learning, 1999.

[42] [46]

Laine, S. and Aila, T. In International Conference on Learning Representations.

[43] [47]

Lee, D.-H. In Workshop: Challenges in Representation Learning, 2013.

[44] [48]

Chakrabortty, A. and Cai, T. The Annals of Statistics.

[45] [49]

Zhang, A., Brown, L. D., and Cai, T. T. The Annals of Statistics.

[46] [50]

Zhang, Y. and Bradic, J. Biometrika.

[47] [51]

Wang, K. arXiv preprint arXiv:2302.10160.

[48] [52]

McLachlan, G. J. and Krishnan, T. 2008.

[49] [53]

Wu, C. J. The Annals of Statistics.

[50] [54]

Balakrishnan, S., et al. The Annals of Statistics.

[51] [55]

Robins, J. M. and Rotnitzky, A. Journal of the American Statistical Association.

[52] [56]

Robins, J. M., et al. Journal of the American Statistical Association.

[53] [57]

Ratner, A. J., et al. In Advances in Neural Information Processing Systems.

[54] [58]

Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., and Ré, C. In Proceedings of the VLDB Endowment, 2017.

[55] [59]

Robinson, J., Jegelka, S., and Sra, S. In Proceedings of the 37th International Conference on Machine Learning.

[56] [60]

Natarajan, N., Dhillon, I. S., Ravikumar, P. K., and Tewari, A. In Advances in Neural Information Processing Systems.

[57] [61]

Song, H., Kim, M., Park, D., Shin, Y., and Lee, J.-G. IEEE Transactions on Neural Networks and Learning Systems.

[58] [62]

Whitehill, J., Wu, T.-F., Bergsma, J., Movellan, J., and Ruvolo, P. In Advances in Neural Information Processing Systems.

[59] [63]

Fries, J. A., Varma, P., Chen, V. S., et al. Nature Communications.

[60] [64]

Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., et al. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

[61] [65]

Peng, Y., Wang, X., Lu, L., Bagheri, M., Summers, R., and Lu, Z. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. arXiv preprint arXiv:1712.05898.

[62] [66]

Mathematical Analysis, Probability and Applications: Plenary Lectures. 2016.

[63] [67]

Fujii, K. and Soma, T. 2018.

[64] [68]

Das, A. and Kempe, D. 2011.

[65] [69]

Sindhwani, V., Niyogi, P., and Belkin, M. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML.

[66] [70]

Zhang, T. In Advances in Neural Information Processing Systems.

[67] [71]

Chatterji, N. S. and Bartlett, P. L. In Conference on Learning Theory.

[68] [72]

Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., and Tandon, R. 2014.

[69] [73]

Ruetz, S. and Schnass, K. arXiv preprint arXiv:2304.01768.

[70] [74]

Ding, D. Y., Li, S., Narasimhan, B., and Tibshirani, R. Proceedings of the National Academy of Sciences.

[71] [75]

A new learning paradigm: Learning using privileged information. 2009. doi:10.1016/j.neunet.2009.06.042.

[72] [76]

DeVore, R. A. and Temlyakov, V. N. Advances in Computational Mathematics, 1996.

[73] [77]

Operations Research Letters, 2000. doi:10.1016/S0167-6377(99)00074-7.

[74] [78]

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2009.