Corruptions of Supervised Learning Problems: Typology and Mitigations

Laura Iacovissi; Nan Lu; Robert C. Williamson

arxiv: 2307.08643 · v4 · pith:2QTHUPPSnew · submitted 2023-07-17 · 💻 cs.LG · stat.ML


Pith reviewed 2026-05-24 07:12 UTC · model grok-4.3

classification: cs.LG · stat.ML

keywords: data corruption, supervised learning, Markov kernel, Bayes risk, loss correction, label corruption, attribute corruption, unified framework


The pith

Markov kernels on data distributions unify all corruptions in supervised learning and enable loss corrections beyond labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general theory of corruption by modeling every modification to a supervised learning problem as the action of Markov kernels on the underlying probability distributions. This produces a provably exhaustive typology that places existing corruption models under one roof with consistent names. Comparing Bayes risks shows label corruptions change only the loss function while attribute corruptions also change the hypothesis class. The same machinery yields loss-correction formulas for attribute and joint corruptions once the classical correction framework is relaxed to weaker requirements that cover dependent cases.

Core claim

By focusing on changes to the underlying probability distributions via Markov kernels, the approach constructs a provably exhaustive corruption framework that unifies existing models, enables a systematic comparison of Bayes risks in clean and corrupted settings, and supplies loss-correction formulas for attribute and joint corruption under a generalized paradigm with weaker requirements.

What carries the argument

Markov kernels applied to the joint distribution of inputs and labels, which induce the corrupted distributions and thereby determine the altered loss and hypothesis class.
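
As a rough illustration of a kernel acting on a joint distribution, the sketch below uses a finite toy setting in which a Markov kernel is just a table of conditional probabilities. The spaces `X` and `Y`, the distribution `P`, and the helper names `label_flip_kernel` and `push_forward` are hypothetical choices made for this example; the paper itself works with general measurable spaces and is not limited to this case.

```python
from itertools import product

# Hypothetical finite spaces, chosen only for illustration.
X = [0, 1]
Y = [0, 1]
Z = list(product(X, Y))  # joint space X x Y

# Toy clean joint distribution P(x, y); the numbers are arbitrary.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def label_flip_kernel(flip_prob):
    """Kernel on X x Y that keeps x and flips a binary y with probability flip_prob."""
    kernel = {}
    for (x, y) in Z:
        row = {z: 0.0 for z in Z}
        row[(x, y)] = 1.0 - flip_prob
        row[(x, 1 - y)] = flip_prob
        kernel[(x, y)] = row  # each row is a probability distribution over Z
    return kernel


def push_forward(dist, kernel):
    """Corrupted distribution: P_K(z') = sum over z of P(z) * K(z, z')."""
    out = {z: 0.0 for z in Z}
    for z, pz in dist.items():
        for z_new, k in kernel[z].items():
            out[z_new] += pz * k
    return out


K = label_flip_kernel(0.2)
P_corrupted = push_forward(P, K)
```

In this toy case the kernel only moves mass between labels, so the marginal over `X` is the same before and after corruption, which is what one would expect of a pure label corruption.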

If this is right

Existing corruption models receive a single exhaustive framework and consistent nomenclature.
Label corruptions affect only the loss function while attribute corruptions affect both loss and hypothesis class.
Loss-correction methods extend to dependent corruption types and to attribute and joint cases.
The classical loss-correction paradigm must be replaced by one with weaker requirements.
Bayes-risk comparisons become a systematic tool for predicting corruption consequences.
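
For orientation, the classical label-noise correction that the paper generalizes can be sketched as follows. This is the well-known backward correction for binary labels with class-conditional flip rates, not the paper's attribute or joint formulas. The flip rates `rho0` and `rho1`, the choice of log loss, and the function names are assumptions made for this example.

```python
import math


def log_loss(y, p):
    """Log loss for a label y in {0, 1} given a predicted P(y = 1) = p."""
    return -math.log(p if y == 1 else 1.0 - p)


def backward_corrected_loss(p, rho0, rho1):
    """Return [l~(0, p), l~(1, p)] = T^{-1} [l(0, p), l(1, p)].

    T[i][j] = P(noisy label = j | clean label = i), with
    rho0 = P(noisy = 1 | clean = 0) and rho1 = P(noisy = 0 | clean = 1).
    """
    a, b = 1.0 - rho0, rho0
    c, d = rho1, 1.0 - rho1
    det = a * d - b * c  # equals 1 - rho0 - rho1
    if abs(det) < 1e-12:
        raise ValueError("transition matrix is singular; correction undefined")
    l0, l1 = log_loss(0, p), log_loss(1, p)
    # Explicit 2x2 inverse applied to the loss vector.
    return [(d * l0 - b * l1) / det, (-c * l0 + a * l1) / det]


# Usage: averaging the corrected loss over noisy labels recovers the clean loss.
p, rho0, rho1 = 0.7, 0.2, 0.3
corrected = backward_corrected_loss(p, rho0, rho1)
expected_given_clean_0 = (1.0 - rho0) * corrected[0] + rho0 * corrected[1]
expected_given_clean_1 = rho1 * corrected[0] + (1.0 - rho1) * corrected[1]
```

The design point is that the correction requires the transition matrix to be invertible and known. The summary above indicates that the paper relaxes requirements of this kind in order to cover dependent, attribute, and joint corruptions.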

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The kernel view could guide construction of algorithms that correct attribute corruptions without assuming independence.
Data-collection pipelines could be audited by estimating the effective Markov kernel they apply.
The same distribution-change language might transfer to settings beyond supervised learning where distributions are altered.
Empirical tests could check whether every observed corruption type admits a Markov-kernel representation.

Load-bearing premise

Every modification to a supervised learning problem, including changes to the model class and loss, can be captured by applying Markov kernels solely to the data-generating distributions.

What would settle it

A concrete corruption scenario that alters the learning problem yet cannot be expressed as the result of any Markov kernel acting on the original input-label distribution.

Figures

Figures reproduced from arXiv: 2307.08643 by Laura Iacovissi, Nan Lu, Robert C. Williamson.

**Figure 1.** Hierarchy of partial corruption types. The partial corruption types are hierarchically organized based on their dependence on the instance 𝑋 and label 𝑌 space, as depicted through a tree structure. At the root of the tree lies the most general form of corruption, where the domain and image spaces are the joint one 𝑋 × 𝑌, i.e., 𝐷(𝜅) = 𝐼(𝜅) = 𝑋 × 𝑌. The arrows signify that a child node has its domain or imag…

**Figure 2.** Feasible combinations of partial corruptions. Joint corruptions, i.e. of type 𝜅 : 𝑋 × 𝑌 ⇝ 𝑋 × 𝑌, are obtained by combining two compatible partial corruptions in …

Original abstract

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and mitigation. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities. First, it enables the construction of a novel, provably exhaustive corruption framework, distinguishing among different corruption types. This serves to unify existing models and establish a consistent nomenclature. Second, it facilitates a systematic analysis of corruption's consequences on learning tasks, by comparing Bayes risks in the clean and corrupted scenarios. Notably, while label corruptions affect only the loss function, attribute corruptions additionally influence the hypothesis class. Third, building upon these results, we investigate mitigations for various corruption types. We expand existing loss-correction methods for label corruption to handle dependent corruption types. Our findings highlight the necessity to generalize this classical corruption-corrected learning framework to a new paradigm with weaker requirements to encompass more corruption types. We provide such a paradigm as well as loss correction formulas in the attribute and joint corruption cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Markov kernels on the joint distribution give a unified language for corruption types and extend loss corrections beyond label noise, but the exhaustiveness proof and the claim that kernels alone capture arbitrary changes to the hypothesis class remain unshown.


The paper models corruptions in supervised learning as Markov kernels applied to the data distributions. This produces a typology that separates label corruptions from attribute and joint ones, a Bayes-risk comparison showing label noise affects only the loss while attribute noise also affects the hypothesis class, and loss-correction formulas for the attribute and joint cases under a generalized setup with weaker assumptions than classical label-noise corrections.

The unification of existing models under one framework and the extension of corrections to dependent and attribute corruptions are the concrete new pieces. The risk distinction is a useful observation that clarifies why some corruptions are harder to mitigate than others. The work is internally consistent in how it derives the typology from the kernel modeling choice rather than from circular definitions.

The main soft spot is that the abstract asserts a provably exhaustive framework and supplies the correction formulas without including the actual proof or the explicit constructions. The central modeling assumption, that every modification to the hypothesis class or loss can be realized solely by a kernel on P(X,Y) without extra structure, needs the derivations to be checked, and the stress-test concern about whether kernels suffice for arbitrary changes to H lands as a real question rather than a minor one. If those steps hold in the full text, the framework is a reasonable generalization; if not, the claims shrink to a useful but narrower typology.

This is for readers already working on robustness and noisy supervised learning. It shows clear engagement with the label-noise literature and could be worth a reading group to examine the proofs. I would not cite it until the derivations are verified. It deserves peer review so referees can test the exhaustiveness and the correction formulas directly.

Referee Report

4 major / 0 minor

Summary. The manuscript develops a general theory of corruption in supervised learning by modeling all modifications—including to the hypothesis class and loss—via Markov kernels applied to the data-generating joint distribution P(X,Y). This is claimed to yield a provably exhaustive corruption typology that unifies existing models, a Bayes-risk comparison showing label corruptions affect only the loss while attribute corruptions also affect the hypothesis class, and explicit loss-correction formulas for attribute and joint corruptions under a generalized paradigm with weaker requirements than classical label-noise correction.

Significance. If the exhaustiveness claim and the loss-correction derivations hold, the work would supply a unifying modeling language and systematic mitigation approach for a broad range of corruptions, moving the literature beyond case-by-case treatments of label noise or specific attribute shifts. The Bayes-risk distinction between corruption types would be a useful organizing principle for choosing correction strategies.

major comments (4)

[Abstract] The central claim that Markov kernels on P(X,Y) alone induce 'all modifications to a supervised learning problem, including changes in model class and loss' is asserted without an explicit construction or proof showing how an arbitrary restriction or expansion of the hypothesis class H is realized solely by the kernel; this construction is load-bearing for both the exhaustiveness and the unification claims.
[Abstract] The 'provably exhaustive' corruption framework is announced but no proof of exhaustiveness, no enumeration of the corruption types, and no verification that every possible modification is captured appear in the manuscript; without these the typology cannot be assessed as exhaustive.
[Abstract] The loss-correction formulas for attribute and joint corruption cases are promised under a 'generalized paradigm with weaker requirements,' yet neither the paradigm nor the formulas are supplied; these formulas are the concrete output needed to substantiate the mitigation contribution.
[Abstract] The Bayes-risk comparison is described qualitatively ('label corruptions affect only the loss function, attribute corruptions additionally influence the hypothesis class') but no explicit risk expressions, no clean-versus-corrupted risk difference, and no derivation showing why the hypothesis class is unaffected by label kernels are given; this distinction is load-bearing for the claimed systematic analysis.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their careful reading and constructive critique of the abstract. The comments correctly identify that several central claims are asserted at a high level without sufficient explicit support or cross-references in the current manuscript. We agree that these points require clarification and will make the requested additions and revisions to the abstract and body to substantiate the claims.


Referee: [Abstract] The central claim that Markov kernels on P(X,Y) alone induce 'all modifications to a supervised learning problem, including changes in model class and loss' is asserted without an explicit construction or proof showing how an arbitrary restriction or expansion of the hypothesis class H is realized solely by the kernel; this construction is load-bearing for both the exhaustiveness and the unification claims.

Authors: We agree the abstract would be strengthened by an explicit pointer to the construction. In the full manuscript (Section 3), we define a corruption as a Markov kernel K acting on the joint P(X,Y) and show that the induced marginals and conditionals can render certain hypotheses in H suboptimal or infeasible under the corrupted measure, thereby realizing effective restrictions or expansions of the hypothesis class without altering H itself. We will revise the abstract to include a one-sentence reference to this construction and the relevant section.

Revision: yes

Referee: [Abstract] The 'provably exhaustive' corruption framework is announced but no proof of exhaustiveness, no enumeration of the corruption types, and no verification that every possible modification is captured appear in the manuscript; without these the typology cannot be assessed as exhaustive.

Authors: The referee is correct that the abstract announces exhaustiveness without supplying the supporting argument or enumeration in the visible text. The manuscript classifies corruptions according to whether the kernel acts on the label marginal, the attribute marginal, or the joint; exhaustiveness follows from the fact that any measurable transformation of P(X,Y) can be represented by some Markov kernel. We will add a short paragraph (or subsection) enumerating the three primary types with the supporting argument and will update the abstract to reference this enumeration.

Revision: yes

Referee: [Abstract] The loss-correction formulas for attribute and joint corruption cases are promised under a 'generalized paradigm with weaker requirements,' yet neither the paradigm nor the formulas are supplied; these formulas are the concrete output needed to substantiate the mitigation contribution.

Authors: We acknowledge that the abstract promises the formulas and the generalized paradigm but does not exhibit them. The derivations appear in Section 5, where we relax the classical independence assumption between clean and corrupted labels and obtain explicit correction terms for attribute and joint kernels. We will revise the abstract to state that the formulas are derived in Section 5 and will ensure the paradigm is clearly named and contrasted with prior work.

Revision: yes

Referee: [Abstract] The Bayes-risk comparison is described qualitatively ('label corruptions affect only the loss function, attribute corruptions additionally influence the hypothesis class') but no explicit risk expressions, no clean-versus-corrupted risk difference, and no derivation showing why the hypothesis class is unaffected by label kernels are given; this distinction is load-bearing for the claimed systematic analysis.

Authors: The comment is accurate: the abstract gives only the qualitative distinction. Section 4 supplies the explicit expressions R*(P) versus R*_K(P_K) and shows that a label-only kernel leaves the argmin over H unchanged while modifying the loss, whereas an attribute kernel changes both the effective loss and the measure with respect to which the risk is evaluated. We will add a concise statement of the risk difference to the abstract together with a reference to Section 4.

Revision: yes
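
The kind of clean-versus-corrupted Bayes-risk comparison discussed in these responses can be illustrated on a toy finite problem. The sketch below is an assumption-laden example rather than the paper's derivation: it uses 0-1 loss, an unrestricted hypothesis class, binary labels, and a symmetric label flip with rate below 1/2, which is a special case where the Bayes classifier is known to stay the same. The distribution and function names are hypothetical.

```python
def bayes_rule_and_risk(P, xs=(0, 1), ys=(0, 1)):
    """Bayes classifier and Bayes risk under 0-1 loss for a finite joint P(x, y)."""
    rule, risk = {}, 0.0
    for x in xs:
        best = max(ys, key=lambda y: P[(x, y)])  # most probable label at x
        rule[x] = best
        risk += sum(P[(x, y)] for y in ys if y != best)  # mass on other labels
    return rule, risk


def flip_labels(P, flip_prob, xs=(0, 1), ys=(0, 1)):
    """Push P through a symmetric label-flip kernel (binary labels only)."""
    return {
        (x, y): (1.0 - flip_prob) * P[(x, y)] + flip_prob * P[(x, 1 - y)]
        for x in xs
        for y in ys
    }


# Toy clean joint distribution; the numbers are arbitrary.
P_clean = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_noisy = flip_labels(P_clean, 0.2)

rule_clean, risk_clean = bayes_rule_and_risk(P_clean)
rule_noisy, risk_noisy = bayes_rule_and_risk(P_noisy)
```

With these numbers the Bayes rule is identical for the clean and the noisy distribution, while the Bayes risk rises from 0.2 to 0.32. Asymmetric or instance-dependent label kernels need not preserve the Bayes rule, so this example should not be read as a general statement.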

Circularity Check

0 steps flagged

No significant circularity; framework derived from Markov kernel modeling choice


The paper models corruptions via Markov kernels applied to data-generating distributions and derives an exhaustive framework, Bayes risk distinctions, and loss corrections from that choice. No equations, fitted parameters, or self-citations are shown reducing any central claim (exhaustive unification, risk comparisons, or correction formulas) to a tautology or input by construction. The derivation is self-contained against the stated modeling assumptions; the reader's assessment of score 2 aligns with the absence of load-bearing self-reference or definitional collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the modeling decision to represent all corruptions via Markov kernels on distributions and on the assertion that this choice yields an exhaustive typology and risk comparisons that support the new corrections.

axioms (1)

Domain assumption: All modifications to a supervised learning problem can be represented by Markov kernels acting on the underlying probability distributions.
Explicitly stated in the abstract as the focus of the work: 'Focusing on changes to the underlying probability distributions via Markov kernels'.

invented entities (1)

Provably exhaustive corruption framework (no independent evidence)
Purpose: To distinguish among different corruption types and unify existing models under a single nomenclature.
Introduced in the abstract as the first novel opportunity enabled by the Markov-kernel approach.




work page 2000
[66]

Everyone wants to do the model work, not the data work

Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI. Inproceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2021

work page 2021
[67]

Convexity, classification, and risk bounds.Journal of the American Statistical Association, 101(473):138–156, 2006

Peter L Bartlett, Michael I Jordan, and Jon D McAuliffe. Convexity, classification, and risk bounds.Journal of the American Statistical Association, 101(473):138–156, 2006

work page 2006
[68]

A theory of learning from different domains.Machine Learning, 79: 151–175, 2010

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Vaughan. A theory of learning from different domains.Machine Learning, 79: 151–175, 2010

work page 2010
[69]

Fairness evaluation in presence of biased noisy labels

Riccardo Fogliato, Alexandra Chouldechova, and Max G’Sell. Fairness evaluation in presence of biased noisy labels. InInternational conference on artificial intelligence and statistics, pages 2325–2336. PMLR, 2020

work page 2020
[70]

How the war on drugs damages black social mobility.The Brookings Institution, published Sept, 30, 2014

Jonathan Rothwell. How the war on drugs damages black social mobility.The Brookings Institution, published Sept, 30, 2014. 53

work page 2014
[71]

Learningclassifiersfromonlypositiveandunlabeleddata

CharlesElkanandKeithNoto. Learningclassifiersfromonlypositiveandunlabeleddata. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 213–220, 2008

work page 2008
[72]

Presence-only data and the EM algorithm.Biometrics, 65(2):554–563, 2009

Gill Ward, Trevor Hastie, Simon Barry, Jane Elith, and John R Leathwick. Presence-only data and the EM algorithm.Biometrics, 65(2):554–563, 2009

work page 2009
[73]

Analysis of learning from positive and unlabeled data.Advances in neural information processing systems, 27, 2014

Marthinus C Du Plessis, Gang Niu, and Masashi Sugiyama. Analysis of learning from positive and unlabeled data.Advances in neural information processing systems, 27, 2014

work page 2014
[74]

Convex formulation for learning from positive and unlabeled data

Marthinus Du Plessis, Gang Niu, and Masashi Sugiyama. Convex formulation for learning from positive and unlabeled data. InInternational conference on machine learning, pages 1386–1394. PMLR, 2015

work page 2015
[75]

Positive- unlabeled learning with non-negative risk estimator.Advances in neural information processing systems, 30, 2017

Ryuichi Kiryo, Gang Niu, Marthinus C Du Plessis, and Masashi Sugiyama. Positive- unlabeled learning with non-negative risk estimator.Advances in neural information processing systems, 30, 2017

work page 2017
[76]

Estimating labels from label proportions

Novi Quadrianto, Alex J Smola, Tiberio S Caetano, and Quoc V Le. Estimating labels from label proportions. InProceedings of the 25th International Conference on Machine learning, pages 776–783, 2008

work page 2008
[77]

On Learning from Label Proportions

Felix X Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, and Shih-Fu Chang. On learning from label proportions.arXiv preprint arXiv:1402.5902, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[78]

Learning from label proportionswithgenerativeadversarialnetworks

Jiabin Liu, Bo Wang, Zhiquan Qi, Yingjie Tian, and Yong Shi. Learning from label proportionswithgenerativeadversarialnetworks. Advancesinneuralinformationprocessing systems, 32, 2019

work page 2019
[79]

Learning from label proportions: A mutual contam- ination framework

Clayton Scott and Jianxin Zhang. Learning from label proportions: A mutual contam- ination framework. Advances in neural information processing systems, 33:22256–22267, 2020

work page 2020
[80]

Multi-class classification from multiple unlabeled datasets with partial risk regularization

Yuting Tang, Nan Lu, Tianyi Zhang, and Masashi Sugiyama. Multi-class classification from multiple unlabeled datasets with partial risk regularization. InAsian Conference on Machine Learning, pages 990–1005. PMLR, 2023

work page 2023
