The price of multi-group transductive learning

Daniel Hsu; Noah Bergam; Samuel Deng

arxiv: 2606.04423 · v1 · pith:BPNMBX4Mnew · submitted 2026-06-03 · 💻 cs.LG · stat.ML

The price of multi-group transductive learning

Noah Bergam , Samuel Deng , Daniel Hsu This is my paper

Pith reviewed 2026-06-28 07:22 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords multi-group learningtransductive learningerror rate boundsgroup fairnesssample complexitylearning theorymulti-group fairness

0 comments

The pith

Every multi-group learner in the transductive setting can suffer an error-rate penalty on some group that grows linearly with the number of groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that moving from single-group to multi-group learning in the transductive model forces a multiplicative increase in error on at least one group. The increase scales linearly with the number of groups and can reach roughly the square root of the sample size. A sympathetic reader would care because this cost is substantially higher than the logarithmic, group-independent bound that holds in the comparable statistical setting. The result therefore separates the two learning models on a concrete performance measure rather than on abstract complexity.

Core claim

We show every multi-group learner in the transductive setting may incur a multiplicative penalty in its error rate on some group relative to the error rate achievable in the single-group setting, and the penalty can increase linearly with the number of groups, up to roughly the square-root of the sample size. This stands in stark contrast to optimal multi-group learners in an analogous (group-realizable) statistical setting, where the penalty is always at most logarithmic in the sample size and independent of the number of groups.

What carries the argument

The multiplicative penalty factor on per-group error rate when comparing any transductive multi-group learner to its single-group counterpart.

If this is right

Any transductive procedure that guarantees low error on every group must pay a per-group cost linear in the number of groups.
The worst-case per-group error can be up to roughly sqrt(n) times larger than the single-group optimum.
The separation between transductive and statistical multi-group performance is witnessed already in realizable distributions.
Single-group transductive bounds cannot be lifted to the multi-group case without this extra factor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of transductive fairness algorithms may need to allocate sample size differently once the number of protected groups exceeds a small constant.
The linear penalty suggests that empirical risk minimization over joint group-label hypotheses may be information-theoretically suboptimal in the transductive regime.
Similar separations could appear in other batch settings where the test set is known in advance but labels are hidden.

Load-bearing premise

A group-realizable statistical setting exists in which the multi-group penalty stays logarithmic and does not grow with the number of groups.

What would settle it

An explicit transductive algorithm whose worst-group error multiplier remains logarithmic in sample size and independent of the number of groups, even on the distributions used to prove the linear lower bound.

read the original abstract

We show every multi-group learner in the transductive setting may incur a multiplicative penalty in its error rate on some group relative to the error rate achievable in the single-group setting, and the penalty can increasing linearly with the number of groups, up to roughly the square-root of the sample size. This stands in stark contrast to optimal multi-group learners in an analogous (group-realizable) statistical setting, where the penalty is always at most logarithmic in the sample size and independent of the number of groups.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a clean separation showing linear-in-G penalty for transductive multi-group learning versus log n in the statistical case, but the analogy between the two models needs verification on definitions of realizability and group access.

read the letter

The central result is that every multi-group learner in the transductive setting can suffer a multiplicative error penalty on some group that grows linearly with the number of groups G, up to order sqrt(n). This is set against an analogous statistical setting where the penalty stays at most logarithmic in n and does not grow with G.

What stands out is the quantitative contrast. The abstract states the transductive lower bound directly and notes the statistical upper bound from prior work. If the proofs go through without hidden differences in how groups are defined or revealed, this cleanly separates the two regimes and gives a concrete cost for the transductive model.

The soft spot is the comparison itself. The separation only holds if the statistical setting uses the same notion of group-realizability, the same group partition, and comparable access to labels. The stress-test note flags exactly this: if the statistical upper bound assumes groups revealed only at test time or a weaker realizability condition, the claimed gap shrinks or disappears. The lower-bound construction also needs checking to confirm the groups it uses can actually arise in the statistical model.

The paper is for researchers working on subgroup fairness or transductive algorithms who care about fixed-data regimes. It is a theoretical contribution that clarifies a modeling distinction rather than a broad empirical claim.

I would send it to peer review. The claim is sharp enough to be worth referee time even if the analogy requires tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript claims a separation result for multi-group learning: every multi-group learner in the transductive setting may incur a multiplicative penalty in its per-group error rate (relative to the single-group optimum) that grows linearly with the number of groups G, up to roughly sqrt(n). This is asserted to stand in contrast to an analogous group-realizable statistical setting, where the corresponding penalty is at most logarithmic in n and independent of G.

Significance. If the separation is established with precisely matching definitions of realizability, group partitions, and learner access, the result would clarify a fundamental distinction between transductive and statistical multi-group learning. It would supply a concrete lower-bound construction that scales with G and thereby inform algorithm design in fixed-sample regimes where groups are known in advance. The explicit contrast with prior statistical upper bounds is a potentially useful contribution, provided the models are shown to be comparable.

major comments (2)

[Abstract and Section 3] Abstract and Section 3 (lower-bound construction): The central separation requires that the referenced 'analogous (group-realizable) statistical setting' uses identical notions of group partition, exact vs. approximate realizability, and access to group labels as the transductive lower-bound instance. The provided text does not contain an explicit statement confirming that the statistical upper bound (O(log n), independent of G) applies under the same group structure that forces the linear-in-G penalty in the transductive case. If the statistical model employs a strictly weaker group revelation or different realizability, the claimed gap does not follow.
[Abstract] Abstract: The lower-bound claim states that the multiplicative penalty 'can increasing linearly with the number of groups, up to roughly the square-root of the sample size.' Without the explicit construction or the precise definition of the transductive multi-group model (including how groups are presented to the learner), it is impossible to verify whether the linear dependence on G is forced or whether the sqrt(n) scaling arises from a particular choice of n and G.

minor comments (2)

[Abstract] Abstract contains a grammatical error: 'the penalty can increasing linearly' should read 'the penalty can increase linearly'.
[Abstract] The abstract refers to 'roughly the square-root of the sample size' without specifying whether this is sqrt(n), sqrt(m) for some other quantity, or a function of both n and G; a precise asymptotic statement would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on model comparability and the need for explicit verification of the separation. We address each point below and will revise the manuscript to improve clarity on the matching definitions between settings.

read point-by-point responses

Referee: [Abstract and Section 3] Abstract and Section 3 (lower-bound construction): The central separation requires that the referenced 'analogous (group-realizable) statistical setting' uses identical notions of group partition, exact vs. approximate realizability, and access to group labels as the transductive lower-bound instance. The provided text does not contain an explicit statement confirming that the statistical upper bound (O(log n), independent of G) applies under the same group structure that forces the linear-in-G penalty in the transductive case. If the statistical model employs a strictly weaker group revelation or different realizability, the claimed gap does not follow.

Authors: We agree an explicit confirmation strengthens the argument. Section 2 defines the transductive multi-group model with a fixed partition into G groups known to the learner in advance, and group-realizability meaning there exists a hypothesis with zero error on every group. The lower-bound instance in Section 3 uses precisely this partition and realizability notion. The referenced statistical upper bound (O(log n) independent of G) is from the group-realizable statistical model with identical group access and realizability. In the revision we will insert a paragraph at the start of Section 3 that directly states these elements match the statistical setting used for the contrast. revision: yes
Referee: [Abstract] Abstract: The lower-bound claim states that the multiplicative penalty 'can increasing linearly with the number of groups, up to roughly the square-root of the sample size.' Without the explicit construction or the precise definition of the transductive multi-group model (including how groups are presented to the learner), it is impossible to verify whether the linear dependence on G is forced or whether the sqrt(n) scaling arises from a particular choice of n and G.

Authors: The precise definition of the transductive multi-group model, including that groups are presented as a known partition of the n points, appears in Section 2. The explicit lower-bound construction forcing the linear-in-G penalty (up to sqrt(n)) is given in full in Section 3 via a concrete hypothesis class and labeling. The abstract is a high-level summary; we will add a sentence directing readers to Sections 2 and 3 for the model and construction details. revision: partial

Circularity Check

0 steps flagged

No circularity; theoretical separation stands on independent lower-bound construction

full rationale

The manuscript is a theoretical paper establishing a lower-bound separation between transductive and statistical multi-group learning. No equations, fitted parameters, or self-citations appear in the abstract or description that would reduce any claimed result to its own inputs by construction. The contrast between linear-in-G transductive penalty and logarithmic statistical penalty is asserted via standard realizability assumptions without self-definitional loops, renamed empirical patterns, or load-bearing self-citations. The derivation is therefore self-contained against external learning-theoretic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result rests on standard definitions of transductive learning and group-realizability in statistical learning theory; no free parameters, invented entities, or non-standard axioms are visible from the abstract.

axioms (1)

standard math Standard definitions of transductive and statistical (group-realizable) learning models
The separation is stated relative to these established frameworks.

pith-pipeline@v0.9.1-grok · 5597 in / 1262 out tokens · 27915 ms · 2026-06-28T07:22:38.691150+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Oracle efficient algorithms for groupwise regret

Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, and Juba Ziani. Oracle efficient algorithms for groupwise regret. In Twelfth International Conference on Learning Representations, 2024

2024
[2]

The one-inclusion graph algorithm is not always optimal

Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. The one-inclusion graph algorithm is not always optimal. In Thirty-Sixth Annual Conference on Learning Theory, pages 72--88. PMLR, 2023 a

2023
[3]

Optimal PAC bounds without uniform convergence

Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. Optimal PAC bounds without uniform convergence. In Sixty-Fourth Annual Symposium on Foundations of Computer Science, pages 1203--1223. IEEE, 2023 b

2023
[4]

Colorings and orientations of graphs

Noga Alon and Michael Tarsi. Colorings and orientations of graphs. Combinatorica, 12 0 (2): 0 125--134, 1992

1992
[5]

Group-realizable multi-group learning by minimizing empirical risk

Navid Ardeshir, Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-realizable multi-group learning by minimizing empirical risk. In Thirty-Seventh Annual Conference on Algorithmic Learning Theory. PMLR, 2026

2026
[6]

Regularization and optimal multiclass learning

Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng. Regularization and optimal multiclass learning. In Thirty-Seventh Annual Conference on Learning Theory, pages 260--310. PMLR, 2024

2024
[7]

Panprediction: optimal predictions for any downstream task and loss

Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, and Eric Zhao. Panprediction: optimal predictions for any downstream task and loss. In Twenty-Ninth International Conference on Artificial Intelligence and Statistics, 2026

2026
[8]

Prediction, learning, uniform convergence, and scale-sensitive dimensions

Peter L Bartlett and Philip M Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56 0 (2): 0 174--190, 1998

1998
[9]

Advancing subgroup fairness via sleeping experts

Avrim Blum and Thodoris Lykouris. Advancing subgroup fairness via sleeping experts. In Eleventh Innovations in Theoretical Computer Science Conference, 2020

2020
[10]

Learnability and the Vapnik-Chervonenkis dimension

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36 0 (4): 0 929--965, 1989

1989
[11]

The Sample Complexity of Multicalibration

Natalie Collina, Jiuyao Lu, Georgy Noarov, and Aaron Roth. The sample complexity of multicalibration. arXiv preprint arXiv:2604.21923, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[12]

Group-wise oracle-efficient algorithms for online multi-group learning

Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-wise oracle-efficient algorithms for online multi-group learning. In Advances in Neural Information Processing Systems 37, 2024

2024
[13]

A general lower bound on the number of examples needed for learning

Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82 0 (3): 0 247--261, 1989

1989
[14]

Using and combining predictors that specialize

Yoav Freund, Robert E Schapire, Yoram Singer, and Manfred K Warmuth. Using and combining predictors that specialize. In Twenty-Ninth Annual Symposium on Theory of Computing, pages 334--343, 1997

1997
[15]

Sample-efficient omniprediction for proper losses

Isaac Gibbs and Ryan J Tibshirani. Sample-efficient omniprediction for proper losses. arXiv preprint arXiv:2510.12769, 2025

work page arXiv 2025
[16]

Omnipredictors

Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In Thirteenth Innovations in Theoretical Computer Science, 2022

2022
[17]

A unifying perspective on multi-calibration: Game dynamics for multi-objective learning

Nika Haghtalab, Michael Jordan, and Eric Zhao. A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. In Advances in Neural Information Processing Systems 36, pages 72464--72506, 2023

2023
[18]

The optimal sample complexity of PAC learning

Steve Hanneke. The optimal sample complexity of PAC learning. Journal of Machine Learning Research, 17 0 (38): 0 1--15, 2016 a

2016
[19]

Refined error bounds for several learning algorithms

Steve Hanneke. Refined error bounds for several learning algorithms. Journal of Machine Learning Research, 17 0 (135): 0 1--55, 2016 b

2016
[20]

Minimax analysis of active learning

Steve Hanneke and Liu Yang. Minimax analysis of active learning. Journal of Machine Learning Research, 16 0 (1): 0 3487--3602, 2015

2015
[21]

Predicting \ 0, 1\ -functions on randomly drawn points

David Haussler, Nick Littlestone, and Manfred K Warmuth. Predicting \ 0, 1\ -functions on randomly drawn points. Information and Computation, 115 0 (2): 0 248--292, 1994

1994
[22]

Multicalibration: Calibration for the (computationally-identifiable) masses

Ursula H \'e bert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In Thirty-Fifth International Conference on Machine Learning, pages 1939--1948. PMLR, 2018

1939
[23]

Multiaccuracy: Black-box post-processing for fairness in classification

Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In AAAI/ACM Conference on AI, Ethics, and Society, pages 247--254, 2019

2019
[24]

Unlabeled compression schemes for maximum classes

Dima Kuzmin and Manfred K Warmuth. Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8 0 (9), 2007

2007
[25]

The complexity of learning according to two models of a drifting environment

Philip M Long. The complexity of learning according to two models of a drifting environment. In Eleventh Annual Conference on Computational Learning Theory, pages 116--125, 1998

1998
[26]

Agnostic multi-group active learning

Nicholas Rittler and Kamalika Chaudhuri. Agnostic multi-group active learning. In Advances in Neural Information Processing Systems 36, pages 1100--1118, 2023

2023
[27]

Multi-group agnostic PAC learnability

Guy Rothblum and Gal Yona. Multi-group agnostic PAC learnability. In Thirty-Eighth International Conference on Machine Learning, pages 9107--9115. PMLR, 2021

2021
[28]

Shifting: One-inclusion mistake bounds and sample compression

Benjamin IP Rubinstein, Peter L Bartlett, and J Hyam Rubinstein. Shifting: One-inclusion mistake bounds and sample compression. Journal of Computer and System Sciences, 75 0 (1): 0 37--59, 2009

2009
[29]

Simple and near-optimal algorithms for hidden stratification and multi-group learning

Christopher J Tosh and Daniel Hsu. Simple and near-optimal algorithms for hidden stratification and multi-group learning. In Thirty-Ninth International Conference on Machine Learning, pages 21633--21657. PMLR, 2022

2022
[30]

A theory of the learnable

Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27 0 (11): 0 1134--1142, 1984

1984
[31]

Theory of pattern recognition

Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition. Nauka, Moscow, 1974

1974

[1] [1]

Oracle efficient algorithms for groupwise regret

Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, and Juba Ziani. Oracle efficient algorithms for groupwise regret. In Twelfth International Conference on Learning Representations, 2024

2024

[2] [2]

The one-inclusion graph algorithm is not always optimal

Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. The one-inclusion graph algorithm is not always optimal. In Thirty-Sixth Annual Conference on Learning Theory, pages 72--88. PMLR, 2023 a

2023

[3] [3]

Optimal PAC bounds without uniform convergence

Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. Optimal PAC bounds without uniform convergence. In Sixty-Fourth Annual Symposium on Foundations of Computer Science, pages 1203--1223. IEEE, 2023 b

2023

[4] [4]

Colorings and orientations of graphs

Noga Alon and Michael Tarsi. Colorings and orientations of graphs. Combinatorica, 12 0 (2): 0 125--134, 1992

1992

[5] [5]

Group-realizable multi-group learning by minimizing empirical risk

Navid Ardeshir, Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-realizable multi-group learning by minimizing empirical risk. In Thirty-Seventh Annual Conference on Algorithmic Learning Theory. PMLR, 2026

2026

[6] [6]

Regularization and optimal multiclass learning

Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng. Regularization and optimal multiclass learning. In Thirty-Seventh Annual Conference on Learning Theory, pages 260--310. PMLR, 2024

2024

[7] [7]

Panprediction: optimal predictions for any downstream task and loss

Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, and Eric Zhao. Panprediction: optimal predictions for any downstream task and loss. In Twenty-Ninth International Conference on Artificial Intelligence and Statistics, 2026

2026

[8] [8]

Prediction, learning, uniform convergence, and scale-sensitive dimensions

Peter L Bartlett and Philip M Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56 0 (2): 0 174--190, 1998

1998

[9] [9]

Advancing subgroup fairness via sleeping experts

Avrim Blum and Thodoris Lykouris. Advancing subgroup fairness via sleeping experts. In Eleventh Innovations in Theoretical Computer Science Conference, 2020

2020

[10] [10]

Learnability and the Vapnik-Chervonenkis dimension

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36 0 (4): 0 929--965, 1989

1989

[11] [11]

The Sample Complexity of Multicalibration

Natalie Collina, Jiuyao Lu, Georgy Noarov, and Aaron Roth. The sample complexity of multicalibration. arXiv preprint arXiv:2604.21923, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[12] [12]

Group-wise oracle-efficient algorithms for online multi-group learning

Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-wise oracle-efficient algorithms for online multi-group learning. In Advances in Neural Information Processing Systems 37, 2024

2024

[13] [13]

A general lower bound on the number of examples needed for learning

Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82 0 (3): 0 247--261, 1989

1989

[14] [14]

Using and combining predictors that specialize

Yoav Freund, Robert E Schapire, Yoram Singer, and Manfred K Warmuth. Using and combining predictors that specialize. In Twenty-Ninth Annual Symposium on Theory of Computing, pages 334--343, 1997

1997

[15] [15]

Sample-efficient omniprediction for proper losses

Isaac Gibbs and Ryan J Tibshirani. Sample-efficient omniprediction for proper losses. arXiv preprint arXiv:2510.12769, 2025

work page arXiv 2025

[16] [16]

Omnipredictors

Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In Thirteenth Innovations in Theoretical Computer Science, 2022

2022

[17] [17]

A unifying perspective on multi-calibration: Game dynamics for multi-objective learning

Nika Haghtalab, Michael Jordan, and Eric Zhao. A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. In Advances in Neural Information Processing Systems 36, pages 72464--72506, 2023

2023

[18] [18]

The optimal sample complexity of PAC learning

Steve Hanneke. The optimal sample complexity of PAC learning. Journal of Machine Learning Research, 17 0 (38): 0 1--15, 2016 a

2016

[19] [19]

Refined error bounds for several learning algorithms

Steve Hanneke. Refined error bounds for several learning algorithms. Journal of Machine Learning Research, 17 0 (135): 0 1--55, 2016 b

2016

[20] [20]

Minimax analysis of active learning

Steve Hanneke and Liu Yang. Minimax analysis of active learning. Journal of Machine Learning Research, 16 0 (1): 0 3487--3602, 2015

2015

[21] [21]

Predicting \ 0, 1\ -functions on randomly drawn points

David Haussler, Nick Littlestone, and Manfred K Warmuth. Predicting \ 0, 1\ -functions on randomly drawn points. Information and Computation, 115 0 (2): 0 248--292, 1994

1994

[22] [22]

Multicalibration: Calibration for the (computationally-identifiable) masses

Ursula H \'e bert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In Thirty-Fifth International Conference on Machine Learning, pages 1939--1948. PMLR, 2018

1939

[23] [23]

Multiaccuracy: Black-box post-processing for fairness in classification

Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In AAAI/ACM Conference on AI, Ethics, and Society, pages 247--254, 2019

2019

[24] [24]

Unlabeled compression schemes for maximum classes

Dima Kuzmin and Manfred K Warmuth. Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8 0 (9), 2007

2007

[25] [25]

The complexity of learning according to two models of a drifting environment

Philip M Long. The complexity of learning according to two models of a drifting environment. In Eleventh Annual Conference on Computational Learning Theory, pages 116--125, 1998

1998

[26] [26]

Agnostic multi-group active learning

Nicholas Rittler and Kamalika Chaudhuri. Agnostic multi-group active learning. In Advances in Neural Information Processing Systems 36, pages 1100--1118, 2023

2023

[27] [27]

Multi-group agnostic PAC learnability

Guy Rothblum and Gal Yona. Multi-group agnostic PAC learnability. In Thirty-Eighth International Conference on Machine Learning, pages 9107--9115. PMLR, 2021

2021

[28] [28]

Shifting: One-inclusion mistake bounds and sample compression

Benjamin IP Rubinstein, Peter L Bartlett, and J Hyam Rubinstein. Shifting: One-inclusion mistake bounds and sample compression. Journal of Computer and System Sciences, 75 0 (1): 0 37--59, 2009

2009

[29] [29]

Simple and near-optimal algorithms for hidden stratification and multi-group learning

Christopher J Tosh and Daniel Hsu. Simple and near-optimal algorithms for hidden stratification and multi-group learning. In Thirty-Ninth International Conference on Machine Learning, pages 21633--21657. PMLR, 2022

2022

[30] [30]

A theory of the learnable

Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27 0 (11): 0 1134--1142, 1984

1984

[31] [31]

Theory of pattern recognition

Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition. Nauka, Moscow, 1974

1974