The price of multi-group transductive learning
Pith reviewed 2026-06-28 07:22 UTC · model grok-4.3
The pith
Every multi-group learner in the transductive setting can suffer an error-rate penalty on some group that grows linearly with the number of groups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show every multi-group learner in the transductive setting may incur a multiplicative penalty in its error rate on some group relative to the error rate achievable in the single-group setting, and the penalty can increase linearly with the number of groups, up to roughly the square-root of the sample size. This stands in stark contrast to optimal multi-group learners in an analogous (group-realizable) statistical setting, where the penalty is always at most logarithmic in the sample size and independent of the number of groups.
What carries the argument
The multiplicative penalty factor on per-group error rate when comparing any transductive multi-group learner to its single-group counterpart.
If this is right
- Any transductive procedure that guarantees low error on every group must pay a per-group cost linear in the number of groups.
- The worst-case per-group error can be up to roughly sqrt(n) times larger than the single-group optimum.
- The separation between transductive and statistical multi-group performance is witnessed already in realizable distributions.
- Single-group transductive bounds cannot be lifted to the multi-group case without this extra factor.
Where Pith is reading between the lines
- Designers of transductive fairness algorithms may need to allocate sample size differently once the number of protected groups exceeds a small constant.
- The linear penalty suggests that empirical risk minimization over joint group-label hypotheses may be information-theoretically suboptimal in the transductive regime.
- Similar separations could appear in other batch settings where the test set is known in advance but labels are hidden.
Load-bearing premise
A group-realizable statistical setting exists in which the multi-group penalty stays logarithmic and does not grow with the number of groups.
What would settle it
An explicit transductive algorithm whose worst-group error multiplier remains logarithmic in sample size and independent of the number of groups, even on the distributions used to prove the linear lower bound.
read the original abstract
We show every multi-group learner in the transductive setting may incur a multiplicative penalty in its error rate on some group relative to the error rate achievable in the single-group setting, and the penalty can increasing linearly with the number of groups, up to roughly the square-root of the sample size. This stands in stark contrast to optimal multi-group learners in an analogous (group-realizable) statistical setting, where the penalty is always at most logarithmic in the sample size and independent of the number of groups.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims a separation result for multi-group learning: every multi-group learner in the transductive setting may incur a multiplicative penalty in its per-group error rate (relative to the single-group optimum) that grows linearly with the number of groups G, up to roughly sqrt(n). This is asserted to stand in contrast to an analogous group-realizable statistical setting, where the corresponding penalty is at most logarithmic in n and independent of G.
Significance. If the separation is established with precisely matching definitions of realizability, group partitions, and learner access, the result would clarify a fundamental distinction between transductive and statistical multi-group learning. It would supply a concrete lower-bound construction that scales with G and thereby inform algorithm design in fixed-sample regimes where groups are known in advance. The explicit contrast with prior statistical upper bounds is a potentially useful contribution, provided the models are shown to be comparable.
major comments (2)
- [Abstract and Section 3] Abstract and Section 3 (lower-bound construction): The central separation requires that the referenced 'analogous (group-realizable) statistical setting' uses identical notions of group partition, exact vs. approximate realizability, and access to group labels as the transductive lower-bound instance. The provided text does not contain an explicit statement confirming that the statistical upper bound (O(log n), independent of G) applies under the same group structure that forces the linear-in-G penalty in the transductive case. If the statistical model employs a strictly weaker group revelation or different realizability, the claimed gap does not follow.
- [Abstract] Abstract: The lower-bound claim states that the multiplicative penalty 'can increasing linearly with the number of groups, up to roughly the square-root of the sample size.' Without the explicit construction or the precise definition of the transductive multi-group model (including how groups are presented to the learner), it is impossible to verify whether the linear dependence on G is forced or whether the sqrt(n) scaling arises from a particular choice of n and G.
minor comments (2)
- [Abstract] Abstract contains a grammatical error: 'the penalty can increasing linearly' should read 'the penalty can increase linearly'.
- [Abstract] The abstract refers to 'roughly the square-root of the sample size' without specifying whether this is sqrt(n), sqrt(m) for some other quantity, or a function of both n and G; a precise asymptotic statement would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed comments on model comparability and the need for explicit verification of the separation. We address each point below and will revise the manuscript to improve clarity on the matching definitions between settings.
read point-by-point responses
-
Referee: [Abstract and Section 3] Abstract and Section 3 (lower-bound construction): The central separation requires that the referenced 'analogous (group-realizable) statistical setting' uses identical notions of group partition, exact vs. approximate realizability, and access to group labels as the transductive lower-bound instance. The provided text does not contain an explicit statement confirming that the statistical upper bound (O(log n), independent of G) applies under the same group structure that forces the linear-in-G penalty in the transductive case. If the statistical model employs a strictly weaker group revelation or different realizability, the claimed gap does not follow.
Authors: We agree an explicit confirmation strengthens the argument. Section 2 defines the transductive multi-group model with a fixed partition into G groups known to the learner in advance, and group-realizability meaning there exists a hypothesis with zero error on every group. The lower-bound instance in Section 3 uses precisely this partition and realizability notion. The referenced statistical upper bound (O(log n) independent of G) is from the group-realizable statistical model with identical group access and realizability. In the revision we will insert a paragraph at the start of Section 3 that directly states these elements match the statistical setting used for the contrast. revision: yes
-
Referee: [Abstract] Abstract: The lower-bound claim states that the multiplicative penalty 'can increasing linearly with the number of groups, up to roughly the square-root of the sample size.' Without the explicit construction or the precise definition of the transductive multi-group model (including how groups are presented to the learner), it is impossible to verify whether the linear dependence on G is forced or whether the sqrt(n) scaling arises from a particular choice of n and G.
Authors: The precise definition of the transductive multi-group model, including that groups are presented as a known partition of the n points, appears in Section 2. The explicit lower-bound construction forcing the linear-in-G penalty (up to sqrt(n)) is given in full in Section 3 via a concrete hypothesis class and labeling. The abstract is a high-level summary; we will add a sentence directing readers to Sections 2 and 3 for the model and construction details. revision: partial
Circularity Check
No circularity; theoretical separation stands on independent lower-bound construction
full rationale
The manuscript is a theoretical paper establishing a lower-bound separation between transductive and statistical multi-group learning. No equations, fitted parameters, or self-citations appear in the abstract or description that would reduce any claimed result to its own inputs by construction. The contrast between linear-in-G transductive penalty and logarithmic statistical penalty is asserted via standard realizability assumptions without self-definitional loops, renamed empirical patterns, or load-bearing self-citations. The derivation is therefore self-contained against external learning-theoretic benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard definitions of transductive and statistical (group-realizable) learning models
Reference graph
Works this paper leans on
-
[1]
Oracle efficient algorithms for groupwise regret
Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, and Juba Ziani. Oracle efficient algorithms for groupwise regret. In Twelfth International Conference on Learning Representations, 2024
2024
-
[2]
The one-inclusion graph algorithm is not always optimal
Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. The one-inclusion graph algorithm is not always optimal. In Thirty-Sixth Annual Conference on Learning Theory, pages 72--88. PMLR, 2023 a
2023
-
[3]
Optimal PAC bounds without uniform convergence
Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. Optimal PAC bounds without uniform convergence. In Sixty-Fourth Annual Symposium on Foundations of Computer Science, pages 1203--1223. IEEE, 2023 b
2023
-
[4]
Colorings and orientations of graphs
Noga Alon and Michael Tarsi. Colorings and orientations of graphs. Combinatorica, 12 0 (2): 0 125--134, 1992
1992
-
[5]
Group-realizable multi-group learning by minimizing empirical risk
Navid Ardeshir, Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-realizable multi-group learning by minimizing empirical risk. In Thirty-Seventh Annual Conference on Algorithmic Learning Theory. PMLR, 2026
2026
-
[6]
Regularization and optimal multiclass learning
Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng. Regularization and optimal multiclass learning. In Thirty-Seventh Annual Conference on Learning Theory, pages 260--310. PMLR, 2024
2024
-
[7]
Panprediction: optimal predictions for any downstream task and loss
Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, and Eric Zhao. Panprediction: optimal predictions for any downstream task and loss. In Twenty-Ninth International Conference on Artificial Intelligence and Statistics, 2026
2026
-
[8]
Prediction, learning, uniform convergence, and scale-sensitive dimensions
Peter L Bartlett and Philip M Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56 0 (2): 0 174--190, 1998
1998
-
[9]
Advancing subgroup fairness via sleeping experts
Avrim Blum and Thodoris Lykouris. Advancing subgroup fairness via sleeping experts. In Eleventh Innovations in Theoretical Computer Science Conference, 2020
2020
-
[10]
Learnability and the Vapnik-Chervonenkis dimension
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36 0 (4): 0 929--965, 1989
1989
-
[11]
The Sample Complexity of Multicalibration
Natalie Collina, Jiuyao Lu, Georgy Noarov, and Aaron Roth. The sample complexity of multicalibration. arXiv preprint arXiv:2604.21923, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[12]
Group-wise oracle-efficient algorithms for online multi-group learning
Samuel Deng, Daniel Hsu, and Jingwen Liu. Group-wise oracle-efficient algorithms for online multi-group learning. In Advances in Neural Information Processing Systems 37, 2024
2024
-
[13]
A general lower bound on the number of examples needed for learning
Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82 0 (3): 0 247--261, 1989
1989
-
[14]
Using and combining predictors that specialize
Yoav Freund, Robert E Schapire, Yoram Singer, and Manfred K Warmuth. Using and combining predictors that specialize. In Twenty-Ninth Annual Symposium on Theory of Computing, pages 334--343, 1997
1997
-
[15]
Sample-efficient omniprediction for proper losses
Isaac Gibbs and Ryan J Tibshirani. Sample-efficient omniprediction for proper losses. arXiv preprint arXiv:2510.12769, 2025
-
[16]
Omnipredictors
Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In Thirteenth Innovations in Theoretical Computer Science, 2022
2022
-
[17]
A unifying perspective on multi-calibration: Game dynamics for multi-objective learning
Nika Haghtalab, Michael Jordan, and Eric Zhao. A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. In Advances in Neural Information Processing Systems 36, pages 72464--72506, 2023
2023
-
[18]
The optimal sample complexity of PAC learning
Steve Hanneke. The optimal sample complexity of PAC learning. Journal of Machine Learning Research, 17 0 (38): 0 1--15, 2016 a
2016
-
[19]
Refined error bounds for several learning algorithms
Steve Hanneke. Refined error bounds for several learning algorithms. Journal of Machine Learning Research, 17 0 (135): 0 1--55, 2016 b
2016
-
[20]
Minimax analysis of active learning
Steve Hanneke and Liu Yang. Minimax analysis of active learning. Journal of Machine Learning Research, 16 0 (1): 0 3487--3602, 2015
2015
-
[21]
Predicting \ 0, 1\ -functions on randomly drawn points
David Haussler, Nick Littlestone, and Manfred K Warmuth. Predicting \ 0, 1\ -functions on randomly drawn points. Information and Computation, 115 0 (2): 0 248--292, 1994
1994
-
[22]
Multicalibration: Calibration for the (computationally-identifiable) masses
Ursula H \'e bert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In Thirty-Fifth International Conference on Machine Learning, pages 1939--1948. PMLR, 2018
1939
-
[23]
Multiaccuracy: Black-box post-processing for fairness in classification
Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In AAAI/ACM Conference on AI, Ethics, and Society, pages 247--254, 2019
2019
-
[24]
Unlabeled compression schemes for maximum classes
Dima Kuzmin and Manfred K Warmuth. Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8 0 (9), 2007
2007
-
[25]
The complexity of learning according to two models of a drifting environment
Philip M Long. The complexity of learning according to two models of a drifting environment. In Eleventh Annual Conference on Computational Learning Theory, pages 116--125, 1998
1998
-
[26]
Agnostic multi-group active learning
Nicholas Rittler and Kamalika Chaudhuri. Agnostic multi-group active learning. In Advances in Neural Information Processing Systems 36, pages 1100--1118, 2023
2023
-
[27]
Multi-group agnostic PAC learnability
Guy Rothblum and Gal Yona. Multi-group agnostic PAC learnability. In Thirty-Eighth International Conference on Machine Learning, pages 9107--9115. PMLR, 2021
2021
-
[28]
Shifting: One-inclusion mistake bounds and sample compression
Benjamin IP Rubinstein, Peter L Bartlett, and J Hyam Rubinstein. Shifting: One-inclusion mistake bounds and sample compression. Journal of Computer and System Sciences, 75 0 (1): 0 37--59, 2009
2009
-
[29]
Simple and near-optimal algorithms for hidden stratification and multi-group learning
Christopher J Tosh and Daniel Hsu. Simple and near-optimal algorithms for hidden stratification and multi-group learning. In Thirty-Ninth International Conference on Machine Learning, pages 21633--21657. PMLR, 2022
2022
-
[30]
A theory of the learnable
Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27 0 (11): 0 1134--1142, 1984
1984
-
[31]
Theory of pattern recognition
Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition. Nauka, Moscow, 1974
1974
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.