The Interplay Between Interpolation and Aggregation in Regression: Optimal Sample Complexity
Pith reviewed 2026-06-29 08:30 UTC · model grok-4.3
The pith
The γ-graph dimension characterizes learnability for aggregation of interpolating hypotheses in regression, with a median of three being optimal and stronger than proper learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The γ-graph dimension characterizes learnability for a broad class of natural aggregation procedures. Furthermore, an extremely simple aggregation procedure, combining three interpolating hypotheses via the median, is optimal among all these aggregation procedures, and is strictly more powerful than proper learning. Finally, some hypothesis classes are learnable only by aggregating infinitely many hypotheses or by using non-interpolating aggregation rules (which may predict outside the range of their inputs), and any finite interpolating aggregation fails to achieve even trivial performance.
What carries the argument
The γ-graph dimension, the combinatorial parameter that determines when aggregation procedures over interpolating hypotheses succeed in regression.
If this is right
- Learnability via the considered aggregation procedures holds if and only if the γ-graph dimension is finite.
- The median-of-three rule attains the optimal sample complexity among all natural interpolating aggregation procedures.
- The median-of-three rule succeeds on strictly more hypothesis classes than any proper learner.
- Some hypothesis classes require either infinitely many interpolating hypotheses or non-interpolating rules to achieve non-trivial performance.
- No finite interpolating aggregation achieves even trivial performance on those classes.
Where Pith is reading between the lines
- Practitioners might default to median aggregation of a few interpolators rather than searching for a single non-interpolating model.
- The results suggest examining whether similar dimension characterizations exist for classification or other loss functions.
- Computational questions arise around efficiently constructing or verifying the required interpolating hypotheses.
- The separation between finite and infinite aggregation points to possible limits on ensemble size in practice.
Load-bearing premise
The γ-graph dimension is the correct combinatorial measure for the specific family of aggregation procedures considered, and the optimality result holds for all hypothesis classes satisfying the dimension condition.
What would settle it
A hypothesis class with finite γ-graph dimension on which the median of three interpolating hypotheses fails to achieve the claimed sample complexity, or on which some other aggregation procedure performs strictly better.
read the original abstract
This work investigates theoretically the interplay between interpolation and aggregation in regression. We establish that the $\gamma$-graph dimension characterizes learnability for a broad class of natural aggregation procedures. Furthermore, we prove that an extremely simple aggregation procedure, combining three interpolating hypotheses via the median, is optimal among all these aggregation procedures, and is strictly more powerful than proper learning. Finally, we show that some hypothesis classes are learnable only by aggregating infinitely many hypotheses or by using non-interpolating aggregation rules (which may predict outside the range of their inputs), and any finite interpolating aggregation fails to achieve even trivial performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the γ-graph dimension exactly characterizes learnability (with matching upper and lower bounds on sample complexity) for a broad class of natural aggregation procedures over interpolating hypotheses in regression. It further proves that the median of any three interpolating hypotheses is optimal among all such procedures and strictly more powerful than proper learning, while some hypothesis classes are learnable only via infinite aggregations or non-interpolating rules.
Significance. If the matching bounds hold, the work supplies a combinatorial characterization of learnability under interpolation-plus-aggregation, identifies an extremely simple optimal aggregator, and delineates the boundary between finite interpolating and more general rules. The explicit positive and negative results for finite vs. infinite aggregations constitute a clear advance in the theory of interpolation.
minor comments (2)
- [§2.2] §2.2: the notation for the output range of non-interpolating aggregators is introduced without an explicit contrast to the interpolating case; adding one sentence would improve readability.
- [Figure 1] Figure 1 caption: the legend does not indicate whether the plotted curves are upper or lower bounds; a parenthetical clarification would help.
Simulated Author's Rebuttal
We thank the referee for their positive summary of the paper and for recommending acceptance. No major comments were raised in the report.
Circularity Check
No significant circularity detected
full rationale
The derivation relies on defining the γ-graph dimension as an external combinatorial quantity and then proving matching upper and lower bounds on sample complexity for finite aggregators (including the three-median rule) versus infinite or non-interpolating rules. No step reduces a claimed prediction to a fitted parameter by construction, renames a known result, or imports a uniqueness theorem solely via self-citation; the bounds are derived directly from the dimension without the target result being presupposed in the definition or aggregation class. The analysis is self-contained against the stated combinatorial assumptions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
URLhttps://doi.org/10.1016/j.jcss.2023.103465
doi: 10.1016/J.JCSS.2023.103465. URLhttps://doi.org/10.1016/j.jcss.2023.103465. Bartlett, P. L., Long, P. M., and Williamson, R. C. Fat-shattering and the learnability of real-valued functions. In Warmuth, M. K. (ed.),Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, COLT 1994, New Brunswick, NJ, USA, July 12-15, 1994, pp.299–
-
[2]
ACM, 1994. doi: 10.1145/180139.181158. URLhttps://doi.org/10.1145/180139.181158. Bartlett, P. L., Long, P. M., Lugosi, G., and Tsigler, A. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020. doi: 10.1073/pnas. 1907378117. URLhttps://www.pnas.org/doi/abs/10.1073/pnas.1907378117. Belkin, M. Fi...
-
[3]
URL http://proceedings.mlr.press/v19/daniely11a/daniely11a
JMLR.org, 2011. URL http://proceedings.mlr.press/v19/daniely11a/daniely11a. pdf. Daskalakis, C. and Golowich, N. Fast rates for nonparametric online learning: from realizability to learning in games. In Leonardi, S. and Gupta, A. (eds.),STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pp. 846–859. ACM, 20...
-
[4]
Verifying groups in linear time
URLhttps://jmlr.org/papers/v17/15-389.html. Hanneke, S., Larsen, K. G., and Zhivotovskiy, N. Revisiting agnostic PAC learning. In65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pp. 1968–1982. IEEE, 2024. doi: 10.1109/FOCS61266.2024.00118. URL https://doi.org/10.1109/FOCS61266.2024.00118. Has...
-
[5]
The sub-sequences need not be disjoint
Construct sub-training sequencesS1,...,Sm from S, wherem may depend onS and each (x,y)∈Sj satisfies(x,y)∈S. The sub-sequences need not be disjoint
-
[6]
ApplyAto each sub-sequence to obtain hypothesesh 1,...,hm
-
[7]
17 WhenAis clear from context, we writeA′(S) =A′(S,A)
Return the predictorx↦→r(h1(x),...,hm(x),x ), wherer: [0, 1]∗×X→[0, 1]is an aggregation rule. 17 WhenAis clear from context, we writeA′(S) =A′(S,A). Definition 3.4(Proper Aggregation Rule).An aggregation ruler: [0, 1]∗×X→[0, 1]isproperif for every sequencez1,...,zm∈[0,1]and everyx∈X, we haver(z1,...,zm,x)∈{z1,...,zm}. Definition 3.6(Finite Aggregation Alg...
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.