The Interplay Between Interpolation and Aggregation in Regression: Optimal Sample Complexity

Kasper Green Larsen; Liang-Yu Zou; Mikael M{\o}ller H{\o}gsgaard

arxiv: 2605.29819 · v1 · pith:RZ3G2KV3new · submitted 2026-05-28 · 💻 cs.LG

The Interplay Between Interpolation and Aggregation in Regression: Optimal Sample Complexity

Mikael M{\o}ller H{\o}gsgaard , Kasper Green Larsen , Liang-Yu Zou This is my paper

Pith reviewed 2026-06-29 08:30 UTC · model grok-4.3

classification 💻 cs.LG

keywords interpolationaggregationregressionlearnabilitysample complexityγ-graph dimensionproper learning

0 comments

The pith

The γ-graph dimension characterizes learnability for aggregation of interpolating hypotheses in regression, with a median of three being optimal and stronger than proper learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how interpolation and aggregation interact in regression tasks. It shows that the γ-graph dimension determines whether a broad family of aggregation procedures can learn a given hypothesis class. A simple rule that takes the median of three interpolating hypotheses is proven to be optimal within this family and to succeed on more classes than any single proper hypothesis. The work also identifies hypothesis classes that cannot be learned at all by any finite interpolating aggregation.

Core claim

The γ-graph dimension characterizes learnability for a broad class of natural aggregation procedures. Furthermore, an extremely simple aggregation procedure, combining three interpolating hypotheses via the median, is optimal among all these aggregation procedures, and is strictly more powerful than proper learning. Finally, some hypothesis classes are learnable only by aggregating infinitely many hypotheses or by using non-interpolating aggregation rules (which may predict outside the range of their inputs), and any finite interpolating aggregation fails to achieve even trivial performance.

What carries the argument

The γ-graph dimension, the combinatorial parameter that determines when aggregation procedures over interpolating hypotheses succeed in regression.

If this is right

Learnability via the considered aggregation procedures holds if and only if the γ-graph dimension is finite.
The median-of-three rule attains the optimal sample complexity among all natural interpolating aggregation procedures.
The median-of-three rule succeeds on strictly more hypothesis classes than any proper learner.
Some hypothesis classes require either infinitely many interpolating hypotheses or non-interpolating rules to achieve non-trivial performance.
No finite interpolating aggregation achieves even trivial performance on those classes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners might default to median aggregation of a few interpolators rather than searching for a single non-interpolating model.
The results suggest examining whether similar dimension characterizations exist for classification or other loss functions.
Computational questions arise around efficiently constructing or verifying the required interpolating hypotheses.
The separation between finite and infinite aggregation points to possible limits on ensemble size in practice.

Load-bearing premise

The γ-graph dimension is the correct combinatorial measure for the specific family of aggregation procedures considered, and the optimality result holds for all hypothesis classes satisfying the dimension condition.

What would settle it

A hypothesis class with finite γ-graph dimension on which the median of three interpolating hypotheses fails to achieve the claimed sample complexity, or on which some other aggregation procedure performs strictly better.

read the original abstract

This work investigates theoretically the interplay between interpolation and aggregation in regression. We establish that the $\gamma$-graph dimension characterizes learnability for a broad class of natural aggregation procedures. Furthermore, we prove that an extremely simple aggregation procedure, combining three interpolating hypotheses via the median, is optimal among all these aggregation procedures, and is strictly more powerful than proper learning. Finally, we show that some hypothesis classes are learnable only by aggregating infinitely many hypotheses or by using non-interpolating aggregation rules (which may predict outside the range of their inputs), and any finite interpolating aggregation fails to achieve even trivial performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces the γ-graph dimension to characterize learnability under interpolation plus aggregation and shows that median-of-three on interpolators is optimal and beats proper learning.

read the letter

The main takeaway is that this paper pins down a combinatorial dimension that separates learnable from unlearnable classes when you allow aggregation of interpolating regressors, and it proves that a very simple median-of-three rule achieves the optimal sample complexity among a natural family of aggregators. That result is new on the basis of the abstract and the stress-test note, and it sits strictly above proper learning.

What the work does cleanly is match upper and lower bounds from the same dimension for finite aggregations, while also exhibiting classes that require either infinitely many hypotheses or rules that can output outside the observed range. The stress-test confirms the proofs establish both directions without obvious circularity in the dimension definition.

The soft spots are modest. The abstract and stress-test give no indication of how hard it is to compute or estimate the γ-graph dimension in practice, so the characterization is theoretically sharp but its utility for concrete hypothesis classes remains to be seen. There is also no discussion yet of how sensitive the optimality is to the precise noise model or to mild relaxations of the interpolation requirement.

This is a paper for people working on the theory of overparameterized ensembles and interpolation regimes. A reader who already follows the literature on graph dimensions or aggregation in regression will get immediate value from the new dimension and the median-of-three optimality. It deserves a serious referee because the central claims rest on matching bounds rather than loose arguments, even if the practical reach of the dimension needs more examples.

Referee Report

0 major / 2 minor

Summary. The paper claims that the γ-graph dimension exactly characterizes learnability (with matching upper and lower bounds on sample complexity) for a broad class of natural aggregation procedures over interpolating hypotheses in regression. It further proves that the median of any three interpolating hypotheses is optimal among all such procedures and strictly more powerful than proper learning, while some hypothesis classes are learnable only via infinite aggregations or non-interpolating rules.

Significance. If the matching bounds hold, the work supplies a combinatorial characterization of learnability under interpolation-plus-aggregation, identifies an extremely simple optimal aggregator, and delineates the boundary between finite interpolating and more general rules. The explicit positive and negative results for finite vs. infinite aggregations constitute a clear advance in the theory of interpolation.

minor comments (2)

[§2.2] §2.2: the notation for the output range of non-interpolating aggregators is introduced without an explicit contrast to the interpolating case; adding one sentence would improve readability.
[Figure 1] Figure 1 caption: the legend does not indicate whether the plotted curves are upper or lower bounds; a parenthetical clarification would help.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the paper and for recommending acceptance. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation relies on defining the γ-graph dimension as an external combinatorial quantity and then proving matching upper and lower bounds on sample complexity for finite aggregators (including the three-median rule) versus infinite or non-interpolating rules. No step reduces a claimed prediction to a fitted parameter by construction, renames a known result, or imports a uniqueness theorem solely via self-citation; the bounds are derived directly from the dimension without the target result being presupposed in the definition or aggregation class. The analysis is self-contained against the stated combinatorial assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated or derivable.

pith-pipeline@v0.9.1-grok · 5637 in / 980 out tokens · 24225 ms · 2026-06-29T08:30:11.464196+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

7 extracted references · 4 canonical work pages

[1]

URLhttps://doi.org/10.1016/j.jcss.2023.103465

doi: 10.1016/J.JCSS.2023.103465. URLhttps://doi.org/10.1016/j.jcss.2023.103465. Bartlett, P. L., Long, P. M., and Williamson, R. C. Fat-shattering and the learnability of real-valued functions. In Warmuth, M. K. (ed.),Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, COLT 1994, New Brunswick, NJ, USA, July 12-15, 1994, pp.299–

work page doi:10.1016/j.jcss.2023.103465 2023
[2]

doi: 10.1145/180139.181158

ACM, 1994. doi: 10.1145/180139.181158. URLhttps://doi.org/10.1145/180139.181158. Bartlett, P. L., Long, P. M., Lugosi, G., and Tsigler, A. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020. doi: 10.1073/pnas. 1907378117. URLhttps://www.pnas.org/doi/abs/10.1073/pnas.1907378117. Belkin, M. Fi...

work page doi:10.1145/180139.181158 1994
[3]

URL http://proceedings.mlr.press/v19/daniely11a/daniely11a

JMLR.org, 2011. URL http://proceedings.mlr.press/v19/daniely11a/daniely11a. pdf. Daskalakis, C. and Golowich, N. Fast rates for nonparametric online learning: from realizability to learning in games. In Leonardi, S. and Gupta, A. (eds.),STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pp. 846–859. ACM, 20...

work page doi:10.1145/3519935.3519950 2011
[4]

Verifying groups in linear time

URLhttps://jmlr.org/papers/v17/15-389.html. Hanneke, S., Larsen, K. G., and Zhivotovskiy, N. Revisiting agnostic PAC learning. In65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pp. 1968–1982. IEEE, 2024. doi: 10.1109/FOCS61266.2024.00118. URL https://doi.org/10.1109/FOCS61266.2024.00118. Has...

work page doi:10.1109/focs61266.2024.00118 2024
[5]

The sub-sequences need not be disjoint

Construct sub-training sequencesS1,...,Sm from S, wherem may depend onS and each (x,y)∈Sj satisfies(x,y)∈S. The sub-sequences need not be disjoint
[6]

ApplyAto each sub-sequence to obtain hypothesesh 1,...,hm
[7]

17 WhenAis clear from context, we writeA′(S) =A′(S,A)

Return the predictorx↦→r(h1(x),...,hm(x),x ), wherer: [0, 1]∗×X→[0, 1]is an aggregation rule. 17 WhenAis clear from context, we writeA′(S) =A′(S,A). Definition 3.4(Proper Aggregation Rule).An aggregation ruler: [0, 1]∗×X→[0, 1]isproperif for every sequencez1,...,zm∈[0,1]and everyx∈X, we haver(z1,...,zm,x)∈{z1,...,zm}. Definition 3.6(Finite Aggregation Alg...

2024

[1] [1]

URLhttps://doi.org/10.1016/j.jcss.2023.103465

doi: 10.1016/J.JCSS.2023.103465. URLhttps://doi.org/10.1016/j.jcss.2023.103465. Bartlett, P. L., Long, P. M., and Williamson, R. C. Fat-shattering and the learnability of real-valued functions. In Warmuth, M. K. (ed.),Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, COLT 1994, New Brunswick, NJ, USA, July 12-15, 1994, pp.299–

work page doi:10.1016/j.jcss.2023.103465 2023

[2] [2]

doi: 10.1145/180139.181158

ACM, 1994. doi: 10.1145/180139.181158. URLhttps://doi.org/10.1145/180139.181158. Bartlett, P. L., Long, P. M., Lugosi, G., and Tsigler, A. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020. doi: 10.1073/pnas. 1907378117. URLhttps://www.pnas.org/doi/abs/10.1073/pnas.1907378117. Belkin, M. Fi...

work page doi:10.1145/180139.181158 1994

[3] [3]

URL http://proceedings.mlr.press/v19/daniely11a/daniely11a

JMLR.org, 2011. URL http://proceedings.mlr.press/v19/daniely11a/daniely11a. pdf. Daskalakis, C. and Golowich, N. Fast rates for nonparametric online learning: from realizability to learning in games. In Leonardi, S. and Gupta, A. (eds.),STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pp. 846–859. ACM, 20...

work page doi:10.1145/3519935.3519950 2011

[4] [4]

Verifying groups in linear time

URLhttps://jmlr.org/papers/v17/15-389.html. Hanneke, S., Larsen, K. G., and Zhivotovskiy, N. Revisiting agnostic PAC learning. In65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pp. 1968–1982. IEEE, 2024. doi: 10.1109/FOCS61266.2024.00118. URL https://doi.org/10.1109/FOCS61266.2024.00118. Has...

work page doi:10.1109/focs61266.2024.00118 2024

[5] [5]

The sub-sequences need not be disjoint

Construct sub-training sequencesS1,...,Sm from S, wherem may depend onS and each (x,y)∈Sj satisfies(x,y)∈S. The sub-sequences need not be disjoint

[6] [6]

ApplyAto each sub-sequence to obtain hypothesesh 1,...,hm

[7] [7]

17 WhenAis clear from context, we writeA′(S) =A′(S,A)

Return the predictorx↦→r(h1(x),...,hm(x),x ), wherer: [0, 1]∗×X→[0, 1]is an aggregation rule. 17 WhenAis clear from context, we writeA′(S) =A′(S,A). Definition 3.4(Proper Aggregation Rule).An aggregation ruler: [0, 1]∗×X→[0, 1]isproperif for every sequencez1,...,zm∈[0,1]and everyx∈X, we haver(z1,...,zm,x)∈{z1,...,zm}. Definition 3.6(Finite Aggregation Alg...

2024