On the Koopman-Based Generalization Bounds for Multi-Task Deep Learning

Giuseppe Di Fatta; Giuseppe Nicosia; Mahdi Mohammadigohari; Panos M. Pardalos

arxiv: 2512.19199 · v2 · pith:BZGXDTO7new · submitted 2025-12-22 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-25 07:39 UTC · model grok-4.3


keywords: generalization bounds · multi-task deep learning · Koopman operator · operator theory · Sobolev space · condition numbers · kernel methods


The pith

Koopman operator methods produce tighter generalization bounds for multi-task deep neural networks by using small condition numbers of weight matrices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies operator-theoretic techniques based on the Koopman operator to derive generalization bounds in multi-task deep learning. It expands the hypothesis space with a tailored Sobolev space and exploits the small condition numbers of the weight matrices to obtain a bound that is tighter than those from conventional norm-based methods. This bound continues to hold in single-output settings and does not depend on network width, linking deep learning more closely to kernel methods.

Core claim

By introducing a Koopman-based analysis within an expanded Sobolev hypothesis space and bounding the condition numbers of the weight matrices, the authors derive generalization bounds for multitask deep neural networks that outperform existing Koopman-based bounds while remaining valid even for single-output networks.

What carries the argument

The Koopman operator applied to the weight matrices within a Sobolev space hypothesis class, which enables tighter operator-theoretic bounds independent of network width.
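To make the operator view concrete, here is a minimal sketch, assuming the standard definition of a Koopman (composition) operator, K_phi h = h ∘ phi. It checks numerically that a small two-layer network equals a chain of such operators applied to a final map g. The layer sizes, the tanh activation, the final map `g`, and the helper name `koopman` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def koopman(phi):
    """Return the composition operator K_phi, which maps h to h o phi."""
    def K(h):
        return lambda x: h(phi(x))
    return K

# Hypothetical two-layer network; sizes and values are illustrative only.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)
sigma = np.tanh                 # assumed smooth activation
g = lambda x: float(x.sum())    # assumed final map

def forward(x):
    """Ordinary forward pass: g(W2 sigma(W1 x + b1) + b2)."""
    return g(W2 @ sigma(W1 @ x + b1) + b2)

# Operator form: f = K_{W1} K_{b1} K_{sigma} K_{W2} K_{b2} g.
# The innermost operator acts on g first, so the first layer ends up outermost.
K_W1 = koopman(lambda x: W1 @ x)
K_b1 = koopman(lambda x: x + b1)
K_sigma = koopman(sigma)
K_W2 = koopman(lambda x: W2 @ x)
K_b2 = koopman(lambda x: x + b2)
f = K_W1(K_b1(K_sigma(K_W2(K_b2(g)))))

x = np.array([0.3, -1.2, 0.5])
print(np.isclose(f(x), forward(x)))
```

Writing the network this way turns questions about the function f into questions about operator norms of each K_W, which is where properties of the individual weight matrices enter.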

If this is right

The resulting bounds apply to both multi-task and single-task settings.
The framework remains flexible and does not require assumptions tied to network width.
The approach provides a more precise theoretical link between deep networks and kernel methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could extend to analyzing other neural architectures by lifting them into operator spaces.
Controlling condition numbers during training might serve as a practical way to achieve the tighter bounds.
Similar techniques may connect to stability analysis in dynamical systems for neural networks.

Load-bearing premise

The weight matrices have sufficiently small condition numbers that allow tightening the generalization bound beyond standard norm-based methods.
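As a rough illustration of the quantity this premise is about, the sketch below computes the 2-norm condition number of a weight matrix (largest singular value divided by smallest) and contrasts it with the spectral norm that norm-based bounds typically use. The matrices and the helper name `condition_number` are hypothetical; the paper's exact bound is not reproduced here.

```python
import numpy as np

def condition_number(W):
    """2-norm condition number: largest over smallest singular value."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix

# Scaled orthogonal matrix: large spectral norm, condition number 1.
well_conditioned = 5.0 * Q
# Same orthogonal factor with spread-out singular values: condition number 1000.
ill_conditioned = Q @ np.diag([10.0, 1.0, 0.1, 0.01])

for name, W in [("well", well_conditioned), ("ill", ill_conditioned)]:
    spectral_norm = np.linalg.norm(W, 2)
    print(name, "spectral norm:", spectral_norm, "cond:", condition_number(W))
```

The condition number is unchanged when a matrix is rescaled, whereas the spectral norm is not, so the two measures can rank the same layers very differently. A singular matrix has a zero smallest singular value, so this helper would return infinity (with a NumPy warning) in that case.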

What would settle it

A counterexample or empirical case where networks with large condition numbers in their weight matrices show that the proposed bound is not tighter than previous Koopman or norm-based bounds.

Figures

Figures reproduced from arXiv: 2512.19199 by Giuseppe Di Fatta, Giuseppe Nicosia, Mahdi Mohammadigohari, Panos M. Pardalos.

Original abstract

The paper establishes generalization bounds for multitask deep neural networks using operator-theoretic techniques. The authors propose a tighter bound than those derived from conventional norm based methods by leveraging small condition numbers in the weight matrices and introducing a tailored Sobolev space as an expanded hypothesis space. This enhanced bound remains valid even in single output settings, outperforming existing Koopman based bounds. The resulting framework maintains key advantages such as flexibility and independence from network width, offering a more precise theoretical understanding of multitask deep learning in the context of kernel methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims a tighter Koopman-operator generalization bound for multi-task nets by assuming small condition numbers on the weights plus a tailored Sobolev space, but that assumption is presented without derivation or control.


The central claim is a generalization bound for multi-task deep networks that improves on prior Koopman work and stays valid even for single-output cases. It does this by moving to a custom Sobolev space and invoking small condition numbers on the weight matrices to get a tighter result while remaining independent of network width. That combination is the main novelty on offer from the abstract. The operator-theoretic framing itself is not brand new, but extending it this way to multi-task settings with the claimed flexibility is a concrete step if the details hold. The independence from width is a practical plus if it survives the proof.

The main soft spot is the condition-number premise. The abstract treats small condition numbers as a lever to tighten the bound over norm-based methods, yet gives no indication that these numbers remain small under multi-task training dynamics or supplies an explicit bound on them. Without that, the improvement is conditional rather than unconditional. There is also no visible empirical check on bound tightness or comparison against standard baselines, which leaves the practical value unclear. The tailored Sobolev space is mentioned but not unpacked enough to judge whether it introduces hidden restrictions.

This work is aimed at people already working on operator methods and generalization bounds in deep learning. A reader who follows the Koopman literature might find the single-output extension and width independence worth checking, but the paper needs the full derivation to be useful. It deserves a serious referee because the claim is specific enough to evaluate and the area is active, even though the condition-number step will need close scrutiny.

Referee Report

2 major / 0 minor

Summary. The paper claims to establish generalization bounds for multi-task deep neural networks via Koopman operator techniques. It asserts a tighter bound than norm-based methods by leveraging small condition numbers of weight matrices together with a tailored Sobolev space as an expanded hypothesis space. The resulting bound is stated to remain valid even in single-output settings, to outperform prior Koopman-based bounds, and to preserve flexibility and width-independence.

Significance. If the derivations hold and the condition-number premise can be justified, the work would supply a more precise operator-theoretic account of generalization in multi-task learning that connects to kernel methods without width dependence.

major comments (2)

[Abstract / main bound derivation] The central tightening of the bound is predicated on weight matrices having small condition numbers (abstract). No derivation, explicit upper bound, or verification under multi-task training dynamics is supplied to support this premise, rendering the improvement conditional on an unverified assumption rather than a derived property.
[Hypothesis space definition] The tailored Sobolev space is introduced as the expanded hypothesis space enabling the operator-theoretic improvement, yet its construction, norm equivalence, and integration with the Koopman operator are not detailed enough to confirm it avoids circularity with the condition-number hypothesis or delivers the claimed outperformance in single-output regimes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. The feedback identifies important areas requiring additional justification and elaboration. We address each major comment below and will revise the manuscript to incorporate the requested details.


Referee: [Abstract / main bound derivation] The central tightening of the bound is predicated on weight matrices having small condition numbers (abstract). No derivation, explicit upper bound, or verification under multi-task training dynamics is supplied to support this premise, rendering the improvement conditional on an unverified assumption rather than a derived property.

Authors: We agree that the small condition number of the weight matrices is presented as an assumption enabling the tighter bound rather than a property derived from the multi-task training dynamics. The manuscript derives the generalization bound conditional on this assumption but does not supply an explicit upper bound or verification. In the revision we will add a dedicated subsection providing either a theoretical argument bounding the condition number under gradient-based multi-task training or empirical measurements on representative architectures and datasets to support the premise. revision: yes
Referee: [Hypothesis space definition] The tailored Sobolev space is introduced as the expanded hypothesis space enabling the operator-theoretic improvement, yet its construction, norm equivalence, and integration with the Koopman operator are not detailed enough to confirm it avoids circularity with the condition-number hypothesis or delivers the claimed outperformance in single-output regimes.

Authors: We acknowledge that the construction of the tailored Sobolev space, its norm equivalence relations, and its precise integration with the Koopman operator require further elaboration to rule out circularity and to substantiate the single-output performance claim. In the revised manuscript we will expand the relevant section with an explicit definition of the space, a proof of norm equivalence, a step-by-step description of the operator integration, and a short argument or corollary showing the bound remains strictly tighter than prior Koopman-based results even when restricted to single-output tasks. revision: yes

Circularity Check

0 steps flagged

No circularity: bound tightening rests on explicit assumption, not self-referential construction


The abstract presents the tighter Koopman-based bound as obtained by leveraging the assumption of small condition numbers in weight matrices together with a tailored Sobolev space. This is an external premise invoked to improve upon norm-based and prior Koopman bounds, not a quantity defined in terms of the bound itself or fitted to the target result. No equations, self-citations, or derivations are supplied that reduce the claimed outperformance (even in single-output settings) to a renaming, a fitted input, or a self-citation chain. The stated advantages of flexibility and width-independence further indicate that the derivation chain does not collapse to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only abstract available; the work relies on the domain assumption that Koopman operators apply usefully to neural weight matrices and introduces a tailored Sobolev space whose independent justification is not shown.

axioms (1)

domain assumption: Koopman operator techniques can be applied to weight matrices of deep networks to obtain generalization bounds
Stated in abstract as the core operator-theoretic technique.

invented entities (1)

tailored Sobolev space (no independent evidence)
purpose: Expanded hypothesis space that enables the tighter bound
Introduced in the abstract as the key modeling choice.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "leveraging small condition numbers in the weight matrices and introducing a tailored Sobolev space... Theorems 2 and 3"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Koopman operator K_f ... f = K_{W1} K_{b1} ... K_{WL} K_{bL} g"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

