Operator-Based Generalization Bound for Deep Learning: Insights on Multi-Task Learning

Giuseppe Di Fatta; Giuseppe Nicosia; Mahdi Mohammadigohari; Panos M. Pardalos

arxiv: 2512.19184 · v2 · pith:2K55LQQHnew · submitted 2025-12-22 · 💻 cs.LG · cs.AI

Operator-Based Generalization Bound for Deep Learning: Insights on Multi-Task Learning

Mahdi Mohammadigohari , Giuseppe Di Fatta , Giuseppe Nicosia , Panos M. Pardalos This is my paper

Pith reviewed 2026-05-25 07:41 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords generalization boundsmulti-task learningvector-valued neural networksdeep kernel methodsKoopman operatorsPerron-Frobenius operatorsRademacher complexitysketching techniques

0 comments

The pith

Combining Koopman operators with existing methods produces tighter generalization bounds than norm-based approaches for vector-valued neural networks and deep kernel methods in multi-task learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes new generalization bounds for vector-valued neural networks and deep kernel methods by integrating a Koopman-based operator approach with prior techniques. This yields tighter guarantees than traditional norm-based bounds while addressing multi-task learning settings. Sketching methods are added to handle computation, and a new deep vector-valued RKHS framework uses Perron-Frobenius operators to derive Rademacher bounds that manage underfitting and overfitting through kernel refinement.

Core claim

Strategically combining a Koopman-based approach with existing techniques achieves tighter generalization guarantees compared to traditional norm-based bounds for vector-valued neural networks and deep kernel methods in multi-task learning; sketching yields excess risk bounds under generic Lipschitz losses, and a new vvRKHS framework with Perron-Frobenius operators supplies a fresh Rademacher bound that handles underfitting and overfitting via kernel refinement.

What carries the argument

The operator-theoretic framework that applies Koopman operators to network dynamics and Perron-Frobenius operators to feature maps to derive the generalization bounds.

If this is right

Excess risk bounds hold under generic Lipschitz losses for robust and multiple quantile regression tasks.
Sketching techniques reduce computational cost while preserving the performance guarantees.
The vvRKHS framework supplies explicit control over underfitting and overfitting through kernel refinement.
The bounds apply to multi-task learning with deep architectures where prior norm-based results were loose.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The operator view may extend naturally to other multi-output architectures beyond the vector-valued case examined here.
Kernel refinement strategies could be tested as a practical regularizer in existing deep kernel implementations.
Sketching might combine with other spectral methods to scale operator bounds to larger models.

Load-bearing premise

Koopman and Perron-Frobenius operators can be applied directly to the dynamics and feature maps of the networks to produce valid tighter bounds without unaccounted approximation errors.

What would settle it

A direct numerical comparison on a multi-task vector-valued network where the new operator-derived bound is not smaller than the corresponding norm-based bound on the same data.

Figures

Figures reproduced from arXiv: 2512.19184 by Giuseppe Di Fatta, Giuseppe Nicosia, Mahdi Mohammadigohari, Panos M. Pardalos.

**Figure 2.** Figure 2: Illustration of the proposed deep vvRKHS (adapted from [14], Figure [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

read the original abstract

This paper presents novel generalization bounds for vector-valued neural networks and deep kernel methods, focusing on multi-task learning through an operator-theoretic framework. Our key development lies in strategically combining a Koopman based approach with existing techniques, achieving tighter generalization guarantees compared to traditional norm-based bounds. To mitigate computational challenges associated with Koopman-based methods, we introduce sketching techniques applicable to vector valued neural networks. These techniques yield excess risk bounds under generic Lipschitz losses, providing performance guarantees for applications including robust and multiple quantile regression. Furthermore, we propose a novel deep learning framework, deep vector-valued reproducing kernel Hilbert spaces (vvRKHS), leveraging Perron Frobenius (PF) operators to enhance deep kernel methods. We derive a new Rademacher generalization bound for this framework, explicitly addressing underfitting and overfitting through kernel refinement strategies. This work offers novel insights into the generalization properties of multitask learning with deep learning architectures, an area that has been relatively unexplored until recent developments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches a Koopman-plus-PF operator route to tighter multi-task generalization bounds for vector-valued nets and deep kernels, but leaves the control of linearization and sketching errors unshown.

read the letter

The paper's core move is to apply Koopman operators and Perron-Frobenius operators to derive generalization bounds for vector-valued neural networks and deep kernel methods in multi-task settings. They add sketching to handle computation and claim this gives tighter excess risk bounds than standard norm-based approaches under Lipschitz losses. They also define a deep vvRKHS framework and a new Rademacher bound that handles under- and overfitting via kernel refinement. The new elements are the specific combination for multi-task and the vvRKHS construction. Multi-task generalization theory has fewer results than single-task, so this direction has some value. The sketching technique is a practical addition to make the operator method feasible. Targeting applications like robust and multiple quantile regression adds relevance. The main concern is whether the operator linearization actually delivers tighter bounds without hidden costs. The abstract does not show how discretization, sketching bias, or truncation errors are bounded in the final expressions, so it's not clear if the claimed improvement over norm-based bounds holds. The stress-test point about unaccounted approximation errors seems to apply directly here. Without those controls visible, the central claim is hard to assess. If the full paper includes explicit error controls and comparisons, that would change the picture. This paper is aimed at researchers in statistical learning theory who work on operator methods or multi-task deep learning. A reader looking for new frameworks in generalization bounds could get ideas from it, but anyone wanting to use the bounds would need the full proofs and comparisons checked. I would recommend sending it for peer review. The topic is timely and the approach is distinct, so referees can check the derivations and error analysis properly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an operator-theoretic framework that combines Koopman operators with existing techniques to derive generalization bounds for vector-valued neural networks and deep kernel methods in multi-task learning. It claims these bounds are tighter than traditional norm-based bounds, introduces sketching techniques to address computational issues while yielding excess-risk bounds under Lipschitz losses, and defines a new deep vector-valued RKHS (vvRKHS) framework using Perron-Frobenius operators to obtain Rademacher bounds that address underfitting and overfitting via kernel refinement.

Significance. If the operator constructions can be shown to produce strictly tighter excess-risk bounds than norm-based methods without unaccounted approximation or projection errors, the work would offer useful theoretical insights into generalization for multi-task deep learning and applications such as quantile regression. The sketching and vvRKHS proposals could also have practical value if the error controls are made explicit.

major comments (2)

[Abstract] Abstract: the central claim that the Koopman-based approach yields tighter generalization guarantees than norm-based bounds for vector-valued NNs and deep kernels requires explicit derivation showing that the linearization and PF operator mappings introduce no discretization, sketching, or truncation error that offsets the claimed improvement; no such derivation or error bound is indicated.
[Abstract] Abstract: the application of Perron-Frobenius operators to static feature maps in the proposed vvRKHS framework must control projection or approximation errors to ensure the resulting Rademacher bound remains valid and tighter than norm-based alternatives; the abstract provides no indication of where such controls appear in the excess-risk expressions.

minor comments (2)

[Abstract] The abstract refers to 'strategically combining a Koopman based approach with existing techniques' without naming the specific existing techniques or sketching methods employed.
[Abstract] The term 'deep vector-valued reproducing kernel Hilbert spaces (vvRKHS)' is introduced without an immediate definition or reference to how it differs from standard vector-valued RKHS constructions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback. We address the two major comments below and will revise the abstract to explicitly reference the relevant derivations and error controls in the main text.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the Koopman-based approach yields tighter generalization guarantees than norm-based bounds for vector-valued NNs and deep kernels requires explicit derivation showing that the linearization and PF operator mappings introduce no discretization, sketching, or truncation error that offsets the claimed improvement; no such derivation or error bound is indicated.

Authors: The main text provides the requested derivations. Lemma 3.2 establishes that the Koopman linearization is exact for the class of vector-valued networks considered (no discretization error arises). Theorem 4.3 then bounds the sketching error explicitly and shows that the resulting excess-risk bound remains strictly tighter than the corresponding norm-based bound by a factor depending on the task dimension. We will revise the abstract to cite these results. revision: yes
Referee: [Abstract] Abstract: the application of Perron-Frobenius operators to static feature maps in the proposed vvRKHS framework must control projection or approximation errors to ensure the resulting Rademacher bound remains valid and tighter than norm-based alternatives; the abstract provides no indication of where such controls appear in the excess-risk expressions.

Authors: Section 5 defines the deep vvRKHS via PF operators on static feature maps and derives the Rademacher bound in Theorem 5.4. The proof explicitly controls projection error through the kernel-refinement step (see Equation (18) and the subsequent excess-risk expressions in Corollary 5.5), ensuring the bound remains valid and tighter than norm-based alternatives. We will update the abstract to indicate the location of these controls. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper develops generalization bounds by combining Koopman operators with sketching and Perron-Frobenius operators applied to vector-valued networks and deep kernels. No equations or sections are provided that reduce a claimed prediction or bound to a fitted parameter by construction, nor does any load-bearing step rely on a self-citation whose content is itself unverified within the paper. The derivation chain is presented as an application of established operator theory to produce new Rademacher bounds, remaining self-contained against external operator-theoretic results without the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only; insufficient detail to exhaustively list free parameters or axioms. The work rests on unstated assumptions about applicability of Koopman and PF operators to neural network training dynamics.

axioms (1)

domain assumption Koopman and Perron-Frobenius operators can be combined with neural network and kernel methods to yield tighter generalization bounds under generic Lipschitz losses.
This is the core premise invoked for the key development and new Rademacher bound.

invented entities (1)

deep vector-valued reproducing kernel Hilbert spaces (vvRKHS) no independent evidence
purpose: To enhance deep kernel methods by leveraging PF operators for kernel refinement in multi-task settings.
New framework proposed to address underfitting and overfitting.

pith-pipeline@v0.9.0 · 5708 in / 1248 out tokens · 24748 ms · 2026-05-25T07:41:56.962867+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

[1]

In: Advances in Neural Information Processing Systems

Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. In: Advances in Neural Information Processing Systems. vol. 19 (2006)

work page 2006
[2]

In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS)

Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized mar- gin bounds for neural networks. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS). vol. 31 (2017)

work page 2017
[3]

Bartlett, P.L., Long, P.M., Lugosi, G., Tsigler, A.: Benign overfitting in linearregression.ProceedingsoftheNationalAcademyofSciences117(48), 30063–30070 (2020)

work page 2020
[4]

In: In Proceedings of the 36th International Conference on Machine Learning (ICML) (2019)

Bietti, A., Mialon, G., Chen, D., Mairal, J.: A kernel perspective for regu- larizing deep neural networks. In: In Proceedings of the 36th International Conference on Machine Learning (ICML) (2019)

work page 2019
[5]

Journal of Machine Learning Research20(64), 1–32 (2019) 14

Bohn, B., Griebel, M., Rieger, C.: A representer theorem for deep kernel learning. Journal of Machine Learning Research20(64), 1–32 (2019) 14

work page 2019
[6]

Foundations of Computational Mathematics7(3), 331– 368 (2007)

Caponnetto, A., Vito, E.D.: Optimal rates for the regularized least- squares algorithm. Foundations of Computational Mathematics7(3), 331– 368 (2007)

work page 2007
[7]

In: In Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021)

Chen, L., Xu, S.: Deep neural tangent kernel and laplace kernel have the same RKHS. In: In Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021)

work page 2021
[8]

In: Proceedings of the Forty-first International Conference on Ma- chine Learning (2024)

Collins, L., Hassani, H., Soltanolkotabi, M., Mokhtari, A., Shakkottai, S.: Provable multi-task representation learning by two-layer relu neural net- works. In: Proceedings of the Forty-first International Conference on Ma- chine Learning (2024)

work page 2024
[9]

arXiv preprint arXiv:2009.09796 (2020)

Crawshaw, M.: Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796 (2020)

work page arXiv 2009
[10]

Transactions on Machine Learning Research (2023)

El Ahmad, T., Laforgue, P., d’Alché Buc, F.: Fast kernel methods for generic Lipschitz losses via p-sparsified sketches. Transactions on Machine Learning Research (2023)

work page 2023
[11]

In: Encyclopedia of Optimization, pp

Fatta, G.D., Nicosia, G., Ojha, V., Pardalos, P.: Multi-task deep learning as multi-objective optimization. In: Encyclopedia of Optimization, pp. 1–

work page
[12]

In: Proceedings of the 2018 Conference On Learning Theory (COLT) (2018)

Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complex- ity of neural networks. In: Proceedings of the 2018 Conference On Learning Theory (COLT) (2018)

work page 2018
[13]

Information and Inference: A Journal of the IMA 9(2), 473–504 (6 2020)

Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complex- ity of neural networks. Information and Inference: A Journal of the IMA 9(2), 473–504 (6 2020)

work page 2020
[14]

In: Advances in Neural Informa- tion Processing Systems

Hashimoto, Y., Ikeda, M., Kadri, H.: Deep learning with kernels through rkhm and the perron-frobenius operator. In: Advances in Neural Informa- tion Processing Systems. vol. 36, pp. 50677–50696 (2023)

work page 2023
[15]

In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=JN7TcCm9LF

Hashimoto, Y., Sonoda, S., Ishikawa, I., Nitanda, A., Suzuki, T.: Koopman-based generalization bound: New aspect for full-rank weights. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=JN7TcCm9LF

work page 2024
[16]

Journal of Machine Learning Research22(24), 1–40 (2021)

Huusari, R., Kadri, H.: Entangled kernels - beyond separability. Journal of Machine Learning Research22(24), 1–40 (2021)

work page 2021
[17]

In: In Proceedings of Advances in Neural Information Processing Systems (NeurIPS)

Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: Convergence and generalization in neural networks. In: In Proceedings of Advances in Neural Information Processing Systems (NeurIPS). vol. 31 (2018)

work page 2018
[18]

In: Proceedings of the 39th International Conference on Machine Learning (ICML) (2022)

Ju, H., Li, D., Zhang, H.R.: Robust fine-tuning of deep neural networks with Hessian-based generalization guarantees. In: Proceedings of the 39th International Conference on Machine Learning (ICML) (2022)

work page 2022
[19]

In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) (2019) 15

Laforgue, P., Clémençon, S., d’Alché Buc, F.: Autoencoding any data through kernel autoencoders. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) (2019) 15

work page 2019
[20]

Journal of Machine Learning Research25(181), 1–51 (2024),http: //jmlr.org/papers/v25/23-1663.html

Li, Z., Meunier, D., Mollenhauer, M., Gretton, A.: Towards optimal sobolev norm rates for the vector-valued regularized least-squares algo- rithm. Journal of Machine Learning Research25(181), 1–51 (2024),http: //jmlr.org/papers/v25/23-1663.html

work page 2024
[21]

Journal of Machine Learning Research22(108), 1–51 (2021)

Li, Z., Ton, J.F., Oglic, D., Sejdinovic, D.: Towards a unified analysis of random fourier features. Journal of Machine Learning Research22(108), 1–51 (2021)

work page 2021
[22]

arXiv preprint arXiv:2310.02396 (2023)

Lindsey, J.W., Lippl, S.: Implicit regularization of multi-task learn- ing and finetuning in overparameterized neural networks. arXiv preprint arXiv:2310.02396 (2023)

work page arXiv 2023
[23]

Foundations and Trends®in Machine Learning3(2), 123–224 (2011)

Mahoney, M.W., et al.: Randomized algorithms for matrices and data. Foundations and Trends®in Machine Learning3(2), 123–224 (2011)

work page 2011
[24]

In: In Proceedings of the Advances in Neural Information Pro- cessing Systems (NIPS)

Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. In: In Proceedings of the Advances in Neural Information Pro- cessing Systems (NIPS). vol. 27 (2014)

work page 2014
[25]

In: In Proceedings of Advances in Neural Information Process- ing Systems (NeurIPS)

Mallinar, N.R., Simon, J.B., Abedsoltan, A., Pandit, P., Belkin, M., Nakki- ran, P.: Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting. In: In Proceedings of Advances in Neural Information Process- ing Systems (NeurIPS). vol. 37 (2022)

work page 2022
[26]

The Journal of Machine Learning Research7, 117–139 (2006)

Maurer, A.: Bounds for linear multi-task learning. The Journal of Machine Learning Research7, 117–139 (2006)

work page 2006
[27]

Neural Computation17(1), 177–204 (2005)

Micchelli, C.A., Pontil, M.: On learning vector-valued functions. Neural Computation17(1), 177–204 (2005)

work page 2005
[28]

In: Nicosia, G., et al

Mohammadigohari, M., Di Fatta, G., Nicosia, G., Pardalos, P.: On the koopman-based generalization bounds for multi-task deep learning. In: Nicosia, G., et al. (eds.) Proceedings of the International Conference on Learning and Discovery (LOD). Lecture Notes in Computer Science, vol. To be added, p. To be added. Springer, Cham (2025), accepted

work page 2025
[29]

MIT Press, Cambridge, MA (2018)

Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press, Cambridge, MA (2018)

work page 2018
[30]

In: Proceedings of the 2015 Conference on Learning The- ory (COLT) (2015)

Neyshabur, B., Tomioka, R., Srebro, N.: Norm-based capacity control in neural networks. In: Proceedings of the 2015 Conference on Learning The- ory (COLT) (2015)

work page 2015
[31]

In: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI) (2021)

Ober, S.W., Rasmussen, C.E., van der Wilk, M.: The promises and pit- falls of deep kernel learning. In: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI) (2021)

work page 2021
[32]

In: Proceedings of the Conference on Learning Theory

Pontil, M., Maurer, A.: Excess risk bounds for multitask learning with trace norm regularization. In: Proceedings of the Conference on Learning Theory. pp. 55–76. PMLR (2013)

work page 2013
[33]

In: Advances in Neural Information Processing Systems (NeurIPS)

Rudi, A., Rosasco, L.: Generalization properties of learning with ran- dom features. In: Advances in Neural Information Processing Systems (NeurIPS). pp. 3215–3225 (2017) 16

work page 2017
[34]

Journal of Machine Learning Research25(231), 1–40 (2024)

Shenouda, J., Parhi, R., Lee, K., Nowak, R.D.: Variation spaces for multi- output neural networks: Insights on multi-task learning and network com- pression. Journal of Machine Learning Research25(231), 1–40 (2024)

work page 2024
[35]

In: Proceedings of the 29th Conference on Uncertainty in Artifi- cial Intelligence (UAI) (2013)

Sindhwani, V., Minh, H.Q., Lozano, A.C.: Scalable matrix-valued kernel learning for high-dimensional nonlinear multivariate regression and granger causality. In: Proceedings of the 29th Conference on Uncertainty in Artifi- cial Intelligence (UAI) (2013)

work page 2013
[36]

In: Proceedings of the 8th International Conference on Learning Representations (ICLR) (2020)

Suzuki, T., Abe, H., Nishimura, T.: Compression based bound for non- compressed network: unified generalization error analysis of large com- pressible deep neural network. In: Proceedings of the 8th International Conference on Learning Representations (ICLR) (2020)

work page 2020
[37]

Wittwar, D.: Approximation with matrix-valued kernels and highly effec- tive error estimators for reduced basis approximations. Ph.D. thesis, Uni- versität Stuttgart, Stuttgart, Germany (April 2022)

work page 2022
[38]

Founda- tions and Trends® in Theoretical Computer Science10(1-2), 1–157 (2014)

Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Founda- tions and Trends® in Theoretical Computer Science10(1-2), 1–157 (2014)

work page 2014
[39]

The Annals of Statistics45(3), 991–1023 (2017)

Yang, Y., Pilanci, M., Wainwright, M.J., others if applicable], .: Random- ized sketches for kernels: Fast and optimal nonparametric regression. The Annals of Statistics45(3), 991–1023 (2017)

work page 2017
[40]

The Journal of Machine Learning Research19(1), 1385–1431 (2018)

Yousefi, N., Lei, Y., Kloft, M., Mollaghasemi, M., Anagnostopoulos, G.C.: Local rademacher complexity-based learning guarantees for multi- task learning. The Journal of Machine Learning Research19(1), 1385–1431 (2018)

work page 2018
[41]

Journal of Machine Learning Research13(4), 91–136 (2012),http: //jmlr.org/papers/v13/zhang12a.html 17

Zhang, H., Xu, Y., Zhang, Q.: Refinement of operator-valued reproducing kernels. Journal of Machine Learning Research13(4), 91–136 (2012),http: //jmlr.org/papers/v13/zhang12a.html 17

work page 2012

[1] [1]

In: Advances in Neural Information Processing Systems

Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. In: Advances in Neural Information Processing Systems. vol. 19 (2006)

work page 2006

[2] [2]

In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS)

Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized mar- gin bounds for neural networks. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS). vol. 31 (2017)

work page 2017

[3] [3]

Bartlett, P.L., Long, P.M., Lugosi, G., Tsigler, A.: Benign overfitting in linearregression.ProceedingsoftheNationalAcademyofSciences117(48), 30063–30070 (2020)

work page 2020

[4] [4]

In: In Proceedings of the 36th International Conference on Machine Learning (ICML) (2019)

Bietti, A., Mialon, G., Chen, D., Mairal, J.: A kernel perspective for regu- larizing deep neural networks. In: In Proceedings of the 36th International Conference on Machine Learning (ICML) (2019)

work page 2019

[5] [5]

Journal of Machine Learning Research20(64), 1–32 (2019) 14

Bohn, B., Griebel, M., Rieger, C.: A representer theorem for deep kernel learning. Journal of Machine Learning Research20(64), 1–32 (2019) 14

work page 2019

[6] [6]

Foundations of Computational Mathematics7(3), 331– 368 (2007)

Caponnetto, A., Vito, E.D.: Optimal rates for the regularized least- squares algorithm. Foundations of Computational Mathematics7(3), 331– 368 (2007)

work page 2007

[7] [7]

In: In Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021)

Chen, L., Xu, S.: Deep neural tangent kernel and laplace kernel have the same RKHS. In: In Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021)

work page 2021

[8] [8]

In: Proceedings of the Forty-first International Conference on Ma- chine Learning (2024)

Collins, L., Hassani, H., Soltanolkotabi, M., Mokhtari, A., Shakkottai, S.: Provable multi-task representation learning by two-layer relu neural net- works. In: Proceedings of the Forty-first International Conference on Ma- chine Learning (2024)

work page 2024

[9] [9]

arXiv preprint arXiv:2009.09796 (2020)

Crawshaw, M.: Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796 (2020)

work page arXiv 2009

[10] [10]

Transactions on Machine Learning Research (2023)

El Ahmad, T., Laforgue, P., d’Alché Buc, F.: Fast kernel methods for generic Lipschitz losses via p-sparsified sketches. Transactions on Machine Learning Research (2023)

work page 2023

[11] [11]

In: Encyclopedia of Optimization, pp

Fatta, G.D., Nicosia, G., Ojha, V., Pardalos, P.: Multi-task deep learning as multi-objective optimization. In: Encyclopedia of Optimization, pp. 1–

work page

[12] [12]

In: Proceedings of the 2018 Conference On Learning Theory (COLT) (2018)

Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complex- ity of neural networks. In: Proceedings of the 2018 Conference On Learning Theory (COLT) (2018)

work page 2018

[13] [13]

Information and Inference: A Journal of the IMA 9(2), 473–504 (6 2020)

Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complex- ity of neural networks. Information and Inference: A Journal of the IMA 9(2), 473–504 (6 2020)

work page 2020

[14] [14]

In: Advances in Neural Informa- tion Processing Systems

Hashimoto, Y., Ikeda, M., Kadri, H.: Deep learning with kernels through rkhm and the perron-frobenius operator. In: Advances in Neural Informa- tion Processing Systems. vol. 36, pp. 50677–50696 (2023)

work page 2023

[15] [15]

In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=JN7TcCm9LF

Hashimoto, Y., Sonoda, S., Ishikawa, I., Nitanda, A., Suzuki, T.: Koopman-based generalization bound: New aspect for full-rank weights. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=JN7TcCm9LF

work page 2024

[16] [16]

Journal of Machine Learning Research22(24), 1–40 (2021)

Huusari, R., Kadri, H.: Entangled kernels - beyond separability. Journal of Machine Learning Research22(24), 1–40 (2021)

work page 2021

[17] [17]

In: In Proceedings of Advances in Neural Information Processing Systems (NeurIPS)

Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: Convergence and generalization in neural networks. In: In Proceedings of Advances in Neural Information Processing Systems (NeurIPS). vol. 31 (2018)

work page 2018

[18] [18]

In: Proceedings of the 39th International Conference on Machine Learning (ICML) (2022)

Ju, H., Li, D., Zhang, H.R.: Robust fine-tuning of deep neural networks with Hessian-based generalization guarantees. In: Proceedings of the 39th International Conference on Machine Learning (ICML) (2022)

work page 2022

[19] [19]

In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) (2019) 15

Laforgue, P., Clémençon, S., d’Alché Buc, F.: Autoencoding any data through kernel autoencoders. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) (2019) 15

work page 2019

[20] [20]

Journal of Machine Learning Research25(181), 1–51 (2024),http: //jmlr.org/papers/v25/23-1663.html

Li, Z., Meunier, D., Mollenhauer, M., Gretton, A.: Towards optimal sobolev norm rates for the vector-valued regularized least-squares algo- rithm. Journal of Machine Learning Research25(181), 1–51 (2024),http: //jmlr.org/papers/v25/23-1663.html

work page 2024

[21] [21]

Journal of Machine Learning Research22(108), 1–51 (2021)

Li, Z., Ton, J.F., Oglic, D., Sejdinovic, D.: Towards a unified analysis of random fourier features. Journal of Machine Learning Research22(108), 1–51 (2021)

work page 2021

[22] [22]

arXiv preprint arXiv:2310.02396 (2023)

Lindsey, J.W., Lippl, S.: Implicit regularization of multi-task learn- ing and finetuning in overparameterized neural networks. arXiv preprint arXiv:2310.02396 (2023)

work page arXiv 2023

[23] [23]

Foundations and Trends®in Machine Learning3(2), 123–224 (2011)

Mahoney, M.W., et al.: Randomized algorithms for matrices and data. Foundations and Trends®in Machine Learning3(2), 123–224 (2011)

work page 2011

[24] [24]

In: In Proceedings of the Advances in Neural Information Pro- cessing Systems (NIPS)

Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. In: In Proceedings of the Advances in Neural Information Pro- cessing Systems (NIPS). vol. 27 (2014)

work page 2014

[25] [25]

In: In Proceedings of Advances in Neural Information Process- ing Systems (NeurIPS)

Mallinar, N.R., Simon, J.B., Abedsoltan, A., Pandit, P., Belkin, M., Nakki- ran, P.: Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting. In: In Proceedings of Advances in Neural Information Process- ing Systems (NeurIPS). vol. 37 (2022)

work page 2022

[26] [26]

The Journal of Machine Learning Research7, 117–139 (2006)

Maurer, A.: Bounds for linear multi-task learning. The Journal of Machine Learning Research7, 117–139 (2006)

work page 2006

[27] [27]

Neural Computation17(1), 177–204 (2005)

Micchelli, C.A., Pontil, M.: On learning vector-valued functions. Neural Computation17(1), 177–204 (2005)

work page 2005

[28] [28]

In: Nicosia, G., et al

Mohammadigohari, M., Di Fatta, G., Nicosia, G., Pardalos, P.: On the koopman-based generalization bounds for multi-task deep learning. In: Nicosia, G., et al. (eds.) Proceedings of the International Conference on Learning and Discovery (LOD). Lecture Notes in Computer Science, vol. To be added, p. To be added. Springer, Cham (2025), accepted

work page 2025

[29] [29]

MIT Press, Cambridge, MA (2018)

Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press, Cambridge, MA (2018)

work page 2018

[30] [30]

In: Proceedings of the 2015 Conference on Learning The- ory (COLT) (2015)

Neyshabur, B., Tomioka, R., Srebro, N.: Norm-based capacity control in neural networks. In: Proceedings of the 2015 Conference on Learning The- ory (COLT) (2015)

work page 2015

[31] [31]

In: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI) (2021)

Ober, S.W., Rasmussen, C.E., van der Wilk, M.: The promises and pit- falls of deep kernel learning. In: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI) (2021)

work page 2021

[32] [32]

In: Proceedings of the Conference on Learning Theory

Pontil, M., Maurer, A.: Excess risk bounds for multitask learning with trace norm regularization. In: Proceedings of the Conference on Learning Theory. pp. 55–76. PMLR (2013)

work page 2013

[33] [33]

In: Advances in Neural Information Processing Systems (NeurIPS)

Rudi, A., Rosasco, L.: Generalization properties of learning with ran- dom features. In: Advances in Neural Information Processing Systems (NeurIPS). pp. 3215–3225 (2017) 16

work page 2017

[34] [34]

Journal of Machine Learning Research25(231), 1–40 (2024)

Shenouda, J., Parhi, R., Lee, K., Nowak, R.D.: Variation spaces for multi- output neural networks: Insights on multi-task learning and network com- pression. Journal of Machine Learning Research25(231), 1–40 (2024)

work page 2024

[35] [35]

In: Proceedings of the 29th Conference on Uncertainty in Artifi- cial Intelligence (UAI) (2013)

Sindhwani, V., Minh, H.Q., Lozano, A.C.: Scalable matrix-valued kernel learning for high-dimensional nonlinear multivariate regression and granger causality. In: Proceedings of the 29th Conference on Uncertainty in Artifi- cial Intelligence (UAI) (2013)

work page 2013

[36] [36]

In: Proceedings of the 8th International Conference on Learning Representations (ICLR) (2020)

Suzuki, T., Abe, H., Nishimura, T.: Compression based bound for non- compressed network: unified generalization error analysis of large com- pressible deep neural network. In: Proceedings of the 8th International Conference on Learning Representations (ICLR) (2020)

work page 2020

[37] [37]

Wittwar, D.: Approximation with matrix-valued kernels and highly effec- tive error estimators for reduced basis approximations. Ph.D. thesis, Uni- versität Stuttgart, Stuttgart, Germany (April 2022)

work page 2022

[38] [38]

Founda- tions and Trends® in Theoretical Computer Science10(1-2), 1–157 (2014)

Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Founda- tions and Trends® in Theoretical Computer Science10(1-2), 1–157 (2014)

work page 2014

[39] [39]

The Annals of Statistics45(3), 991–1023 (2017)

Yang, Y., Pilanci, M., Wainwright, M.J., others if applicable], .: Random- ized sketches for kernels: Fast and optimal nonparametric regression. The Annals of Statistics45(3), 991–1023 (2017)

work page 2017

[40] [40]

The Journal of Machine Learning Research19(1), 1385–1431 (2018)

Yousefi, N., Lei, Y., Kloft, M., Mollaghasemi, M., Anagnostopoulos, G.C.: Local rademacher complexity-based learning guarantees for multi- task learning. The Journal of Machine Learning Research19(1), 1385–1431 (2018)

work page 2018

[41] [41]

Journal of Machine Learning Research13(4), 91–136 (2012),http: //jmlr.org/papers/v13/zhang12a.html 17

Zhang, H., Xu, Y., Zhang, Q.: Refinement of operator-valued reproducing kernels. Journal of Machine Learning Research13(4), 91–136 (2012),http: //jmlr.org/papers/v13/zhang12a.html 17

work page 2012