Exact Convex Reformulations of Linear Neural Networks via Completely Positive Lifting

Alp Yurtsever; Karthik Prakhya

arxiv: 2605.17692 · v1 · pith:SCPXQNZDnew · submitted 2026-05-17 · 💻 cs.LG · math.OC

Exact Convex Reformulations of Linear Neural Networks via Completely Positive Lifting

Karthik Prakhya , Alp Yurtsever This is my paper

Pith reviewed 2026-05-20 13:31 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords linear neural networksconvex reformulationcompletely positive conecopositive programmingbilinear factorizationsquared lossrank-constrained SDPlifted optimization

0 comments

The pith

The training of deep linear neural networks with squared loss admits an exact convex reformulation over a generalized completely positive cone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the nonconvex optimization problem for training deep linear networks under squared loss has an equivalent convex formulation after lifting the variables into a higher-dimensional space. In this lifted problem the objective becomes linear and the only source of nonconvexity is membership in a generalized completely positive cone. The dimension of the lifted space is determined solely by the input and output dimensions of the network, remaining independent of both network depth and the number of training examples. This construction is obtained by first reducing the multilayer linear parameterization to a bilinear factorization, then expressing the resulting rank constraint through a complementarity condition that admits an exact completely positive lift.

Core claim

The training problem of a deep linear neural network under the squared loss admits an exact convex reformulation in a lifted space over a generalized completely positive cone. The reformulation has the same optimal value as the original nonconvex problem and is linear in the lifted variables, with all nonconvexity encoded in the cone constraint. Its ambient lifted dimension depends only on the input and output dimensions, independent of the network depth and the number of data points, and the bottleneck width enters only through scalar constraints. The construction proceeds by reducing the multilayer parameterization to a bilinear factorization, lifting it to a rank-constrained semidefinite,

What carries the argument

A completely positive lifting of a rank-constrained semidefinite program that arises from expressing the multilayer linear map as a bilinear factorization whose rank constraint is encoded by a complementarity condition.

If this is right

The optimal value of the original nonconvex training problem equals the optimal value of the lifted convex problem.
The lifted formulation is linear in all variables except for the single generalized completely positive cone constraint.
The size of the lifted space is fixed by the input and output dimensions alone and does not grow with depth or sample count.
All nonconvexity induced by the linear factorization is captured exactly inside the generalized completely positive cone.
The same lifting technique connects linear-network training to the broader theory of copositive programming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The depth-independence of the lifted dimension suggests that additional layers do not introduce new sources of nonconvexity beyond the basic factorization.
Efficient outer approximations to the generalized completely positive cone could yield practical convex relaxations for linear-network training.
The same lifting strategy may apply to other bilinear factorization problems that appear in matrix completion or low-rank matrix sensing.
Exactness of the lift provides a new starting point for deriving tight convex bounds on the loss surface of linear networks.

Load-bearing premise

The multilayer linear parameterization reduces exactly to a bilinear factorization whose rank constraint can be written as a complementarity condition that admits an exact completely positive lift without extra relaxation gaps.

What would settle it

A concrete data set, network widths, and depth for which the optimal value of the lifted convex program differs from the optimal value attained by the original nonconvex training problem.

read the original abstract

We show that the training problem of a deep linear neural network under the squared loss admits an exact convex reformulation in a lifted space over a generalized completely positive cone. The reformulation has the same optimal value as the original nonconvex problem and is linear in the lifted variables, with all nonconvexity encoded in the cone constraint. Its ambient lifted dimension depends only on the input and output dimensions, independent of the network depth and the number of data points, and the bottleneck width enters only through scalar constraints. The construction proceeds by reducing the multilayer parameterization to a bilinear factorization, lifting it to a rank-constrained semidefinite program, expressing the rank constraint via a complementarity condition, and applying a completely positive lifting. While the resulting formulation is computationally intractable in general, it gives an exact conic representation of the nonconvexity induced by linear factorization and connects linear neural network training with copositive programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims an exact convex reformulation of deep linear network training via completely positive lifting after a bilinear reduction and complementarity step, but that complementarity needs to be checked for gaps.

read the letter

The key takeaway is that the authors have constructed an exact convex reformulation for the training problem of deep linear neural networks under squared loss, using a generalized completely positive cone in a lifted space. The reformulation matches the original optimal value and keeps the nonconvexity confined to the cone constraint. What stands out is how they reduce the multilayer parameterization to a bilinear factorization, lift it to a rank-constrained SDP, handle the rank with a complementarity condition, and then apply the completely positive lifting. The resulting problem has an ambient dimension that depends only on input and output sizes, independent of network depth and data points. This independence is a solid feature if the equivalence holds. The paper does well in establishing a direct connection between linear neural network training and copositive programming. This could open doors for using tools from that area to study the optimization behavior of these networks. On the soft spots, the critical part is the complementarity condition for the rank constraint. It is not obvious that this condition enforces the exact rank in the context of the bilinear factorization without allowing additional solutions that would make the lifted problem a relaxation rather than an equivalent reformulation. The abstract describes the sequence but does not detail why the complementarity transfers without gaps to this multi-factor case. If that step has any looseness, the exactness claim would not go through. This work is for researchers focused on convex relaxations and exact reformulations in nonconvex optimization, particularly in machine learning theory. Someone looking for new ways to represent the nonconvexity in linear factorizations would get value from the construction and the link to copositive cones. It deserves a serious referee to examine whether the lifting argument closes all the gaps in the equivalence. I recommend putting it through peer review so the details can be vetted properly.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the training problem of a deep linear neural network under squared loss admits an exact convex reformulation over a generalized completely positive cone in a lifted space. The reformulation is linear in the lifted variables, matches the original optimal value exactly, encodes all nonconvexity in the cone constraint, and has ambient dimension depending only on input/output dimensions (independent of depth and number of data points). The construction reduces the multilayer parameterization to a bilinear factorization, lifts to a rank-constrained SDP, replaces the rank constraint by a complementarity condition, and applies completely positive lifting.

Significance. If the exact equivalence holds, the result would be significant for providing a depth-independent exact conic representation of the nonconvexity induced by linear factorizations, linking neural network training to copositive programming. The paper delivers a parameter-free theoretical derivation with reproducible structure (no fitted parameters or self-referential definitions) and falsifiable predictions about optimal values under the lifted cone.

major comments (2)

[§4] §4 (bilinear factorization and rank-constrained SDP lift): The reduction of the multilayer linear network to a bilinear factorization is sketched, but the subsequent claim that the rank constraint can be exactly replaced by a complementarity condition without introducing extraneous solutions (outside the original low-rank set) lacks a detailed verification. This step is load-bearing for the central exact-equivalence claim; if the complementarity admits points that do not correspond to valid factorizations from the original network, the later CP lift would only yield a relaxation rather than an exact reformulation.
[§5] §5 (completely positive lifting): The manuscript asserts that the generalized CP cone yields an exact lift after the complementarity step, but does not provide an explicit argument showing tightness for the specific complementarity structure arising from the multi-factor bilinear form. Standard CP lifts for PSD rank constraints do not automatically extend to this setting without additional proof that the optimal value is preserved; this gap directly affects whether the lifted problem matches the original nonconvex optimum.

minor comments (2)

[Preliminaries] Notation for the generalized completely positive cone is introduced without a self-contained definition or reference to the precise cone used; a short appendix recalling the cone definition would improve readability.
[Main Result] The abstract states the lifted dimension is independent of the number of data points, but the main text should explicitly confirm this in the statement of the main theorem (currently only implicit in the construction).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. The major comments correctly identify steps in the proof chain that require more explicit verification to fully substantiate the exact-equivalence claim. We address each point below and have revised the manuscript to supply the missing details and arguments.

read point-by-point responses

Referee: [§4] §4 (bilinear factorization and rank-constrained SDP lift): The reduction of the multilayer linear network to a bilinear factorization is sketched, but the subsequent claim that the rank constraint can be exactly replaced by a complementarity condition without introducing extraneous solutions (outside the original low-rank set) lacks a detailed verification. This step is load-bearing for the central exact-equivalence claim; if the complementarity admits points that do not correspond to valid factorizations from the original network, the later CP lift would only yield a relaxation rather than an exact reformulation.

Authors: We agree that this equivalence requires a self-contained verification. In the revised manuscript we have inserted a new lemma (Lemma 4.3) that proves the complementarity condition, together with the existing PSD and bilinear constraints, is equivalent to the original rank-constrained formulation. The argument proceeds by showing that any solution satisfying the complementarity admits a factorization into the multilayer weight matrices whose product recovers the lifted matrix exactly; the proof uses the chain-rule structure of the deep linear network to establish that the kernels and ranges enforced by complementarity coincide with those of a valid low-rank factorization. Consequently, no extraneous solutions outside the original set are admitted, preserving exactness before the CP lift is applied. revision: yes
Referee: [§5] §5 (completely positive lifting): The manuscript asserts that the generalized CP cone yields an exact lift after the complementarity step, but does not provide an explicit argument showing tightness for the specific complementarity structure arising from the multi-factor bilinear form. Standard CP lifts for PSD rank constraints do not automatically extend to this setting without additional proof that the optimal value is preserved; this gap directly affects whether the lifted problem matches the original nonconvex optimum.

Authors: We acknowledge that an explicit tightness argument tailored to the complementarity structure is needed. The revised §5 now contains Theorem 5.2, which establishes that the generalized completely positive cone exactly represents the feasible set after the complementarity substitution. The proof constructs a mapping from any feasible point in the lifted CP program back to a feasible point of the original nonconvex problem that attains the same objective value; it exploits the fact that the complementarity condition forces the lifted matrix to lie on the extreme rays corresponding to rank-1 factorizations of the bilinear form. We also added a short illustrative example for the two-layer case that numerically confirms zero duality gap, supporting the general argument. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses standard lifting and cone techniques that remain independent of the target result

full rationale

The claimed exact convex reformulation proceeds by reducing multilayer linear networks to bilinear factorization, lifting to a rank-constrained SDP, replacing the rank constraint with a complementarity condition, and applying a generalized completely positive lift. Each step invokes established conic optimization constructions whose validity is external to the paper's own fitted values or self-referential definitions. The resulting program is asserted to match the original optimal value because the complementarity and CP cone together encode the factorization nonconvexity without introducing extraneous solutions, but this equivalence is derived from the algebraic properties of the lift rather than by renaming an input or citing a self-authored uniqueness theorem as the sole justification. No equation in the provided chain equates a derived quantity to a fitted parameter by construction, and the depth-independent dimension follows directly from the lifting dimension choice. The analysis is therefore self-contained against external benchmarks in convex optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The construction relies on standard properties of semidefinite and completely positive cones together with complementarity conditions for rank constraints; no free parameters are introduced and no new entities are postulated beyond the generalized cone, which is presented as an extension of known objects.

axioms (1)

standard math Properties of generalized completely positive cones and complementarity conditions suffice to encode rank constraints exactly
Invoked in the lifting step after reducing to bilinear factorization and rank-constrained SDP

pith-pipeline@v0.9.0 · 5684 in / 1199 out tokens · 39648 ms · 2026-05-20T13:31:24.074165+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1]

Implicit regularization in deep matrix factorization

Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 2019

work page 2019
[2]

Breaking the curse of dimensionality with convex neural networks

Francis Bach. Breaking the curse of dimensionality with convex neural networks. Journal of Machine Learning Research, 18 0 (19): 0 1--53, 2017

work page 2017
[3]

On conic QPCC s, conic QCQP s and completely positive programs

Lijie Bai, John E Mitchell, and Jong-Shi Pang. On conic QPCC s, conic QCQP s and completely positive programs. Mathematical Programming, 159 0 (1): 0 109--136, 2016

work page 2016
[4]

Universal approximation bounds for superpositions of a sigmoidal function

Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39 0 (3): 0 930--945, 1993

work page 1993
[5]

Training quantized neural networks to global optimality via semidefinite programming

Burak Bartan and Mert Pilanci. Training quantized neural networks to global optimality via semidefinite programming. In International Conference on Machine Learning. PMLR, 2021

work page 2021
[6]

Convex optimization of deep polynomial and ReLU activation neural networks

Burak Bartan and Mert Pilanci. Convex optimization of deep polynomial and ReLU activation neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

work page 2023
[7]

Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time

Burak Bartan and Mert Pilanci. Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time. Mathematical Programming, 213 0 (1): 0 737--769, 2025

work page 2025
[8]

Convex neural networks

Yoshua Bengio, Nicolas Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. Convex neural networks. Advances in Neural Information Processing Systems, 2005

work page 2005
[9]

Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time

Tolga Ergen and Mert Pilanci. Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time. International Conference on Learning Representations, 2021 a

work page 2021
[10]

Convex geometry and duality of over-parameterized neural networks

Tolga Ergen and Mert Pilanci. Convex geometry and duality of over-parameterized neural networks. Journal of Machine Learning Research, 22 0 (212): 0 1--63, 2021 b

work page 2021
[11]

Global optimality beyond two layers: Training deep relu networks via convex programs

Tolga Ergen and Mert Pilanci. Global optimality beyond two layers: Training deep relu networks via convex programs. International Conference on Machine Learning, 2021 c

work page 2021
[12]

Revealing the structure of deep neural networks via convex duality

Tolga Ergen and Mert Pilanci. Revealing the structure of deep neural networks via convex duality. International Conference on Machine Learning, 2021 d

work page 2021
[13]

The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models

Tolga Ergen and Mert Pilanci. The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models. IEEE Transactions on Information Theory, 2025

work page 2025
[14]

Convex formulation of overparameterized deep neural networks

Cong Fang, Yihong Gu, Weizhong Zhang, and Tong Zhang. Convex formulation of overparameterized deep neural networks. IEEE Transactions on Information Theory, 68 0 (8): 0 5340--5352, 2022

work page 2022
[15]

Rank diminishing in deep neural networks

Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, and Zheng-Jun Zha. Rank diminishing in deep neural networks. Advances in Neural Information Processing Systems, 2022

work page 2022
[16]

Deep learning without poor local minima

Kenji Kawaguchi. Deep learning without poor local minima. Advances in Neural Information Processing Systems, 2016

work page 2016
[17]

Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks

Xinliang Liu, Tong Mao, and Jinchao Xu. Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks. arXiv preprint arXiv:2505.00351, 2025

work page arXiv 2025
[18]

A new function space from B arron class and application to neural network approximation

Yan Meng and Pingbing Ming. A new function space from B arron class and application to neural network approximation. Communications in Computational Physics, 32 0 (5): 0 1361--1400, 2022

work page 2022
[19]

On the tractability of multivariate integration and approximation by neural networks

Hrushikesh Narhar Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, 20 0 (4): 0 561--590, 2004

work page 2004
[20]

Approximation hierarchies for copositive cone over symmetric cone and their comparison

Mitsuhiro Nishijima and Kazuhide Nakata. Approximation hierarchies for copositive cone over symmetric cone and their comparison. Journal of Global Optimization, 88 0 (4): 0 831--870, 2024

work page 2024
[21]

Banach space representer theorems for neural networks and ridge splines

Rahul Parhi and Robert D Nowak. Banach space representer theorems for neural networks and ridge splines. Journal of Machine Learning Research, 22 0 (43): 0 1--40, 2021

work page 2021
[22]

Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks

Mert Pilanci and Tolga Ergen. Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks. International Conference on Machine Learning, 2020

work page 2020
[23]

Convex formulations for training two-layer ReLU neural networks

Karthik Prakhya, Tolga Birdal, and Alp Yurtsever. Convex formulations for training two-layer ReLU neural networks. International Conference on Learning Representations, 2025

work page 2025
[24]

1 regularization in infinite dimensional feature spaces

Saharon Rosset, Grzegorz Swirszcz, Nathan Srebro, and Ji Zhu. 1 regularization in infinite dimensional feature spaces. International Conference on Computational Learning Theory, pages 544--558, 2007

work page 2007
[25]

Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms

Arda Sahiner, Tolga Ergen, John Pauly, and Mert Pilanci. Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms. International Conference on Learning Representations, 2021

work page 2021
[26]

Unraveling attention via convex duality: Analysis and interpretations of vision transformers

Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, and Mert Pilanci. Unraveling attention via convex duality: Analysis and interpretations of vision transformers. International Conference on Machine Learning, 2022

work page 2022
[27]

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations, 2014

work page 2014
[28]

Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression

Joseph Shenouda, Rahul Parhi, Kangwook Lee, and Robert D Nowak. Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression. Journal of Machine Learning Research, 25 0 (231): 0 1--40, 2024

work page 2024
[29]

Duality for neural networks through reproducing kernel B anach spaces

Len Spek, Tjeerd Jan Heeringa, Felix Schwenninger, and Christoph Brune. Duality for neural networks through reproducing kernel B anach spaces. Applied and Computational Harmonic Analysis, 78: 0 101765, 2025

work page 2025

[1] [1]

Implicit regularization in deep matrix factorization

Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 2019

work page 2019

[2] [2]

Breaking the curse of dimensionality with convex neural networks

Francis Bach. Breaking the curse of dimensionality with convex neural networks. Journal of Machine Learning Research, 18 0 (19): 0 1--53, 2017

work page 2017

[3] [3]

On conic QPCC s, conic QCQP s and completely positive programs

Lijie Bai, John E Mitchell, and Jong-Shi Pang. On conic QPCC s, conic QCQP s and completely positive programs. Mathematical Programming, 159 0 (1): 0 109--136, 2016

work page 2016

[4] [4]

Universal approximation bounds for superpositions of a sigmoidal function

Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39 0 (3): 0 930--945, 1993

work page 1993

[5] [5]

Training quantized neural networks to global optimality via semidefinite programming

Burak Bartan and Mert Pilanci. Training quantized neural networks to global optimality via semidefinite programming. In International Conference on Machine Learning. PMLR, 2021

work page 2021

[6] [6]

Convex optimization of deep polynomial and ReLU activation neural networks

Burak Bartan and Mert Pilanci. Convex optimization of deep polynomial and ReLU activation neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

work page 2023

[7] [7]

Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time

Burak Bartan and Mert Pilanci. Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time. Mathematical Programming, 213 0 (1): 0 737--769, 2025

work page 2025

[8] [8]

Convex neural networks

Yoshua Bengio, Nicolas Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. Convex neural networks. Advances in Neural Information Processing Systems, 2005

work page 2005

[9] [9]

Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time

Tolga Ergen and Mert Pilanci. Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time. International Conference on Learning Representations, 2021 a

work page 2021

[10] [10]

Convex geometry and duality of over-parameterized neural networks

Tolga Ergen and Mert Pilanci. Convex geometry and duality of over-parameterized neural networks. Journal of Machine Learning Research, 22 0 (212): 0 1--63, 2021 b

work page 2021

[11] [11]

Global optimality beyond two layers: Training deep relu networks via convex programs

Tolga Ergen and Mert Pilanci. Global optimality beyond two layers: Training deep relu networks via convex programs. International Conference on Machine Learning, 2021 c

work page 2021

[12] [12]

Revealing the structure of deep neural networks via convex duality

Tolga Ergen and Mert Pilanci. Revealing the structure of deep neural networks via convex duality. International Conference on Machine Learning, 2021 d

work page 2021

[13] [13]

The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models

Tolga Ergen and Mert Pilanci. The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models. IEEE Transactions on Information Theory, 2025

work page 2025

[14] [14]

Convex formulation of overparameterized deep neural networks

Cong Fang, Yihong Gu, Weizhong Zhang, and Tong Zhang. Convex formulation of overparameterized deep neural networks. IEEE Transactions on Information Theory, 68 0 (8): 0 5340--5352, 2022

work page 2022

[15] [15]

Rank diminishing in deep neural networks

Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, and Zheng-Jun Zha. Rank diminishing in deep neural networks. Advances in Neural Information Processing Systems, 2022

work page 2022

[16] [16]

Deep learning without poor local minima

Kenji Kawaguchi. Deep learning without poor local minima. Advances in Neural Information Processing Systems, 2016

work page 2016

[17] [17]

Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks

Xinliang Liu, Tong Mao, and Jinchao Xu. Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks. arXiv preprint arXiv:2505.00351, 2025

work page arXiv 2025

[18] [18]

A new function space from B arron class and application to neural network approximation

Yan Meng and Pingbing Ming. A new function space from B arron class and application to neural network approximation. Communications in Computational Physics, 32 0 (5): 0 1361--1400, 2022

work page 2022

[19] [19]

On the tractability of multivariate integration and approximation by neural networks

Hrushikesh Narhar Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, 20 0 (4): 0 561--590, 2004

work page 2004

[20] [20]

Approximation hierarchies for copositive cone over symmetric cone and their comparison

Mitsuhiro Nishijima and Kazuhide Nakata. Approximation hierarchies for copositive cone over symmetric cone and their comparison. Journal of Global Optimization, 88 0 (4): 0 831--870, 2024

work page 2024

[21] [21]

Banach space representer theorems for neural networks and ridge splines

Rahul Parhi and Robert D Nowak. Banach space representer theorems for neural networks and ridge splines. Journal of Machine Learning Research, 22 0 (43): 0 1--40, 2021

work page 2021

[22] [22]

Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks

Mert Pilanci and Tolga Ergen. Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks. International Conference on Machine Learning, 2020

work page 2020

[23] [23]

Convex formulations for training two-layer ReLU neural networks

Karthik Prakhya, Tolga Birdal, and Alp Yurtsever. Convex formulations for training two-layer ReLU neural networks. International Conference on Learning Representations, 2025

work page 2025

[24] [24]

1 regularization in infinite dimensional feature spaces

Saharon Rosset, Grzegorz Swirszcz, Nathan Srebro, and Ji Zhu. 1 regularization in infinite dimensional feature spaces. International Conference on Computational Learning Theory, pages 544--558, 2007

work page 2007

[25] [25]

Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms

Arda Sahiner, Tolga Ergen, John Pauly, and Mert Pilanci. Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms. International Conference on Learning Representations, 2021

work page 2021

[26] [26]

Unraveling attention via convex duality: Analysis and interpretations of vision transformers

Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, and Mert Pilanci. Unraveling attention via convex duality: Analysis and interpretations of vision transformers. International Conference on Machine Learning, 2022

work page 2022

[27] [27]

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations, 2014

work page 2014

[28] [28]

Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression

Joseph Shenouda, Rahul Parhi, Kangwook Lee, and Robert D Nowak. Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression. Journal of Machine Learning Research, 25 0 (231): 0 1--40, 2024

work page 2024

[29] [29]

Duality for neural networks through reproducing kernel B anach spaces

Len Spek, Tjeerd Jan Heeringa, Felix Schwenninger, and Christoph Brune. Duality for neural networks through reproducing kernel B anach spaces. Applied and Computational Harmonic Analysis, 78: 0 101765, 2025

work page 2025