pith. sign in

arxiv: 2605.17692 · v1 · pith:SCPXQNZDnew · submitted 2026-05-17 · 💻 cs.LG · math.OC

Exact Convex Reformulations of Linear Neural Networks via Completely Positive Lifting

Pith reviewed 2026-05-20 13:31 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords linear neural networksconvex reformulationcompletely positive conecopositive programmingbilinear factorizationsquared lossrank-constrained SDPlifted optimization
0
0 comments X

The pith

The training of deep linear neural networks with squared loss admits an exact convex reformulation over a generalized completely positive cone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the nonconvex optimization problem for training deep linear networks under squared loss has an equivalent convex formulation after lifting the variables into a higher-dimensional space. In this lifted problem the objective becomes linear and the only source of nonconvexity is membership in a generalized completely positive cone. The dimension of the lifted space is determined solely by the input and output dimensions of the network, remaining independent of both network depth and the number of training examples. This construction is obtained by first reducing the multilayer linear parameterization to a bilinear factorization, then expressing the resulting rank constraint through a complementarity condition that admits an exact completely positive lift.

Core claim

The training problem of a deep linear neural network under the squared loss admits an exact convex reformulation in a lifted space over a generalized completely positive cone. The reformulation has the same optimal value as the original nonconvex problem and is linear in the lifted variables, with all nonconvexity encoded in the cone constraint. Its ambient lifted dimension depends only on the input and output dimensions, independent of the network depth and the number of data points, and the bottleneck width enters only through scalar constraints. The construction proceeds by reducing the multilayer parameterization to a bilinear factorization, lifting it to a rank-constrained semidefinite,

What carries the argument

A completely positive lifting of a rank-constrained semidefinite program that arises from expressing the multilayer linear map as a bilinear factorization whose rank constraint is encoded by a complementarity condition.

If this is right

  • The optimal value of the original nonconvex training problem equals the optimal value of the lifted convex problem.
  • The lifted formulation is linear in all variables except for the single generalized completely positive cone constraint.
  • The size of the lifted space is fixed by the input and output dimensions alone and does not grow with depth or sample count.
  • All nonconvexity induced by the linear factorization is captured exactly inside the generalized completely positive cone.
  • The same lifting technique connects linear-network training to the broader theory of copositive programming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The depth-independence of the lifted dimension suggests that additional layers do not introduce new sources of nonconvexity beyond the basic factorization.
  • Efficient outer approximations to the generalized completely positive cone could yield practical convex relaxations for linear-network training.
  • The same lifting strategy may apply to other bilinear factorization problems that appear in matrix completion or low-rank matrix sensing.
  • Exactness of the lift provides a new starting point for deriving tight convex bounds on the loss surface of linear networks.

Load-bearing premise

The multilayer linear parameterization reduces exactly to a bilinear factorization whose rank constraint can be written as a complementarity condition that admits an exact completely positive lift without extra relaxation gaps.

What would settle it

A concrete data set, network widths, and depth for which the optimal value of the lifted convex program differs from the optimal value attained by the original nonconvex training problem.

read the original abstract

We show that the training problem of a deep linear neural network under the squared loss admits an exact convex reformulation in a lifted space over a generalized completely positive cone. The reformulation has the same optimal value as the original nonconvex problem and is linear in the lifted variables, with all nonconvexity encoded in the cone constraint. Its ambient lifted dimension depends only on the input and output dimensions, independent of the network depth and the number of data points, and the bottleneck width enters only through scalar constraints. The construction proceeds by reducing the multilayer parameterization to a bilinear factorization, lifting it to a rank-constrained semidefinite program, expressing the rank constraint via a complementarity condition, and applying a completely positive lifting. While the resulting formulation is computationally intractable in general, it gives an exact conic representation of the nonconvexity induced by linear factorization and connects linear neural network training with copositive programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the training problem of a deep linear neural network under squared loss admits an exact convex reformulation over a generalized completely positive cone in a lifted space. The reformulation is linear in the lifted variables, matches the original optimal value exactly, encodes all nonconvexity in the cone constraint, and has ambient dimension depending only on input/output dimensions (independent of depth and number of data points). The construction reduces the multilayer parameterization to a bilinear factorization, lifts to a rank-constrained SDP, replaces the rank constraint by a complementarity condition, and applies completely positive lifting.

Significance. If the exact equivalence holds, the result would be significant for providing a depth-independent exact conic representation of the nonconvexity induced by linear factorizations, linking neural network training to copositive programming. The paper delivers a parameter-free theoretical derivation with reproducible structure (no fitted parameters or self-referential definitions) and falsifiable predictions about optimal values under the lifted cone.

major comments (2)
  1. [§4] §4 (bilinear factorization and rank-constrained SDP lift): The reduction of the multilayer linear network to a bilinear factorization is sketched, but the subsequent claim that the rank constraint can be exactly replaced by a complementarity condition without introducing extraneous solutions (outside the original low-rank set) lacks a detailed verification. This step is load-bearing for the central exact-equivalence claim; if the complementarity admits points that do not correspond to valid factorizations from the original network, the later CP lift would only yield a relaxation rather than an exact reformulation.
  2. [§5] §5 (completely positive lifting): The manuscript asserts that the generalized CP cone yields an exact lift after the complementarity step, but does not provide an explicit argument showing tightness for the specific complementarity structure arising from the multi-factor bilinear form. Standard CP lifts for PSD rank constraints do not automatically extend to this setting without additional proof that the optimal value is preserved; this gap directly affects whether the lifted problem matches the original nonconvex optimum.
minor comments (2)
  1. [Preliminaries] Notation for the generalized completely positive cone is introduced without a self-contained definition or reference to the precise cone used; a short appendix recalling the cone definition would improve readability.
  2. [Main Result] The abstract states the lifted dimension is independent of the number of data points, but the main text should explicitly confirm this in the statement of the main theorem (currently only implicit in the construction).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. The major comments correctly identify steps in the proof chain that require more explicit verification to fully substantiate the exact-equivalence claim. We address each point below and have revised the manuscript to supply the missing details and arguments.

read point-by-point responses
  1. Referee: [§4] §4 (bilinear factorization and rank-constrained SDP lift): The reduction of the multilayer linear network to a bilinear factorization is sketched, but the subsequent claim that the rank constraint can be exactly replaced by a complementarity condition without introducing extraneous solutions (outside the original low-rank set) lacks a detailed verification. This step is load-bearing for the central exact-equivalence claim; if the complementarity admits points that do not correspond to valid factorizations from the original network, the later CP lift would only yield a relaxation rather than an exact reformulation.

    Authors: We agree that this equivalence requires a self-contained verification. In the revised manuscript we have inserted a new lemma (Lemma 4.3) that proves the complementarity condition, together with the existing PSD and bilinear constraints, is equivalent to the original rank-constrained formulation. The argument proceeds by showing that any solution satisfying the complementarity admits a factorization into the multilayer weight matrices whose product recovers the lifted matrix exactly; the proof uses the chain-rule structure of the deep linear network to establish that the kernels and ranges enforced by complementarity coincide with those of a valid low-rank factorization. Consequently, no extraneous solutions outside the original set are admitted, preserving exactness before the CP lift is applied. revision: yes

  2. Referee: [§5] §5 (completely positive lifting): The manuscript asserts that the generalized CP cone yields an exact lift after the complementarity step, but does not provide an explicit argument showing tightness for the specific complementarity structure arising from the multi-factor bilinear form. Standard CP lifts for PSD rank constraints do not automatically extend to this setting without additional proof that the optimal value is preserved; this gap directly affects whether the lifted problem matches the original nonconvex optimum.

    Authors: We acknowledge that an explicit tightness argument tailored to the complementarity structure is needed. The revised §5 now contains Theorem 5.2, which establishes that the generalized completely positive cone exactly represents the feasible set after the complementarity substitution. The proof constructs a mapping from any feasible point in the lifted CP program back to a feasible point of the original nonconvex problem that attains the same objective value; it exploits the fact that the complementarity condition forces the lifted matrix to lie on the extreme rays corresponding to rank-1 factorizations of the bilinear form. We also added a short illustrative example for the two-layer case that numerically confirms zero duality gap, supporting the general argument. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses standard lifting and cone techniques that remain independent of the target result

full rationale

The claimed exact convex reformulation proceeds by reducing multilayer linear networks to bilinear factorization, lifting to a rank-constrained SDP, replacing the rank constraint with a complementarity condition, and applying a generalized completely positive lift. Each step invokes established conic optimization constructions whose validity is external to the paper's own fitted values or self-referential definitions. The resulting program is asserted to match the original optimal value because the complementarity and CP cone together encode the factorization nonconvexity without introducing extraneous solutions, but this equivalence is derived from the algebraic properties of the lift rather than by renaming an input or citing a self-authored uniqueness theorem as the sole justification. No equation in the provided chain equates a derived quantity to a fitted parameter by construction, and the depth-independent dimension follows directly from the lifting dimension choice. The analysis is therefore self-contained against external benchmarks in convex optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The construction relies on standard properties of semidefinite and completely positive cones together with complementarity conditions for rank constraints; no free parameters are introduced and no new entities are postulated beyond the generalized cone, which is presented as an extension of known objects.

axioms (1)
  • standard math Properties of generalized completely positive cones and complementarity conditions suffice to encode rank constraints exactly
    Invoked in the lifting step after reducing to bilinear factorization and rank-constrained SDP

pith-pipeline@v0.9.0 · 5684 in / 1199 out tokens · 39648 ms · 2026-05-20T13:31:24.074165+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1]

    Implicit regularization in deep matrix factorization

    Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 2019

  2. [2]

    Breaking the curse of dimensionality with convex neural networks

    Francis Bach. Breaking the curse of dimensionality with convex neural networks. Journal of Machine Learning Research, 18 0 (19): 0 1--53, 2017

  3. [3]

    On conic QPCC s, conic QCQP s and completely positive programs

    Lijie Bai, John E Mitchell, and Jong-Shi Pang. On conic QPCC s, conic QCQP s and completely positive programs. Mathematical Programming, 159 0 (1): 0 109--136, 2016

  4. [4]

    Universal approximation bounds for superpositions of a sigmoidal function

    Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39 0 (3): 0 930--945, 1993

  5. [5]

    Training quantized neural networks to global optimality via semidefinite programming

    Burak Bartan and Mert Pilanci. Training quantized neural networks to global optimality via semidefinite programming. In International Conference on Machine Learning. PMLR, 2021

  6. [6]

    Convex optimization of deep polynomial and ReLU activation neural networks

    Burak Bartan and Mert Pilanci. Convex optimization of deep polynomial and ReLU activation neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

  7. [7]

    Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time

    Burak Bartan and Mert Pilanci. Neural spectrahedra and semidefinite lifts: Global convex optimization of degree-two polynomial activation neural networks in polynomial-time. Mathematical Programming, 213 0 (1): 0 737--769, 2025

  8. [8]

    Convex neural networks

    Yoshua Bengio, Nicolas Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. Convex neural networks. Advances in Neural Information Processing Systems, 2005

  9. [9]

    Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time

    Tolga Ergen and Mert Pilanci. Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time. International Conference on Learning Representations, 2021 a

  10. [10]

    Convex geometry and duality of over-parameterized neural networks

    Tolga Ergen and Mert Pilanci. Convex geometry and duality of over-parameterized neural networks. Journal of Machine Learning Research, 22 0 (212): 0 1--63, 2021 b

  11. [11]

    Global optimality beyond two layers: Training deep relu networks via convex programs

    Tolga Ergen and Mert Pilanci. Global optimality beyond two layers: Training deep relu networks via convex programs. International Conference on Machine Learning, 2021 c

  12. [12]

    Revealing the structure of deep neural networks via convex duality

    Tolga Ergen and Mert Pilanci. Revealing the structure of deep neural networks via convex duality. International Conference on Machine Learning, 2021 d

  13. [13]

    The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models

    Tolga Ergen and Mert Pilanci. The convex landscape of neural networks: Characterizing global optima and stationary points via lasso models. IEEE Transactions on Information Theory, 2025

  14. [14]

    Convex formulation of overparameterized deep neural networks

    Cong Fang, Yihong Gu, Weizhong Zhang, and Tong Zhang. Convex formulation of overparameterized deep neural networks. IEEE Transactions on Information Theory, 68 0 (8): 0 5340--5352, 2022

  15. [15]

    Rank diminishing in deep neural networks

    Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, and Zheng-Jun Zha. Rank diminishing in deep neural networks. Advances in Neural Information Processing Systems, 2022

  16. [16]

    Deep learning without poor local minima

    Kenji Kawaguchi. Deep learning without poor local minima. Advances in Neural Information Processing Systems, 2016

  17. [17]

    Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks

    Xinliang Liu, Tong Mao, and Jinchao Xu. Integral representations of S obolev spaces via ReLU k activation function and optimal error estimates for linearized networks. arXiv preprint arXiv:2505.00351, 2025

  18. [18]

    A new function space from B arron class and application to neural network approximation

    Yan Meng and Pingbing Ming. A new function space from B arron class and application to neural network approximation. Communications in Computational Physics, 32 0 (5): 0 1361--1400, 2022

  19. [19]

    On the tractability of multivariate integration and approximation by neural networks

    Hrushikesh Narhar Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, 20 0 (4): 0 561--590, 2004

  20. [20]

    Approximation hierarchies for copositive cone over symmetric cone and their comparison

    Mitsuhiro Nishijima and Kazuhide Nakata. Approximation hierarchies for copositive cone over symmetric cone and their comparison. Journal of Global Optimization, 88 0 (4): 0 831--870, 2024

  21. [21]

    Banach space representer theorems for neural networks and ridge splines

    Rahul Parhi and Robert D Nowak. Banach space representer theorems for neural networks and ridge splines. Journal of Machine Learning Research, 22 0 (43): 0 1--40, 2021

  22. [22]

    Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks

    Mert Pilanci and Tolga Ergen. Neural networks are convex regularizers: E xact polynomial-time convex optimization formulations for two-layer networks. International Conference on Machine Learning, 2020

  23. [23]

    Convex formulations for training two-layer ReLU neural networks

    Karthik Prakhya, Tolga Birdal, and Alp Yurtsever. Convex formulations for training two-layer ReLU neural networks. International Conference on Learning Representations, 2025

  24. [24]

    1 regularization in infinite dimensional feature spaces

    Saharon Rosset, Grzegorz Swirszcz, Nathan Srebro, and Ji Zhu. 1 regularization in infinite dimensional feature spaces. International Conference on Computational Learning Theory, pages 544--558, 2007

  25. [25]

    Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms

    Arda Sahiner, Tolga Ergen, John Pauly, and Mert Pilanci. Vector-output relu neural network problems are copositive programs: C onvex analysis of two layer networks and polynomial-time algorithms. International Conference on Learning Representations, 2021

  26. [26]

    Unraveling attention via convex duality: Analysis and interpretations of vision transformers

    Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, and Mert Pilanci. Unraveling attention via convex duality: Analysis and interpretations of vision transformers. International Conference on Machine Learning, 2022

  27. [27]

    Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

    Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations, 2014

  28. [28]

    Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression

    Joseph Shenouda, Rahul Parhi, Kangwook Lee, and Robert D Nowak. Variation spaces for multi-output neural networks: Insights on multi-task learning and network compression. Journal of Machine Learning Research, 25 0 (231): 0 1--40, 2024

  29. [29]

    Duality for neural networks through reproducing kernel B anach spaces

    Len Spek, Tjeerd Jan Heeringa, Felix Schwenninger, and Christoph Brune. Duality for neural networks through reproducing kernel B anach spaces. Applied and Computational Harmonic Analysis, 78: 0 101765, 2025