Neural Networks and (Virtual) Extended Formulations
read the original abstract
Neural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$. This is a well-studied quantity in combinatorial optimization and polyhedral geometry describing the number of inequalities needed to model $P$ as a linear program. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$. This implies exponential lower bounds on such neural networks for a variety of problems, including the polynomially solvable maximum weight matching problem. In an attempt to prove similar bounds also for general neural networks, we introduce the notion of virtual extension complexity $\mathrm{vxc}(P)$, which generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of any neural network that optimizes over $P$. While it remains an open question to derive useful lower bounds on $\mathrm{vxc}(P)$, we argue that this quantity deserves to be studied independently from neural networks by proving that one can efficiently optimize over a polytope $P$ given a virtual extended formulation with small encoding size.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Parameterized Hardness of Zonotope Containment and Neural Network Verification
The paper proves W[1]-hardness parameterized by dimension d for positivity, zonotope containment, max approximation, and L_p-Lipschitz constants in 2- and 3-layer ReLU networks, showing enumeration methods are optimal...
-
Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses
The paper fully characterizes the parameterized complexity of approximate first-order stationarity testing for continuous piecewise-affine functions and shallow ReLU CNN losses with respect to the dimension parameter,...
-
Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses
Testing approximate first-order stationarity for continuous PA functions is XP-tractable but W[1]-hard (with ETH lower bounds) when parameterized by dimension d, and the same parameterized complexity classification ho...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.