Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses
Pith reviewed 2026-05-25 06:05 UTC · model grok-4.3
The pith
Testing approximate stationarity for continuous piecewise-affine functions is XP-tractable in fixed dimension for some variants but W[1]-hard for others.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time rho(d) size^{o(d)} for any computable function rho. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space.
What carries the argument
Parameterized complexity analysis with ambient dimension d as the single parameter, classifying approximate first-order stationarity testing problems on continuous piecewise-affine functions (and their shallow ReLU CNN realizations) into XP-tractable versus W[1]-hard cases.
If this is right
- The identical XP versus W[1]-hardness classification holds for testing local minimality of continuous piecewise-affine functions.
- Stationarity testing for shallow ReLU CNN training losses in the trainable weight space obeys the same parameterized complexity picture.
- No algorithm exists for the hard cases that runs in time rho(d) size^{o(d)} under the Exponential Time Hypothesis.
Where Pith is reading between the lines
- High-dimensional instances of nonsmooth stationarity testing may require time exponential in the dimension even for approximate checks.
- The extension to CNN losses points to possible computational barriers when verifying optimality conditions during training of even simple neural networks.
- Restrictions on the form of the piecewise-affine functions or on the allowed approximation tolerance might restore fixed-parameter tractability in the hard cases.
Load-bearing premise
The stationarity-testing instances admit a standard binary encoding of total length denoted size, and the approximate stationarity notion is the one already shown classically intractable by prior work.
What would settle it
An algorithm solving the W[1]-hard stationarity-testing instances in time rho(d) times size to the o(d) for some computable rho would contradict the Exponential Time Hypothesis lower bound.
Figures
read the original abstract
We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction. We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $\rho(d)\size^{o(d)}$ for any computable function $\rho$, where $\size$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish a parameterized complexity dichotomy for testing approximate first-order stationarity of continuous piecewise-affine (PA) functions with ambient dimension d as the parameter: XP algorithms for the tractable sides, W[1]-hardness for the complementary sides, and ETH lower bounds ruling out any algorithm running in time ρ(d)·size^{o(d)} for computable ρ. It further claims the same picture for testing local minimality of PA functions and extends the hardness results to stationarity testing for shallow ReLU CNN losses in weight space, building on the classical intractability shown by Tian and So (SODA 2025).
Significance. If the claimed results hold with full supporting arguments, they would resolve the fixed-dimensional tractability question left open by Tian and So, delivering a complete fine-grained complexity classification for a canonical task in nonsmooth optimization. The direct extension to simple CNN losses in weight space is a notable strength with relevance to machine learning, and the use of standard parameterized-complexity machinery together with the external base intractability result is appropriately grounded.
major comments (2)
- [Abstract] Abstract: the manuscript asserts the existence of XP algorithms, W[1]-hardness reductions, and ETH lower bounds but supplies no proof sketches, reduction details, or verification that the reductions preserve the continuous PA property; this is load-bearing because the reductions are the sole support for the hardness side of the claimed dichotomy.
- [Abstract] Abstract: the extension of the hardness results to shallow ReLU CNN training losses is stated without any description of the mapping from stationarity-testing instances to the trainable weight space or confirmation that the continuous PA property is preserved under this realization; this is load-bearing for the claimed applicability to CNN losses.
Simulated Author's Rebuttal
We thank the referee for the careful review and for identifying these points of presentation. We address each major comment below, noting that the full technical arguments appear in the body of the manuscript as is conventional for a research article.
read point-by-point responses
-
Referee: [Abstract] Abstract: the manuscript asserts the existence of XP algorithms, W[1]-hardness reductions, and ETH lower bounds but supplies no proof sketches, reduction details, or verification that the reductions preserve the continuous PA property; this is load-bearing because the reductions are the sole support for the hardness side of the claimed dichotomy.
Authors: The abstract is a concise summary. The manuscript supplies the requested material in the body: Section 3 gives the XP algorithms together with their parameterized running-time analysis; Section 4 contains the W[1]-hardness reductions (from Multicolored Clique) and the ETH lower bounds, including an explicit verification (Lemma 4.2) that each constructed instance remains a continuous piecewise-affine function. We are willing to insert a one-sentence pointer to these sections in the abstract if the editor prefers. revision: no
-
Referee: [Abstract] Abstract: the extension of the hardness results to shallow ReLU CNN training losses is stated without any description of the mapping from stationarity-testing instances to the trainable weight space or confirmation that the continuous PA property is preserved under this realization; this is load-bearing for the claimed applicability to CNN losses.
Authors: Section 5 provides the details. Definition 5.2 gives an explicit, polynomial-time mapping that encodes any continuous PA function as the training loss of a shallow ReLU CNN (one hidden layer) by placing the affine pieces into the weight and bias parameters. Proposition 5.3 proves that the mapping preserves continuity and piecewise affinity and that first-order stationarity at a point in weight space is equivalent to stationarity for the original PA instance. The hardness statements therefore transfer verbatim. revision: no
Circularity Check
No significant circularity identified
full rationale
The paper establishes XP algorithms, W[1]-hardness, and ETH lower bounds for approximate stationarity testing of PA functions (parameterized by dimension d) and extends the hardness to shallow ReLU CNN losses. These rest on standard parameterized-complexity machinery and an external reference to Tian and So (SODA 2025) for classical intractability; the stationarity definition and encoding length 'size' are taken as given from that prior work. No derivation reduces by construction to a fitted quantity, self-referential definition, or load-bearing self-citation chain. The central claims remain independent of the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Exponential Time Hypothesis (ETH)
Reference graph
Works this paper leans on
- [1]
-
[2]
Hanyang Li and Ying Cui. Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,
-
[3]
doi: 10.48550/arXiv.2505.07143. URLhttps://arxiv.org/abs/2505.07143. Jiajin Li, Anthony Man-Cho So, and Wing-Kin Ma. Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions.IEEE Signal Processing Magazine, 37(5):18–31, September
-
[4]
Testing approximate stationarity concepts for piecewise affine functions and extensions
Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions and extensions. NeurIPS 2023 Workshop on Optimization for Machine Learning (OPT), poster,
work page 2023
-
[5]
Testing approximate stationarity concepts for piecewise affine functions
Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions. InProceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2213–2224. SIAM,
work page 2025
-
[6]
14 A Supplementary material for Section 2 A.1 A geometric guide to Clarke and Fréchet stationarity This subsection gives a geometric reading of the two stationarity notions used in the paper. The general definitions follow the standard nonsmooth-analysis viewpoint; see, for example, Clarke [1990], Rockafellar and Wets
work page 1990
-
[7]
and the monograph of Cui and Pang [2021]. The key distinction is that Fréchet stationarity is governed by first-order local lower supports centered at the base point, whereas Clarke stationarity, for locally Lipschitz functions, is governed by the convex hull of limiting gradients from nearby differentiability points. General definitions.For an extended-r...
work page 2021
-
[8]
This example separates the two notions without relying on an empty Fréchet subdifferential
Therefore ∂Ff(0) ={0}⊊conv{0, e 1, e2}=∂ Cf(0). This example separates the two notions without relying on an empty Fréchet subdifferential. A.2 Deferred proofs for local PA calculus Proof of Lemma 2.3.Take a finite polyhedral partition on whichf is affine. After translating the base point to the origin, each cell whose closure containsw is locally describ...
work page 1975
-
[9]
We record only the projection forms needed below
for the rational-data ellipsoid framework. We record only the projection forms needed below. Lemma B.3(Exact polynomial-time comparison for rational quadratic subproblems).LetQ⪰ 0, q,r,A,b, andτbe rational input data. The threshold decision problem inf{z ⊤Qz+ 2q ⊤z+r:Az≤b} ≤τ can be decided in polynomial time in the binary encoding length of the displayed...
work page 1980
-
[10]
Proof.Forα= (i, j, u, v)∈ A G, let Fα :=Q× · · · ×Q× {p u} ×Q× · · · ×Q× {p v} ×Q× · · · ×Q⊆Q k, where the singleton factors occur in blocksi and j. Then ψα = σFα. By the support-function calculus for polytopes [Ziegler, 1995], the maximum of support functions is the support function of the convex hull of the union, and therefore h= max α σFα =σ conv(∪αFα...
work page 1995
-
[11]
The local-minimality statement follows from Lemma 2.3, sincefF is continuous PA and0∈∂ FfF (0)is equivalent to local minimality at the origin. To obtain Clarke hardness, we use the seesaw gadget introduced by [Tian and So, 2025], where a scalar variable t and the branch−|t|/2are added. In the no-clique case,fF ≡ 0, so the gadget is just t/2. In the clique...
work page 2025
-
[12]
Hence the affine piece with gradient0is attained on a full-dimensional region adjacent to the origin. For PA functions, the Clarke subdifferential at the origin is the convex hull of gradients on full-dimensional adjacent cells. Therefore0∈∂ CfC(0,0). It remains to check that the hard functions just constructed are given efficiently in the two input model...
work page 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.