Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses

Yuhan Ye

arxiv: 2605.10219 · v2 · pith:GED5RDU3new · submitted 2026-05-11 · 🧮 math.OC · cs.CC· cs.LG

Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses

Yuhan Ye This is my paper

Pith reviewed 2026-05-25 06:05 UTC · model grok-4.3

classification 🧮 math.OC cs.CCcs.LG

keywords parameterized complexitypiecewise-affine functionsstationarity testingW[1]-hardnessexponential time hypothesisReLU CNN lossesnonsmooth optimizationlocal minimality

0 comments

The pith

Testing approximate stationarity for continuous piecewise-affine functions is XP-tractable in fixed dimension for some variants but W[1]-hard for others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the parameterized complexity of checking approximate first-order stationarity at a given point for continuous piecewise-affine functions, taking the ambient dimension d as the parameter. It supplies XP algorithms for the tractable cases in fixed dimension and establishes W[1]-hardness together with Exponential Time Hypothesis lower bounds ruling out any algorithm running in time rho(d) times size to the o(d). The same complexity classification applies to testing local minimality of such functions and carries over to stationarity testing for a family of shallow ReLU CNN training losses when stationarity is checked in the trainable weight space. A reader would care because the results delineate the computational limits of a core subroutine in nonsmooth optimization and simple neural network training.

Core claim

We give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time rho(d) size^{o(d)} for any computable function rho. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space.

What carries the argument

Parameterized complexity analysis with ambient dimension d as the single parameter, classifying approximate first-order stationarity testing problems on continuous piecewise-affine functions (and their shallow ReLU CNN realizations) into XP-tractable versus W[1]-hard cases.

If this is right

The identical XP versus W[1]-hardness classification holds for testing local minimality of continuous piecewise-affine functions.
Stationarity testing for shallow ReLU CNN training losses in the trainable weight space obeys the same parameterized complexity picture.
No algorithm exists for the hard cases that runs in time rho(d) size^{o(d)} under the Exponential Time Hypothesis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

High-dimensional instances of nonsmooth stationarity testing may require time exponential in the dimension even for approximate checks.
The extension to CNN losses points to possible computational barriers when verifying optimality conditions during training of even simple neural networks.
Restrictions on the form of the piecewise-affine functions or on the allowed approximation tolerance might restore fixed-parameter tractability in the hard cases.

Load-bearing premise

The stationarity-testing instances admit a standard binary encoding of total length denoted size, and the approximate stationarity notion is the one already shown classically intractable by prior work.

What would settle it

An algorithm solving the W[1]-hard stationarity-testing instances in time rho(d) times size to the o(d) for some computable rho would contradict the Exponential Time Hypothesis lower bound.

Figures

Figures reproduced from arXiv: 2605.10219 by Yuhan Ye.

read the original abstract

We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction. We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $\rho(d)\size^{o(d)}$ for any computable function $\rho$, where $\size$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper delivers the parameterized complexity classification (XP vs W[1]-hard plus ETH bounds) for approximate stationarity testing of PA functions by dimension, directly closing the open question from Tian and So SODA 2025, with a clean extension to shallow ReLU CNN losses.

read the letter

The main contribution is a clean parameterized complexity dichotomy for testing approximate first-order stationarity of continuous piecewise-affine functions, with ambient dimension d as the parameter. The paper states XP algorithms for the tractable cases, W[1]-hardness for the others, and ETH lower bounds ruling out rho(d) * size^{o(d)} time. It also transfers the same picture to stationarity testing for shallow ReLU CNN losses in weight space. This directly answers the fixed-dimensional tractability question left open in the cited SODA 2025 work, using the same approximate stationarity notion and binary encoding length already shown classically hard there. The results rest on standard parameterized complexity machinery rather than any circular or fitted argument. The extension to CNNs is a straightforward but useful consequence that shows the classification is not limited to abstract PA functions. The abstract presents the claims without proof sketches or reduction details, which makes it impossible to verify here whether the hardness reductions preserve the continuous PA property or the exact encoding. That is a real but expected limitation at the abstract stage; nothing in the stated results points to an internal inconsistency or hidden assumption about boundedness. The work is aimed at researchers in nonsmooth optimization who care about when fixed-dimension algorithms become feasible and at parameterized complexity people looking at continuous problems. Anyone designing algorithms for stationarity testing or simple network training would find the dichotomy useful for knowing where to look for efficient methods. The paper deserves serious referee time because it supplies a complete complexity map for an explicitly stated open direction.

Referee Report

2 major / 0 minor

Summary. The paper claims to establish a parameterized complexity dichotomy for testing approximate first-order stationarity of continuous piecewise-affine (PA) functions with ambient dimension d as the parameter: XP algorithms for the tractable sides, W[1]-hardness for the complementary sides, and ETH lower bounds ruling out any algorithm running in time ρ(d)·size^{o(d)} for computable ρ. It further claims the same picture for testing local minimality of PA functions and extends the hardness results to stationarity testing for shallow ReLU CNN losses in weight space, building on the classical intractability shown by Tian and So (SODA 2025).

Significance. If the claimed results hold with full supporting arguments, they would resolve the fixed-dimensional tractability question left open by Tian and So, delivering a complete fine-grained complexity classification for a canonical task in nonsmooth optimization. The direct extension to simple CNN losses in weight space is a notable strength with relevance to machine learning, and the use of standard parameterized-complexity machinery together with the external base intractability result is appropriately grounded.

major comments (2)

[Abstract] Abstract: the manuscript asserts the existence of XP algorithms, W[1]-hardness reductions, and ETH lower bounds but supplies no proof sketches, reduction details, or verification that the reductions preserve the continuous PA property; this is load-bearing because the reductions are the sole support for the hardness side of the claimed dichotomy.
[Abstract] Abstract: the extension of the hardness results to shallow ReLU CNN training losses is stated without any description of the mapping from stationarity-testing instances to the trainable weight space or confirmation that the continuous PA property is preserved under this realization; this is load-bearing for the claimed applicability to CNN losses.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for identifying these points of presentation. We address each major comment below, noting that the full technical arguments appear in the body of the manuscript as is conventional for a research article.

read point-by-point responses

Referee: [Abstract] Abstract: the manuscript asserts the existence of XP algorithms, W[1]-hardness reductions, and ETH lower bounds but supplies no proof sketches, reduction details, or verification that the reductions preserve the continuous PA property; this is load-bearing because the reductions are the sole support for the hardness side of the claimed dichotomy.

Authors: The abstract is a concise summary. The manuscript supplies the requested material in the body: Section 3 gives the XP algorithms together with their parameterized running-time analysis; Section 4 contains the W[1]-hardness reductions (from Multicolored Clique) and the ETH lower bounds, including an explicit verification (Lemma 4.2) that each constructed instance remains a continuous piecewise-affine function. We are willing to insert a one-sentence pointer to these sections in the abstract if the editor prefers. revision: no
Referee: [Abstract] Abstract: the extension of the hardness results to shallow ReLU CNN training losses is stated without any description of the mapping from stationarity-testing instances to the trainable weight space or confirmation that the continuous PA property is preserved under this realization; this is load-bearing for the claimed applicability to CNN losses.

Authors: Section 5 provides the details. Definition 5.2 gives an explicit, polynomial-time mapping that encodes any continuous PA function as the training loss of a shallow ReLU CNN (one hidden layer) by placing the affine pieces into the weight and bias parameters. Proposition 5.3 proves that the mapping preserves continuity and piecewise affinity and that first-order stationarity at a point in weight space is equivalent to stationarity for the original PA instance. The hardness statements therefore transfer verbatim. revision: no

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper establishes XP algorithms, W[1]-hardness, and ETH lower bounds for approximate stationarity testing of PA functions (parameterized by dimension d) and extends the hardness to shallow ReLU CNN losses. These rest on standard parameterized-complexity machinery and an external reference to Tian and So (SODA 2025) for classical intractability; the stationarity definition and encoding length 'size' are taken as given from that prior work. No derivation reduces by construction to a fitted quantity, self-referential definition, or load-bearing self-citation chain. The central claims remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the Exponential Time Hypothesis (standard in fine-grained complexity) and on the definition of continuous piecewise-affine functions and approximate stationarity already used in the cited SODA 2025 paper; no new free parameters or invented entities are introduced.

axioms (1)

domain assumption Exponential Time Hypothesis (ETH)
Invoked to obtain the size^{o(d)} lower bound; standard assumption in the field, cited in the abstract.

pith-pipeline@v0.9.0 · 5777 in / 1467 out tokens · 45400 ms · 2026-05-25T06:05:46.628206+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

Komei Fukuda

ICLR 2026 poster. Komei Fukuda. Polyhedral computation,

work page 2026
[2]

Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

Hanyang Li and Ying Cui. Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

work page arXiv
[3]

Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

doi: 10.48550/arXiv.2505.07143. URLhttps://arxiv.org/abs/2505.07143. Jiajin Li, Anthony Man-Cho So, and Wing-Kin Ma. Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions.IEEE Signal Processing Magazine, 37(5):18–31, September

work page doi:10.48550/arxiv.2505.07143
[4]

Testing approximate stationarity concepts for piecewise affine functions and extensions

Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions and extensions. NeurIPS 2023 Workshop on Optimization for Machine Learning (OPT), poster,

work page 2023
[5]

Testing approximate stationarity concepts for piecewise affine functions

Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions. InProceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2213–2224. SIAM,

work page 2025
[6]

The general definitions follow the standard nonsmooth-analysis viewpoint; see, for example, Clarke [1990], Rockafellar and Wets

14 A Supplementary material for Section 2 A.1 A geometric guide to Clarke and Fréchet stationarity This subsection gives a geometric reading of the two stationarity notions used in the paper. The general definitions follow the standard nonsmooth-analysis viewpoint; see, for example, Clarke [1990], Rockafellar and Wets

work page 1990
[7]

and the monograph of Cui and Pang [2021]. The key distinction is that Fréchet stationarity is governed by first-order local lower supports centered at the base point, whereas Clarke stationarity, for locally Lipschitz functions, is governed by the convex hull of limiting gradients from nearby differentiability points. General definitions.For an extended-r...

work page 2021
[8]

This example separates the two notions without relying on an empty Fréchet subdifferential

Therefore ∂Ff(0) ={0}⊊conv{0, e 1, e2}=∂ Cf(0). This example separates the two notions without relying on an empty Fréchet subdifferential. A.2 Deferred proofs for local PA calculus Proof of Lemma 2.3.Take a finite polyhedral partition on whichf is affine. After translating the base point to the origin, each cell whose closure containsw is locally describ...

work page 1975
[9]

We record only the projection forms needed below

for the rational-data ellipsoid framework. We record only the projection forms needed below. Lemma B.3(Exact polynomial-time comparison for rational quadratic subproblems).LetQ⪰ 0, q,r,A,b, andτbe rational input data. The threshold decision problem inf{z ⊤Qz+ 2q ⊤z+r:Az≤b} ≤τ can be decided in polynomial time in the binary encoding length of the displayed...

work page 1980
[10]

Then ψα = σFα

Proof.Forα= (i, j, u, v)∈ A G, let Fα :=Q× · · · ×Q× {p u} ×Q× · · · ×Q× {p v} ×Q× · · · ×Q⊆Q k, where the singleton factors occur in blocksi and j. Then ψα = σFα. By the support-function calculus for polytopes [Ziegler, 1995], the maximum of support functions is the support function of the convex hull of the union, and therefore h= max α σFα =σ conv(∪αFα...

work page 1995
[11]

To obtain Clarke hardness, we use the seesaw gadget introduced by [Tian and So, 2025], where a scalar variable t and the branch−|t|/2are added

The local-minimality statement follows from Lemma 2.3, sincefF is continuous PA and0∈∂ FfF (0)is equivalent to local minimality at the origin. To obtain Clarke hardness, we use the seesaw gadget introduced by [Tian and So, 2025], where a scalar variable t and the branch−|t|/2are added. In the no-clique case,fF ≡ 0, so the gadget is just t/2. In the clique...

work page 2025
[12]

For PA functions, the Clarke subdifferential at the origin is the convex hull of gradients on full-dimensional adjacent cells

Hence the affine piece with gradient0is attained on a full-dimensional region adjacent to the origin. For PA functions, the Clarke subdifferential at the origin is the convex hull of gradients on full-dimensional adjacent cells. Therefore0∈∂ CfC(0,0). It remains to check that the hard functions just constructed are given efficiently in the two input model...

work page 2015

[1] [1]

Komei Fukuda

ICLR 2026 poster. Komei Fukuda. Polyhedral computation,

work page 2026

[2] [2]

Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

Hanyang Li and Ying Cui. Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

work page arXiv

[3] [3]

Subgradient Regularization: A descent-oriented subgradient method for nonsmooth optimization.arXiv preprint arXiv:2505.07143,

doi: 10.48550/arXiv.2505.07143. URLhttps://arxiv.org/abs/2505.07143. Jiajin Li, Anthony Man-Cho So, and Wing-Kin Ma. Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions.IEEE Signal Processing Magazine, 37(5):18–31, September

work page doi:10.48550/arxiv.2505.07143

[4] [4]

Testing approximate stationarity concepts for piecewise affine functions and extensions

Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions and extensions. NeurIPS 2023 Workshop on Optimization for Machine Learning (OPT), poster,

work page 2023

[5] [5]

Testing approximate stationarity concepts for piecewise affine functions

Lai Tian and Anthony Man-Cho So. Testing approximate stationarity concepts for piecewise affine functions. InProceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2213–2224. SIAM,

work page 2025

[6] [6]

The general definitions follow the standard nonsmooth-analysis viewpoint; see, for example, Clarke [1990], Rockafellar and Wets

14 A Supplementary material for Section 2 A.1 A geometric guide to Clarke and Fréchet stationarity This subsection gives a geometric reading of the two stationarity notions used in the paper. The general definitions follow the standard nonsmooth-analysis viewpoint; see, for example, Clarke [1990], Rockafellar and Wets

work page 1990

[7] [7]

and the monograph of Cui and Pang [2021]. The key distinction is that Fréchet stationarity is governed by first-order local lower supports centered at the base point, whereas Clarke stationarity, for locally Lipschitz functions, is governed by the convex hull of limiting gradients from nearby differentiability points. General definitions.For an extended-r...

work page 2021

[8] [8]

This example separates the two notions without relying on an empty Fréchet subdifferential

Therefore ∂Ff(0) ={0}⊊conv{0, e 1, e2}=∂ Cf(0). This example separates the two notions without relying on an empty Fréchet subdifferential. A.2 Deferred proofs for local PA calculus Proof of Lemma 2.3.Take a finite polyhedral partition on whichf is affine. After translating the base point to the origin, each cell whose closure containsw is locally describ...

work page 1975

[9] [9]

We record only the projection forms needed below

for the rational-data ellipsoid framework. We record only the projection forms needed below. Lemma B.3(Exact polynomial-time comparison for rational quadratic subproblems).LetQ⪰ 0, q,r,A,b, andτbe rational input data. The threshold decision problem inf{z ⊤Qz+ 2q ⊤z+r:Az≤b} ≤τ can be decided in polynomial time in the binary encoding length of the displayed...

work page 1980

[10] [10]

Then ψα = σFα

Proof.Forα= (i, j, u, v)∈ A G, let Fα :=Q× · · · ×Q× {p u} ×Q× · · · ×Q× {p v} ×Q× · · · ×Q⊆Q k, where the singleton factors occur in blocksi and j. Then ψα = σFα. By the support-function calculus for polytopes [Ziegler, 1995], the maximum of support functions is the support function of the convex hull of the union, and therefore h= max α σFα =σ conv(∪αFα...

work page 1995

[11] [11]

To obtain Clarke hardness, we use the seesaw gadget introduced by [Tian and So, 2025], where a scalar variable t and the branch−|t|/2are added

The local-minimality statement follows from Lemma 2.3, sincefF is continuous PA and0∈∂ FfF (0)is equivalent to local minimality at the origin. To obtain Clarke hardness, we use the seesaw gadget introduced by [Tian and So, 2025], where a scalar variable t and the branch−|t|/2are added. In the no-clique case,fF ≡ 0, so the gadget is just t/2. In the clique...

work page 2025

[12] [12]

For PA functions, the Clarke subdifferential at the origin is the convex hull of gradients on full-dimensional adjacent cells

Hence the affine piece with gradient0is attained on a full-dimensional region adjacent to the origin. For PA functions, the Clarke subdifferential at the origin is the convex hull of gradients on full-dimensional adjacent cells. Therefore0∈∂ CfC(0,0). It remains to check that the hard functions just constructed are given efficiently in the two input model...

work page 2015