Learning Decision-Sufficient Representations for Linear Optimization
Pith reviewed 2026-05-25 06:34 UTC · model grok-4.3
The pith
Pointwise-sufficient datasets of size at most the decision dimension d* recover optimal linear-program solutions with PAC failure probability scaling as d*/n.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish hardness results showing that computing d* is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d. costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most d*. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most O~(d*/n), and this rate is tight up to logarithmic factors.
What carries the argument
The intrinsic decision-relevant dimension d* together with the cutting-plane procedure that builds pointwise-sufficient datasets by successively separating violated cost vectors.
If this is right
- A polynomial-time algorithm exists for building pointwise-sufficient datasets under nondegeneracy.
- A compression scheme of size at most d* is stable across samples and yields the stated PAC bound.
- In contextual linear optimization the generalization error of the learned predictor scales as O~(sqrt(d*/n)) instead of O~(sqrt(d/n)).
- The PAC rate is tight up to logarithmic factors, so the dependence on d* cannot be improved in the worst case.
Where Pith is reading between the lines
- The same geometric separation idea could be tested on other parametric optimization families whose feasible sets admit analogous decision-relevant subspaces.
- When the nondegeneracy condition can be verified on a given sample, practitioners obtain an explicit certificate that the compressed dataset is sufficient for every realized cost.
Load-bearing premise
The cost vectors must satisfy a nondegeneracy condition so that the cutting-plane algorithm terminates in polynomial time.
What would settle it
An explicit nondegenerate instance in which the cutting-plane output fails to recover the optimal decision for at least one cost vector in the prior set C.
Figures
read the original abstract
We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d.\ costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies construction of compressed datasets sufficient to recover optimal decisions for linear programs whose unknown cost vector lies in a prior set C. It resolves an open problem by proving that computing the intrinsic decision-relevant dimension d* is NP-hard and that deciding global sufficiency of a dataset is coNP-hard. Under a nondegeneracy assumption it gives a polynomial-time cutting-plane algorithm for pointwise-sufficient datasets; it then introduces a cumulative aggregation procedure that produces a stable compression of size at most d* and derives a distribution-free PAC bound of order O~(d*/n) on the pointwise-sufficiency failure probability. The same construction is applied to contextual linear optimization, replacing the ambient dimension d by d* in the generalization rate.
Significance. If the hardness proofs and the PAC derivation hold, the work supplies the first rigorous intractability results for the global problem and a practical route to dimension reduction via pointwise sufficiency. The distribution-free guarantee that improves from O~(sqrt(d/n)) to O~(sqrt(d*/n)) and the explicit credit to a stable compression scheme of size at most d* are concrete strengths. The results are independent of fitted parameters and rest on geometric definitions of d* rather than post-hoc choices.
major comments (2)
- [Abstract] Abstract (pointwise-sufficiency paragraph): the polynomial-time guarantee for the cutting-plane algorithm is stated to hold only under nondegeneracy, yet the manuscript supplies neither a quantitative characterization of how restrictive the assumption is nor a data-driven test for it; because the NP-hardness result already rules out a general polynomial algorithm, this assumption is load-bearing for the central constructive claim.
- [PAC guarantee paragraph] PAC guarantee paragraph: the claim that the O~(d*/n) rate is tight up to logarithmic factors requires an explicit matching lower-bound construction; without seeing the argument that produces the Omega(d*/n) term it is impossible to verify whether the tightness statement survives the same nondegeneracy condition used for the upper bound.
minor comments (1)
- [Abstract] Notation: the symbol d* is introduced geometrically from C but its precise definition (e.g., as the dimension of the union of certain cones) should be restated once in the abstract for readers who encounter the paper only through the abstract.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We address the two major comments point by point below, indicating the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] Abstract (pointwise-sufficiency paragraph): the polynomial-time guarantee for the cutting-plane algorithm is stated to hold only under nondegeneracy, yet the manuscript supplies neither a quantitative characterization of how restrictive the assumption is nor a data-driven test for it; because the NP-hardness result already rules out a general polynomial algorithm, this assumption is load-bearing for the central constructive claim.
Authors: We agree that the nondegeneracy assumption is essential for the polynomial-time claim. In the revision we will add a dedicated paragraph (and a short appendix subsection) that (i) shows the assumption holds with probability 1 for cost vectors drawn from any absolutely continuous distribution and (ii) supplies a simple, data-driven diagnostic that checks whether the active constraint gradients at the sampled points are linearly independent. This will make the practical scope of the algorithm explicit. revision: partial
-
Referee: [PAC guarantee paragraph] PAC guarantee paragraph: the claim that the O~(d*/n) rate is tight up to logarithmic factors requires an explicit matching lower-bound construction; without seeing the argument that produces the Omega(d*/n) term it is impossible to verify whether the tightness statement survives the same nondegeneracy condition used for the upper bound.
Authors: Theorem 4.5 already contains an explicit lower-bound construction that produces an Omega(d*/n) term. The construction is stated for nondegenerate instances (by a standard perturbation argument that preserves optimality and linear independence of active gradients). In the revision we will (i) move the statement of the lower bound into the main text immediately after the upper-bound theorem, (ii) add an explicit remark confirming that the same nondegeneracy condition is used, and (iii) update the abstract to reference the theorem number. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper's core contributions—NP-hardness and coNP-hardness results resolving an open problem from Bennouna et al., the definition of pointwise sufficiency, the cutting-plane algorithm under nondegeneracy, the cumulative aggregation scheme, and the PAC bound scaling as O~(d*/n)—rest on independent geometric characterizations of d* from the prior set C, new algorithmic constructions, and standard concentration arguments. d* is defined directly from C without reference to fitted outputs or predictions; no step reduces by construction to a self-citation, ansatz, or renamed empirical pattern. The nondegeneracy assumption is stated explicitly as a precondition rather than derived from the results themselves.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Nondegeneracy assumption on the linear programs
- domain assumption Costs are drawn i.i.d. from an unknown distribution
Reference graph
Works this paper leans on
-
[1]
Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design.Journal of the ACM, 71(5):32:1–32:58, 2024a. Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to br...
work page 2013
-
[2]
Omar Bennouna, Jiawei Zhang, Saurabh Amin, and Asuman E
arXiv:2505.21692. Omar Bennouna, Jiawei Zhang, Saurabh Amin, and Asuman E. Ozdaglar. Contextual optimization under model misspecification: A tractable and generalizable approach. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 3749–3775. PMLR, 2025b. Omar Bennouna, Amine ...
-
[3]
Yonatan Bilu and Nathan Linial
URLhttps://arxiv.org/abs/2602.15365. Yonatan Bilu and Nathan Linial. Are stable instances easy?Combinatorics, Probability and Computing, 21 (5):643–660,
-
[4]
Elmachtoub, Paul Grigas, and Ambuj Tewari
21 Othman El Balghiti, Adam N. Elmachtoub, Paul Grigas, and Ambuj Tewari. Generalization bounds in the predict-then-optimize framework.Mathematics of Operations Research, 48(4):2043–2065,
work page 2043
-
[5]
Andrew Guillory and Jeff A. Bilmes. Interactive submodular set cover. InProceedings of the 27th Inter- national Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 415–422,
work page 2010
-
[6]
arXiv:2305.06584v2 (Jan 2025). Zakaria Mhammedi. Online convex optimization with a separation oracle. In Nika Haghtalab and Ankur Moitra, editors,Proceedings of the 38th Conference on Learning Theory (COLT), volume 291 ofProceed- ings of Machine Learning Research, pages 4033–4077. PMLR,
-
[7]
24 A Formal statement for hardness and other proofs for Section 4 A.1 Proof of Theorem 5 Theorem 30(Formal version of Theorem 5).Fix a coordinate index r∈[n] . The following decision problem is NP-hard: given a bounded polytope X⊆R n and a polyhedral uncertainty set C ⊆R n specified in H-representation, decide whether dim dir(X⋆(C))>dim dir(X ⋆ r (C)),(16...
work page 2025
-
[8]
+ 2(n−i+j) + (1 + 2(m−j)) = 2n+ 2m. (iv) The first statement follows since each variable gadget offers exactly two disjoint choices and each clause gadget offers exactly three disjoint choices. For the last statement, if bjk is false under xP , then P must have visited the opposite variable vertex (¯xi if ℓjk =x i, and xi if ℓjk = ¯xi), which is exactly t...
work page 2025
-
[9]
Given the instance (G, d, κ, B, r), let p:=|A| and index coordinates of Rp by arcs. Define the s–t unit-flow polytope X:= n x∈R p ≥0 : X a∈δ+(v) xa − X a∈δ−(v) xa = 1v=s, −1v=t, 0otherwise, o . Because G is acyclic, the extreme points of X are exactly the incidence vectors of s–t paths. Let Xr :={x∈ X:x r = 0}be the face of flows that avoid the re...
work page 2025
-
[10]
On the other hand, every u∈dir X ⋆ r (C) satisfies ur = 0, and therefore v /∈dir X ⋆ r (C) . Combining this with (18) yields dir X ⋆ r (C) ⊊dir X ⋆(C) , and thusdim dir(X ⋆(C))>dim dir(X ⋆ r (C)). (⇐)We prove the contrapositive. Assume φ is not satisfiable. Then, for C=C cl the corresponding PISPP-W+ instance is infeasible by Ley and Merkert (2025), and f...
work page 2025
-
[11]
Consequently, computing the size of a minimum pointwise SDD (and the corresponding search problem of finding one) is coNP-hard. TheH-in-Vpolytope containment problem.An instance consists of two polytopesP, Q⊆R d, where P={z∈R d :Hz≤h}andQ= conv{v 1, . . . , vM }, i.e., P is given in H-representationand Q is given in V -representation. The decision problem...
work page 1985
-
[12]
The proof has three ingredients: (i) we bound the complexity of the induced decision class x⋆ ◦ HU⋆,d⋆ via its Natarajan dimension, (ii) we plug this bound into the Natarajan-dimension generalization theorem for the SPO loss due to El Balghiti et al. (2023), and (iii) we show that under a global sufficient dataset, projecting to W⋆ and lifting back is dec...
work page 2023
-
[13]
Define the feature map Ψaff(ξ, x) := " ⃗ (L⊤ U⋆x)ξ⊤ c⊤ 0 x # ∈R d⋆p+1, w B,b := ⃗(B) b ∈R d⋆p+1
39 Proof.For anyx∈ X ∠, fB,b(ξ)⊤x=bc ⊤ 0 x+ LU⋆Bξ ⊤x=bc ⊤ 0 x+ D⃗(B),⃗ (L⊤ U⋆x)ξ⊤ E . Define the feature map Ψaff(ξ, x) := " ⃗ (L⊤ U⋆x)ξ⊤ c⊤ 0 x # ∈R d⋆p+1, w B,b := ⃗(B) b ∈R d⋆p+1. ThenF aff U⋆,d⋆ is a subset of the multiclass linear hypothesis class HΨaff :={ξ7→arg min x∈X ∠ ⟨w,Ψ aff(ξ, x)⟩:w∈R d⋆p+1}, because each fB,b induces the decision rule x⋆(fB,...
work page 2014
-
[14]
log(n|X ∠|2) n +ω X (C) r log(1/δ) 2n , simultaneously for allf∈ H U⋆,d⋆, whereω X (C) := supc∈C maxx∈X c⊤x−min x∈X c⊤x . Proof. This follows from the Natarajan-dimension generalization bound for SPO loss in El Balghiti et al. (2023). The SPO loss is uniformly bounded by ωX (C) when c∈ C . Using Lemma 42 and the inclusion HU⋆,d⋆ ⊆ Haff U⋆,d⋆ (take b= 1 ),...
work page 2023
-
[15]
Part (1) follows from Lemma 44 by taking f=f ⋆ ∈ H(C) : the compressed predictor ˆf⋆(ξ) = liftU⋆(U ⊤ ⋆ (f⋆(ξ)−c 0)) induces the same decisions as f⋆, hence has the same population SPO risk, and is a risk minimizer inH U⋆,d⋆(C). For part (2), apply Lemma 43, which gives a uniform generalization bound over the larger class HU⋆,d⋆ (and therefore also over it...
work page 2012
-
[16]
42 Thus, conditional onX,u ⊤gj isσ-subgaussian for every unit vectoru
Since {ϵi,j}nµ i=1 are independent and each is σ-subgaussian, we have for allλ∈R, E h exp λ u⊤gj |X i = nµY i=1 E[exp(λaiϵi,j)|X]≤ nµY i=1 exp(λ2σ2a2 i /2) = exp(λ2σ2/2). 42 Thus, conditional onX,u ⊤gj isσ-subgaussian for every unit vectoru. Let N be a 1/2-net of the Euclidean unit sphere in Rp with |N | ≤5 p (Vershynin, 2018). For any fixed u∈ N and any ...
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.