pith. sign in

arxiv: 2603.18551 · v2 · pith:Z2M5L2PEnew · submitted 2026-03-19 · 🧮 math.OC · cs.CC· cs.LG

Learning Decision-Sufficient Representations for Linear Optimization

Pith reviewed 2026-05-25 06:34 UTC · model grok-4.3

classification 🧮 math.OC cs.CCcs.LG
keywords decision-sufficient datasetslinear optimizationpointwise sufficiencyPAC guaranteesdata compressioncontextual optimizationcutting-plane algorithms
0
0 comments X

The pith

Pointwise-sufficient datasets of size at most the decision dimension d* recover optimal linear-program solutions with PAC failure probability scaling as d*/n.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that computing the intrinsic decision-relevant dimension d* is NP-hard and that verifying global sufficiency of a dataset is coNP-hard. It relaxes the requirement to pointwise sufficiency for individual cost vectors and supplies a cutting-plane algorithm that constructs such datasets in polynomial time when a nondegeneracy condition holds. In an i.i.d. data-driven regime the same machinery produces a stable compression whose size never exceeds d* and whose pointwise failure probability on new draws is at most order d*/n with high probability. The resulting representations are then used inside contextual linear optimization, replacing the usual sqrt(d/n) generalization rate with the tighter sqrt(d*/n) rate.

Core claim

We establish hardness results showing that computing d* is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d. costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most d*. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most O~(d*/n), and this rate is tight up to logarithmic factors.

What carries the argument

The intrinsic decision-relevant dimension d* together with the cutting-plane procedure that builds pointwise-sufficient datasets by successively separating violated cost vectors.

If this is right

  • A polynomial-time algorithm exists for building pointwise-sufficient datasets under nondegeneracy.
  • A compression scheme of size at most d* is stable across samples and yields the stated PAC bound.
  • In contextual linear optimization the generalization error of the learned predictor scales as O~(sqrt(d*/n)) instead of O~(sqrt(d/n)).
  • The PAC rate is tight up to logarithmic factors, so the dependence on d* cannot be improved in the worst case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same geometric separation idea could be tested on other parametric optimization families whose feasible sets admit analogous decision-relevant subspaces.
  • When the nondegeneracy condition can be verified on a given sample, practitioners obtain an explicit certificate that the compressed dataset is sufficient for every realized cost.

Load-bearing premise

The cost vectors must satisfy a nondegeneracy condition so that the cutting-plane algorithm terminates in polynomial time.

What would settle it

An explicit nondegenerate instance in which the cutting-plane output fails to recover the optimal decision for at least one cost vector in the prior set C.

Figures

Figures reproduced from arXiv: 2603.18551 by Asuman Ozdaglar, Saurabh Amin, Yuhan Ye.

Figure 1
Figure 1. Figure 1: Stage I. Learned dimension t = dim(Wˆ ) (mean over 10 trials). 19 [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Stage II. SPO risk vs. number of labeled samples (mean [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
read the original abstract

We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d.\ costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies construction of compressed datasets sufficient to recover optimal decisions for linear programs whose unknown cost vector lies in a prior set C. It resolves an open problem by proving that computing the intrinsic decision-relevant dimension d* is NP-hard and that deciding global sufficiency of a dataset is coNP-hard. Under a nondegeneracy assumption it gives a polynomial-time cutting-plane algorithm for pointwise-sufficient datasets; it then introduces a cumulative aggregation procedure that produces a stable compression of size at most d* and derives a distribution-free PAC bound of order O~(d*/n) on the pointwise-sufficiency failure probability. The same construction is applied to contextual linear optimization, replacing the ambient dimension d by d* in the generalization rate.

Significance. If the hardness proofs and the PAC derivation hold, the work supplies the first rigorous intractability results for the global problem and a practical route to dimension reduction via pointwise sufficiency. The distribution-free guarantee that improves from O~(sqrt(d/n)) to O~(sqrt(d*/n)) and the explicit credit to a stable compression scheme of size at most d* are concrete strengths. The results are independent of fitted parameters and rest on geometric definitions of d* rather than post-hoc choices.

major comments (2)
  1. [Abstract] Abstract (pointwise-sufficiency paragraph): the polynomial-time guarantee for the cutting-plane algorithm is stated to hold only under nondegeneracy, yet the manuscript supplies neither a quantitative characterization of how restrictive the assumption is nor a data-driven test for it; because the NP-hardness result already rules out a general polynomial algorithm, this assumption is load-bearing for the central constructive claim.
  2. [PAC guarantee paragraph] PAC guarantee paragraph: the claim that the O~(d*/n) rate is tight up to logarithmic factors requires an explicit matching lower-bound construction; without seeing the argument that produces the Omega(d*/n) term it is impossible to verify whether the tightness statement survives the same nondegeneracy condition used for the upper bound.
minor comments (1)
  1. [Abstract] Notation: the symbol d* is introduced geometrically from C but its precise definition (e.g., as the dimension of the union of certain cones) should be restated once in the abstract for readers who encounter the paper only through the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address the two major comments point by point below, indicating the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] Abstract (pointwise-sufficiency paragraph): the polynomial-time guarantee for the cutting-plane algorithm is stated to hold only under nondegeneracy, yet the manuscript supplies neither a quantitative characterization of how restrictive the assumption is nor a data-driven test for it; because the NP-hardness result already rules out a general polynomial algorithm, this assumption is load-bearing for the central constructive claim.

    Authors: We agree that the nondegeneracy assumption is essential for the polynomial-time claim. In the revision we will add a dedicated paragraph (and a short appendix subsection) that (i) shows the assumption holds with probability 1 for cost vectors drawn from any absolutely continuous distribution and (ii) supplies a simple, data-driven diagnostic that checks whether the active constraint gradients at the sampled points are linearly independent. This will make the practical scope of the algorithm explicit. revision: partial

  2. Referee: [PAC guarantee paragraph] PAC guarantee paragraph: the claim that the O~(d*/n) rate is tight up to logarithmic factors requires an explicit matching lower-bound construction; without seeing the argument that produces the Omega(d*/n) term it is impossible to verify whether the tightness statement survives the same nondegeneracy condition used for the upper bound.

    Authors: Theorem 4.5 already contains an explicit lower-bound construction that produces an Omega(d*/n) term. The construction is stated for nondegenerate instances (by a standard perturbation argument that preserves optimality and linear independence of active gradients). In the revision we will (i) move the statement of the lower bound into the main text immediately after the upper-bound theorem, (ii) add an explicit remark confirming that the same nondegeneracy condition is used, and (iii) update the abstract to reference the theorem number. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's core contributions—NP-hardness and coNP-hardness results resolving an open problem from Bennouna et al., the definition of pointwise sufficiency, the cutting-plane algorithm under nondegeneracy, the cumulative aggregation scheme, and the PAC bound scaling as O~(d*/n)—rest on independent geometric characterizations of d* from the prior set C, new algorithmic constructions, and standard concentration arguments. d* is defined directly from C without reference to fitted outputs or predictions; no step reduces by construction to a self-citation, ansatz, or renamed empirical pattern. The nondegeneracy assumption is stated explicitly as a precondition rather than derived from the results themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rest on the nondegeneracy assumption for the cutting-plane algorithm and the i.i.d. sampling model for the PAC bound; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Nondegeneracy assumption on the linear programs
    Required for the polynomial-time cutting-plane algorithm to construct pointwise-sufficient datasets (abstract).
  • domain assumption Costs are drawn i.i.d. from an unknown distribution
    Used to obtain the distribution-free PAC guarantee of O~(d*/n).

pith-pipeline@v0.9.0 · 5822 in / 1422 out tokens · 22731 ms · 2026-05-25T06:34:30.941866+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design.Journal of the ACM, 71(5):32:1–32:58, 2024a

    Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design.Journal of the ACM, 71(5):32:1–32:58, 2024a. Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to br...

  2. [2]

    Omar Bennouna, Jiawei Zhang, Saurabh Amin, and Asuman E

    arXiv:2505.21692. Omar Bennouna, Jiawei Zhang, Saurabh Amin, and Asuman E. Ozdaglar. Contextual optimization under model misspecification: A tractable and generalizable approach. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 3749–3775. PMLR, 2025b. Omar Bennouna, Amine ...

  3. [3]

    Yonatan Bilu and Nathan Linial

    URLhttps://arxiv.org/abs/2602.15365. Yonatan Bilu and Nathan Linial. Are stable instances easy?Combinatorics, Probability and Computing, 21 (5):643–660,

  4. [4]

    Elmachtoub, Paul Grigas, and Ambuj Tewari

    21 Othman El Balghiti, Adam N. Elmachtoub, Paul Grigas, and Ambuj Tewari. Generalization bounds in the predict-then-optimize framework.Mathematics of Operations Research, 48(4):2043–2065,

  5. [5]

    Andrew Guillory and Jeff A. Bilmes. Interactive submodular set cover. InProceedings of the 27th Inter- national Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 415–422,

  6. [6]

    Zakaria Mhammedi

    arXiv:2305.06584v2 (Jan 2025). Zakaria Mhammedi. Online convex optimization with a separation oracle. In Nika Haghtalab and Ankur Moitra, editors,Proceedings of the 38th Conference on Learning Theory (COLT), volume 291 ofProceed- ings of Machine Learning Research, pages 4033–4077. PMLR,

  7. [7]

    r or one shortcut

    24 A Formal statement for hardness and other proofs for Section 4 A.1 Proof of Theorem 5 Theorem 30(Formal version of Theorem 5).Fix a coordinate index r∈[n] . The following decision problem is NP-hard: given a bounded polytope X⊆R n and a polyhedral uncertainty set C ⊆R n specified in H-representation, decide whether dim dir(X⋆(C))>dim dir(X ⋆ r (C)),(16...

  8. [8]

    (iv) The first statement follows since each variable gadget offers exactly two disjoint choices and each clause gadget offers exactly three disjoint choices

    + 2(n−i+j) + (1 + 2(m−j)) = 2n+ 2m. (iv) The first statement follows since each variable gadget offers exactly two disjoint choices and each clause gadget offers exactly three disjoint choices. For the last statement, if bjk is false under xP , then P must have visited the opposite variable vertex (¯xi if ℓjk =x i, and xi if ℓjk = ¯xi), which is exactly t...

  9. [9]

    Define the s–t unit-flow polytope X:= n x∈R p ≥0 : X a∈δ+(v) xa − X a∈δ−(v) xa =    1v=s, −1v=t, 0otherwise, o

    Given the instance (G, d, κ, B, r), let p:=|A| and index coordinates of Rp by arcs. Define the s–t unit-flow polytope X:= n x∈R p ≥0 : X a∈δ+(v) xa − X a∈δ−(v) xa =    1v=s, −1v=t, 0otherwise, o . Because G is acyclic, the extreme points of X are exactly the incidence vectors of s–t paths. Let Xr :={x∈ X:x r = 0}be the face of flows that avoid the re...

  10. [10]

    Combining this with (18) yields dir X ⋆ r (C) ⊊dir X ⋆(C) , and thusdim dir(X ⋆(C))>dim dir(X ⋆ r (C))

    On the other hand, every u∈dir X ⋆ r (C) satisfies ur = 0, and therefore v /∈dir X ⋆ r (C) . Combining this with (18) yields dir X ⋆ r (C) ⊊dir X ⋆(C) , and thusdim dir(X ⋆(C))>dim dir(X ⋆ r (C)). (⇐)We prove the contrapositive. Assume φ is not satisfiable. Then, for C=C cl the corresponding PISPP-W+ instance is infeasible by Ley and Merkert (2025), and f...

  11. [11]

    TheH-in-Vpolytope containment problem.An instance consists of two polytopesP, Q⊆R d, where P={z∈R d :Hz≤h}andQ= conv{v 1,

    Consequently, computing the size of a minimum pointwise SDD (and the corresponding search problem of finding one) is coNP-hard. TheH-in-Vpolytope containment problem.An instance consists of two polytopesP, Q⊆R d, where P={z∈R d :Hz≤h}andQ= conv{v 1, . . . , vM }, i.e., P is given in H-representationand Q is given in V -representation. The decision problem...

  12. [12]

    The proof has three ingredients: (i) we bound the complexity of the induced decision class x⋆ ◦ HU⋆,d⋆ via its Natarajan dimension, (ii) we plug this bound into the Natarajan-dimension generalization theorem for the SPO loss due to El Balghiti et al. (2023), and (iii) we show that under a global sufficient dataset, projecting to W⋆ and lifting back is dec...

  13. [13]

    Define the feature map Ψaff(ξ, x) := " ⃗ (L⊤ U⋆x)ξ⊤ c⊤ 0 x # ∈R d⋆p+1, w B,b := ⃗(B) b ∈R d⋆p+1

    39 Proof.For anyx∈ X ∠, fB,b(ξ)⊤x=bc ⊤ 0 x+ LU⋆Bξ ⊤x=bc ⊤ 0 x+ D⃗(B),⃗ (L⊤ U⋆x)ξ⊤ E . Define the feature map Ψaff(ξ, x) := " ⃗ (L⊤ U⋆x)ξ⊤ c⊤ 0 x # ∈R d⋆p+1, w B,b := ⃗(B) b ∈R d⋆p+1. ThenF aff U⋆,d⋆ is a subset of the multiclass linear hypothesis class HΨaff :={ξ7→arg min x∈X ∠ ⟨w,Ψ aff(ξ, x)⟩:w∈R d⋆p+1}, because each fB,b induces the decision rule x⋆(fB,...

  14. [14]

    log(n|X ∠|2) n +ω X (C) r log(1/δ) 2n , simultaneously for allf∈ H U⋆,d⋆, whereω X (C) := supc∈C maxx∈X c⊤x−min x∈X c⊤x . Proof. This follows from the Natarajan-dimension generalization bound for SPO loss in El Balghiti et al. (2023). The SPO loss is uniformly bounded by ωX (C) when c∈ C . Using Lemma 42 and the inclusion HU⋆,d⋆ ⊆ Haff U⋆,d⋆ (take b= 1 ),...

  15. [15]

    For part (2), apply Lemma 43, which gives a uniform generalization bound over the larger class HU⋆,d⋆ (and therefore also over its subsetH U⋆,d⋆(C))

    Part (1) follows from Lemma 44 by taking f=f ⋆ ∈ H(C) : the compressed predictor ˆf⋆(ξ) = liftU⋆(U ⊤ ⋆ (f⋆(ξ)−c 0)) induces the same decisions as f⋆, hence has the same population SPO risk, and is a risk minimizer inH U⋆,d⋆(C). For part (2), apply Lemma 43, which gives a uniform generalization bound over the larger class HU⋆,d⋆ (and therefore also over it...

  16. [16]

    42 Thus, conditional onX,u ⊤gj isσ-subgaussian for every unit vectoru

    Since {ϵi,j}nµ i=1 are independent and each is σ-subgaussian, we have for allλ∈R, E h exp λ u⊤gj |X i = nµY i=1 E[exp(λaiϵi,j)|X]≤ nµY i=1 exp(λ2σ2a2 i /2) = exp(λ2σ2/2). 42 Thus, conditional onX,u ⊤gj isσ-subgaussian for every unit vectoru. Let N be a 1/2-net of the Euclidean unit sphere in Rp with |N | ≤5 p (Vershynin, 2018). For any fixed u∈ N and any ...