Decision-Focused Learning via Tangent-Space Projection of Prediction Error
Pith reviewed 2026-05-20 23:34 UTC · model grok-4.3
The pith
The regret gradient equals the prediction error projected onto the tangent space of active constraints and scaled by local curvature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under standard regularity with locally stable active constraints, the regret gradient admits a closed-form geometric characterization equivalent to the prediction error projected onto the tangent space of active constraints scaled by local curvature. This shows that regret gradients arise by filtering decision-irrelevant components from the MSE gradient. The authors therefore introduce a procedure that solves a reduced linear system over the active constraints to obtain the gradient directly.
What carries the argument
Tangent-space projection of the prediction error onto active constraints, which filters decision-irrelevant directions and scales the remainder by local curvature to recover the exact regret gradient.
If this is right
- Regret gradients can be obtained without differentiating through solver iterations or running extra optimization solves.
- A single reduced linear system over the active constraints is sufficient to produce the gradient.
- The resulting predictor achieves higher downstream decision quality than existing baselines on linear and quadratic programs.
- Performance advantages remain even when the underlying constraint set shifts after training.
- Filtering decision-irrelevant components from the mean-squared-error gradient produces the true regret signal.
Where Pith is reading between the lines
- The same projection idea could be applied to other bilevel problems where only a subset of constraints is active at optimality.
- If active sets change rapidly across samples, the cost of identifying the active set might offset some of the computational savings.
- The approach suggests a general recipe for turning any differentiable prediction loss into a decision-aware loss by an inexpensive geometric filter.
Load-bearing premise
Active constraints must remain locally stable and the optimization problem must satisfy standard regularity conditions so that the projection exactly recovers the regret gradient.
What would settle it
On a small linear program with known active set, compute the PEAR gradient from the reduced system and check whether it matches the regret gradient obtained by finite-difference perturbation of the prediction or by full automatic differentiation through the solver.
Figures
read the original abstract
Decision-Focused Learning (DFL) trains predictors to improve downstream decision quality, but computing regret gradients typically requires differentiating through solvers or relying on surrogate losses, which can be computationally expensive or deviate from the true objective. We show that, under standard regularity with locally stable active constraints, the regret gradient admits a closed-form geometric characterization, equivalent to the prediction error projected onto the tangent space of active constraints, scaled by local curvature. This reveals that regret gradients can be obtained by filtering decision-irrelevant components from the MSE gradient, providing a simpler and more direct alternative to existing approaches. Based on this, we propose PEAR (Projected Error As Regret-gradient), which computes regret gradients via a reduced linear system over active constraints, avoiding differentiation through solver iterations or additional optimization solves. Experiments on LP benchmarks and a real-world QP task show that PEAR achieves the best decision quality among all baselines while being the most computationally efficient, with gains that persist under constraint shifts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that, under standard regularity assumptions with locally stable active constraints, the regret gradient in decision-focused learning admits a closed-form geometric characterization: the prediction error projected onto the tangent space of the active constraints and scaled by local curvature. This yields the PEAR method, which obtains regret gradients from a reduced linear system over active constraints without differentiating through solvers or additional optimizations. Experiments on LP benchmarks and a real-world QP task report that PEAR achieves the highest decision quality among baselines while being the most computationally efficient, with gains persisting under constraint shifts.
Significance. If the derivation is correct and the local-stability assumption holds in the reported regimes, the work supplies a direct, low-overhead route to true regret gradients that filters decision-irrelevant components from the MSE gradient. The geometric insight could simplify implementation and analysis in optimization-based learning pipelines.
major comments (1)
- [Experiments] The equivalence between the tangent-space projection and the true regret gradient is load-bearing on the assumption of locally stable active constraints (KKT system insensitive to small parameter perturbations). The Experiments section reports no diagnostic—e.g., frequency of active-set changes during training or under observed prediction errors—for the LP or QP benchmarks; without such verification the claimed closed-form characterization may not equal the regret gradient when the active set flips.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly list the regularity conditions (e.g., LICQ, strict complementarity, second-order sufficient conditions) rather than referring only to “standard regularity.”
- [Method] Notation for the reduced linear system (PEAR) should be cross-referenced to the KKT stationarity conditions so readers can trace the projection and curvature scaling without ambiguity.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for highlighting the importance of verifying the locally stable active constraints assumption. We address the major comment below and commit to revisions that will strengthen the empirical support for our claims.
read point-by-point responses
-
Referee: [Experiments] The equivalence between the tangent-space projection and the true regret gradient is load-bearing on the assumption of locally stable active constraints (KKT system insensitive to small parameter perturbations). The Experiments section reports no diagnostic—e.g., frequency of active-set changes during training or under observed prediction errors—for the LP or QP benchmarks; without such verification the claimed closed-form characterization may not equal the regret gradient when the active set flips.
Authors: We agree that explicit verification of active-set stability is valuable to confirm that the closed-form characterization equals the true regret gradient in the reported regimes. The theoretical derivation relies on the standard regularity assumption of locally stable active constraints, but the current Experiments section does not include diagnostics such as the frequency of active-set changes. In the revised manuscript we will add these diagnostics for both the LP benchmarks and the real-world QP task, reporting the rate of active-set flips during training and under the observed prediction errors. This will provide direct evidence that the assumption holds and that PEAR computes regret gradients without active-set changes affecting the results. revision: yes
Circularity Check
Derivation of regret gradient via tangent-space projection is self-contained first-principles result
full rationale
The paper derives the closed-form geometric characterization of the regret gradient directly from KKT conditions under the stated regularity assumptions with locally stable active constraints, showing equivalence to the prediction error projected onto the tangent space scaled by local curvature. This is obtained by filtering decision-irrelevant components from the MSE gradient and solving a reduced linear system, without any fitted parameters, self-citations for load-bearing uniqueness theorems, or renaming of known empirical patterns. The PEAR method follows as a direct computational consequence rather than a tautology or input-renamed output. No steps reduce by construction to the paper's own data or prior fitted results; the derivation remains independent and falsifiable via the provided LP/QP benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption standard regularity with locally stable active constraints
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the regret gradient admits a closed-form geometric characterization, equivalent to the prediction error projected onto the tangent space of active constraints, scaled by local curvature
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under Assumptions 3.1–3.3, the regret gradient admits the closed form ∇_ĉR = P_H e
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., and Kolter, J. Z. Differentiable convex optimization lay- ers.Advances in neural information processing systems, 32, 2019a. Agrawal, A., Barratt, S., Boyd, S., Busseti, E., and Moursi, W. M. Differentiating through a cone program.arXiv preprint arXiv:1904.09043, 2019b. Amos, B. and Kolter, J. Z. O...
-
[2]
Bambade, A., Schramm, F., Taylor, A., and Carpentier, J. Leveraging augmented-lagrangian techniques for differ- entiating over infeasible quadratic programs in machine learning. InICLR 2024-The Twelfth International Con- ference on Learning Representations,
work page 2024
-
[3]
Hwang, Y ., Kong, Y ., Zohren, S., and Lee, Y . Decision- informed neural networks with large language model integration for portfolio optimization.arXiv preprint arXiv:2502.00828,
-
[4]
Mandi, J., Mahmuto ˘gulları, A. ˙I., Berden, S., and Guns, T. Minimizing surrogate losses for decision-focused learning using differentiable optimization.arXiv preprint arXiv:2508.11365,
-
[5]
Contrastive losses and so- lution caching for predict-and-optimize.arXiv preprint arXiv:2011.05354,
Mulamba, M., Mandi, J., Diligenti, M., Lombardi, M., Bucarey, V ., and Guns, T. Contrastive losses and so- lution caching for predict-and-optimize.arXiv preprint arXiv:2011.05354,
-
[6]
PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming
Tang, B. and Khalil, E. B. Pyepo: A pytorch-based end-to- end predict-then-optimize library for linear and integer programming.arXiv preprint arXiv:2206.14234,
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
Restless multi-armed bandits for maternal and child health: Results from decision- focused learning
Verma, S., Mate, A., Wang, K., Madhiwalla, N., Hegde, A., Taneja, A., and Tambe, M. Restless multi-armed bandits for maternal and child health: Results from decision- focused learning. InProceedings of the 2023 Interna- tional Conference on Autonomous Agents and Multiagent Systems, pp. 1312–1320,
work page 2023
-
[8]
10 Decision-Focused Learning via Tangent-Space Projection of Prediction Error A. Additional Theory A.1. Regularity and Local Differentiability This appendix states standard conditions under which the solution mapping ˆc7→z ∗(ˆc)is locally single-valued and differentiable and the active set remains locally stable. Under Assumptions 3.1–3.3, classical sensi...
work page 1983
-
[9]
Under this assumption, the same Jacobian J and projection operator PH apply
Proof of Equation (15).Assume that the active set remains unchanged when replacing ˆcby ˆc+αδN for any scalar α. Under this assumption, the same Jacobian J and projection operator PH apply. By Theorem 3.5, the regret gradient is given by ∇ˆcR(ˆc;c) =PH(ˆc−c). 12 Decision-Focused Learning via Tangent-Space Projection of Prediction Error Hence, ∇ˆcR(ˆc+αδN;...
work page 2022
-
[10]
based on validation regret. For evaluation, portfolios are rebalanced every 21 trading days with a transaction cost of 0.1% applied to turnover. We report cumulative returnQT t=1(1 +r t)−1 , annualized return and volatility (scaled by √ 252), Sharpe ratio assuming zero risk-free rate, maximum drawdown, and average turnover at each rebalancing. F. Addition...
work page 2021
-
[11]
Large β (0.5) degrades performance on Knapsack, worsening monotonically with the polynomial degree
and β= 0.1 best at Deg 4, whereas the pure gradient is the weakest of the small-β settings. Large β (0.5) degrades performance on Knapsack, worsening monotonically with the polynomial degree. PEAR is otherwise stable for β∈[0.05,0.2], and we useβ= 0.1as the default. 18 Decision-Focused Learning via Tangent-Space Projection of Prediction Error Table 8.Norm...
work page 2020
-
[12]
Table 10.Real-world Knapsack (building investment).Normalized regret (%,↓) over 5 seeds
The budget is set to B= 0.4 P i wi, all other settings follow Appendix E.1, and we report normalized regret over 5 seeds. Table 10.Real-world Knapsack (building investment).Normalized regret (%,↓) over 5 seeds. Best inbold, second best underlined . MethodMSE SPO+ PFYL DBB LA V A PEAR Regret (%) 1.22±0.890.72±0.570.70±0.80 12.20±10.052.12±0.790.51±0.57 Tab...
work page 2006
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.