A Solver-Free Training Method for Predict-then-Optimize

Beichen Wan; Mo Liu

arxiv: 2606.19587 · v1 · pith:JRTC37XHnew · submitted 2026-06-17 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-26 18:43 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords predict-then-optimize · decision-focused learning · surrogate loss · measure transformation · solver-free training · Fisher consistency · excess risk bounds

The pith

Measure transformation produces a solver-free surrogate loss for training predict-then-optimize models with consistency guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In predict-then-optimize settings a machine learning model outputs coefficients that feed into a linear optimization problem whose solution determines the final decision quality. Direct minimization of decision regret fails because the mapping from coefficients to decisions is piecewise constant and supplies zero gradients almost everywhere. The paper derives a surrogate loss by applying a measure transformation principle so that gradients during training never require calling the downstream optimizer. The resulting loss is shown to be Fisher consistent and to satisfy excess risk bounds, meaning its minimization yields predictions that are asymptotically optimal for the original decision objective. Experiments report decision quality comparable to solver-dependent baselines but with training times reduced by orders of magnitude.
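The zero-gradient problem described above can be seen on a toy instance. The following is a minimal sketch, assuming a feasible region that is not taken from the paper (the probability simplex, where the LP solution is simply the cheapest vertex); `solve_simplex_lp` and `regret` are illustrative names.

```python
import numpy as np

def solve_simplex_lp(c):
    """Toy 'solver' (assumption): argmin of c^T w over the probability simplex.

    The optimum is always a vertex, i.e. a one-hot vector at the cheapest index.
    """
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def regret(c_pred, c_true):
    """Extra true cost incurred by acting on c_pred instead of c_true."""
    return float(c_true @ solve_simplex_lp(c_pred) - c_true @ solve_simplex_lp(c_true))

c_true = np.array([1.0, 2.0, 3.0])
c_pred = np.array([2.5, 2.0, 3.0])  # wrongly ranks index 1 as cheapest

# Central finite differences of the regret with respect to each predicted cost.
eps = 1e-4
fd_grad = np.array([
    (regret(c_pred + eps * e, c_true) - regret(c_pred - eps * e, c_true)) / (2 * eps)
    for e in np.eye(3)
])

print(regret(c_pred, c_true))  # regret of the wrong decision
print(fd_grad)                 # flat: small changes in c_pred do not move the decision
print(regret(np.array([1.9, 2.0, 3.0]), c_true))  # the regret jumps once the ranking flips
```

Small perturbations of the prediction leave the chosen vertex unchanged, so the finite-difference gradient is zero even though the decision is wrong; the regret only changes when the prediction crosses a boundary between vertices.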

Core claim

The central claim is that a measure transformation principle converts the intractable decision-regret objective into a new surrogate loss whose minimization during training requires no calls to the linear programming or combinatorial solver, while still guaranteeing Fisher consistency and excess risk bounds that link surrogate minimization to good decisions in the original problem.

What carries the argument

Measure transformation principle that re-expresses the decision regret as a surrogate loss depending only on predicted coefficients and true parameters, eliminating the need to solve the optimization problem inside the training loop.
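For reference, the decision regret that the surrogate replaces can be written as follows. This is the standard predict-then-optimize formulation rather than an equation copied from the paper; the symbols S (feasible region), w (decision), y (true cost vector), and ŷ (predicted cost vector) follow the figure captions, and a minimization convention is assumed.

```latex
w^*(y) \in \arg\min_{w \in S} \; y^\top w,
\qquad
\mathrm{Regret}(\hat{y}, y) \;=\; y^\top w^*(\hat{y}) \;-\; y^\top w^*(y) \;\ge\; 0 .
```

Because w*(ŷ) only changes when ŷ crosses a boundary between optimal solutions, the regret is piecewise constant in ŷ, which is the obstacle the surrogate is meant to avoid.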

If this is right

Training becomes independent of solver runtime and therefore scales to larger datasets and more complex predictors.
Fisher consistency ensures that as training data grows the learned model converges to the decision-optimal predictor.
Excess risk bounds quantify how closely surrogate minimization approximates the true decision regret.
The pipeline applies uniformly to linear programs and combinatorial problems without requiring differentiability of the solver.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The decoupling of training from solver calls could allow the method to be used with proprietary or non-differentiable black-box optimizers.
Similar measure transformations might be explored for other non-smooth decision mappings such as those arising in integer or stochastic programming.
Because the loss depends only on predictions and ground-truth parameters, it could support distributed or federated training where the optimizer itself cannot be shared.

Load-bearing premise

The surrogate obtained by the measure transformation has minimizers whose decisions match the quality of minimizers of the original decision regret.

What would settle it

A dataset where models trained to low surrogate loss still produce decisions with substantially higher regret than models trained by any direct regret approximation would falsify the claim that the surrogate preserves decision quality.
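A test of this kind needs a regret metric evaluated on held-out data. The sketch below is one way to compute it, under two assumptions that are not confirmed by the page: the normalization (total regret divided by the total absolute optimal cost) and the toy simplex solver. The function names are illustrative.

```python
import numpy as np

def simplex_argmin(c):
    """Toy solver (assumption): cheapest vertex of the probability simplex."""
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def normalized_regret(C_pred, C_true, solve):
    """Sum of per-sample regrets divided by the sum of |optimal costs|.

    The solver is only needed here, at evaluation time.
    """
    total_regret, total_opt = 0.0, 0.0
    for c_hat, c in zip(C_pred, C_true):
        z_star = float(c @ solve(c))          # best achievable true cost
        z_pred = float(c @ solve(c_hat))      # true cost of the predicted decision
        total_regret += z_pred - z_star
        total_opt += abs(z_star)
    return total_regret / total_opt

C_true = np.array([[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]])
C_pred = np.array([[2.5, 2.0, 3.0], [3.0, 1.0, 2.0]])  # first row misranks the costs
print(normalized_regret(C_pred, C_true, simplex_argmin))
```

Comparing this number for a model trained on the surrogate against a model trained on a direct regret approximation, on the same test set, is the comparison the paragraph above describes.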

Figures

Figures reproduced from arXiv: 2606.19587 by Beichen Wan, Mo Liu.

**Figure 1.** Schematic comparison of training pipelines. (Left) Decision-blind learning minimizes prediction error without considering the downstream task. (Right) Decision-focused methods incorporate the optimization solver into the training loop, requiring computationally expensive differentiation through the solver. (Middle) Our approach integrates the optimization structure by a measure transformation, enabling a c…

**Figure 2.** Illustration of the proposed probability measure transformation process. Here we ignore the dependence on X and focus only on Y, since the transformation mainly concerns the marginal distribution of Y. (Left) The original source distribution Y ∼ N(µ, Σ) with µ = [0.5, 0.5]ᵀ and Σ = [1, −0.3; −0.3, 1]. (Middle) The intermediate reweighted density Q̃, proportional to ∥y∥p(y), which shifts probability mass…

**Figure 4.** Average normalized regret vs. average training time for the shortest path problem. Results are from 20 independent trials with 100 training epochs, under varying degrees of misspecification (degree = [1, 4, 8]) and varying numbers of samples (N = 400, 1600).

**Figure 5.** Normalized test-set regret vs. number of training samples for the portfolio optimization problem. Results are from 20 independent trials with 100 training epochs, under varying degrees of misspecification (degree = [1, 3, 7]).

**Figure 6.** Visual illustration of the counterexample. The shaded area represents the feasible region S = {w ∈ ℝ² : ∥w∥₂ ≤ 1}. The blue arrow denotes the true expectation E_P[Y]. The red and green arrows correspond to estimators Ŷ₁ and Ŷ₂ derived under measures Q₁ and Q₂, respectively, while the purple arrow represents the estimator Ŷ_Q from our proposed method. Fisher consistency requires the estimator to be coll…

**Figure 7.** Average normalized test regret vs. average training time for the shortest path problem, under varying degrees of misspecification and numbers of training samples. Results are collected from 20 trials and 100 training epochs.
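The reweighting in the Figure 2 caption (density proportional to ∥y∥p(y)) has a simple consequence that can be checked numerically: the reweighted mean of the directions Y/∥Y∥ equals E_P[Y]/E_P[∥Y∥], so it points the same way as E_P[Y]. The sketch below verifies this identity on samples drawn with the Gaussian parameters from the caption. It is only an illustration of the reweighting step; whether the paper's surrogate uses exactly this construction is not visible from the truncated captions, and `ball_decision` (the minimizer over the unit ball from Figure 6) is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, 0.5])
Sigma = np.array([[1.0, -0.3], [-0.3, 1.0]])
Y = rng.multivariate_normal(mu, Sigma, size=50_000)   # samples from P

norms = np.linalg.norm(Y, axis=1)
directions = Y / norms[:, None]
weights = norms / norms.sum()        # self-normalized weights proportional to ||y||

# Sample estimate of the mean direction under the reweighted measure.
mean_dir_reweighted = weights @ directions
# Sample estimate of E_P[Y].
mean_plain = Y.mean(axis=0)

def unit(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)

def ball_decision(y):
    """Minimizer of y^T w over the unit ball {w : ||w||_2 <= 1}."""
    return -y / np.linalg.norm(y)

print(unit(mean_dir_reweighted), unit(mean_plain))   # same direction
print(ball_decision(mean_dir_reweighted), ball_decision(mean_plain))
```

Over the unit ball the optimal decision depends only on the direction of the cost vector, so any estimator collinear with E_P[Y] induces the same decision as E_P[Y] itself.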

Original abstract

We propose a scalable method for training prediction (machine learning) models in the predict-then-optimize paradigm, where model outputs serve as coefficients for a subsequent linear optimization task. Directly minimizing the empirical decision regret is intractable for linear programming and combinatorial optimization since the decision mapping is piecewise constant, and the gradients are zero almost everywhere. While existing methods address this by smoothing the differentiation process, they suffer from scalability issues, since a computationally expensive solver call is required for every gradient evaluation. To address this, we propose a decision-focused learning pipeline based on a measure transformation principle, which yields a new surrogate loss that is completely optimization-solver-free during training. We establish theoretical guarantees, including Fisher consistency and excess risk bounds. Empirically, our method achieves decision quality competitive with state-of-the-art methods while reducing training time by orders of magnitude.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a measure-transformation surrogate that removes solver calls from training in predict-then-optimize, with claimed Fisher consistency and faster runtimes, but the link from surrogate minimizer to original decision quality still needs checking.

The main thing here is a new surrogate loss for predict-then-optimize that avoids calling an LP solver at every gradient step. Instead of smoothing the decision map like prior work, it uses a measure transformation to create a differentiable loss that can be optimized directly with standard ML tools. That change is the concrete novelty, and if it holds it would cut training time by the orders of magnitude the abstract claims.

The paper states it proves Fisher consistency and excess risk bounds for the surrogate, and reports competitive decision quality on the usual benchmarks. Those are the parts that matter most for a reader in this subfield. The empirical speed-up is the practical payoff if the numbers check out.

The soft spot is exactly the one the stress-test flags. Fisher consistency guarantees that the surrogate recovers the right predictor in the limit, but it does not automatically guarantee that the decisions induced by the transformed measure match the regret minimizer of the original linear program, especially for finite samples or when cost vectors are not uniformly distributed. If the reweighting shifts mass near the decision boundaries, the excess-risk bound on the surrogate may not translate to low regret on the true problem. The abstract does not show the extra conditions that would close this gap, so that step needs to be explicit in the full proofs.

This is the kind of paper that belongs in a reading group for people working on decision-focused learning. It is worth sending to referees because the scalability claim addresses a real bottleneck and the proposed mechanism is distinct from existing smoothing approaches. A serious review would focus on whether the measure transformation preserves decision quality without hidden bias; if that holds, the work is useful. If the proofs only cover the surrogate and not the original regret, then the central claim needs revision.

Referee Report

2 major / 2 minor

Summary. The paper proposes a decision-focused learning pipeline for predict-then-optimize problems that derives a new surrogate loss via a measure transformation principle. This surrogate is claimed to be completely solver-free during training, with theoretical guarantees of Fisher consistency and excess risk bounds, and empirical results showing competitive decision quality with orders-of-magnitude faster training compared to solver-dependent baselines.

Significance. If the measure transformation produces a surrogate whose minimization yields decisions with low regret in the original linear program (without introducing distribution-dependent biases), the approach would meaningfully advance scalability in decision-focused learning by removing per-gradient solver calls. The asserted theoretical guarantees (Fisher consistency plus excess risk bounds) would be a notable strength if they directly connect surrogate optimality to original-problem regret.

major comments (2)

[Theoretical guarantees section (Fisher consistency and excess risk bounds)] The central claim that the measure-transformation surrogate yields decisions with low regret in the original LP rests on an unverified equivalence: Fisher consistency of the surrogate ensures recovery of the Bayes predictor under the transformed measure, but does not automatically ensure that argmin decisions match those of the original regret for finite samples or non-uniform cost distributions (see the weakest-assumption note). A direct argument or counter-example analysis linking the two is load-bearing and missing.
[Measure transformation principle and its derivation] The excess-risk bound on the surrogate is presented as translating to decision quality, yet the reweighting induced by the measure transformation can alter the relative frequency of cost vectors near decision boundaries; without an additional condition on the cost distribution or a regret-transfer lemma, the bound does not guarantee low original regret.

minor comments (2)

[Experiments] The abstract states 'orders of magnitude' training-time reduction; the main experimental section should report concrete wall-clock ratios and solver-call counts versus the strongest baselines (e.g., SPO+, DCOL) for each benchmark.
[Method] Notation for the transformed measure and the surrogate loss should be introduced with an explicit equation early in the method section to aid readability.
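The solver-call counts and wall-clock figures requested in the first minor comment can be collected with a thin wrapper around whatever solver a baseline uses. The sketch below is illustrative only: `CountingSolver` and the toy simplex solver are assumptions, and the SPO+ subgradient 2(w*(c) − w*(2ĉ − c)) is the standard form from the Smart "predict, then optimize" literature, not code from the paper under review.

```python
import time
import numpy as np

class CountingSolver:
    """Wrap a solver callable and record call count and cumulative time."""

    def __init__(self, solve_fn):
        self.solve_fn = solve_fn
        self.calls = 0
        self.seconds = 0.0

    def __call__(self, c):
        start = time.perf_counter()
        w = self.solve_fn(c)
        self.seconds += time.perf_counter() - start
        self.calls += 1
        return w

def simplex_argmin(c):
    """Toy solver (assumption): cheapest vertex of the probability simplex."""
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def spo_plus_subgradient(c_pred, c_true, w_true, solver):
    """SPO+ subgradient w.r.t. c_pred; needs one solver call per sample."""
    return 2.0 * (w_true - solver(2.0 * c_pred - c_true))

rng = np.random.default_rng(0)
C_true = rng.uniform(1.0, 2.0, size=(50, 4))
W_true = np.array([simplex_argmin(c) for c in C_true])  # cached once, not counted
C_pred = np.ones_like(C_true)                            # stand-in for model outputs

solver = CountingSolver(simplex_argmin)
epochs = 3
for _ in range(epochs):
    for c_hat, c, w in zip(C_pred, C_true, W_true):
        c_hat -= 0.1 * spo_plus_subgradient(c_hat, c, w, solver)  # in-place row update

print(solver.calls, solver.seconds)
```

A solver-dependent baseline accumulates one call per sample per epoch in this setup, whereas a solver-free loss would leave the counter at zero during training; reporting both the count and the accumulated seconds makes the claimed speed-up auditable.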

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments correctly identify that the link between surrogate optimality and original regret requires an explicit transfer argument, which is not fully developed in the current manuscript. We will revise to add this connection.

Referee: [Theoretical guarantees section (Fisher consistency and excess risk bounds)] The central claim that the measure-transformation surrogate yields decisions with low regret in the original LP rests on an unverified equivalence: Fisher consistency of the surrogate ensures recovery of the Bayes predictor under the transformed measure, but does not automatically ensure that argmin decisions match those of the original regret for finite samples or non-uniform cost distributions (see the weakest-assumption note). A direct argument or counter-example analysis linking the two is load-bearing and missing.

Authors: We agree that Fisher consistency under the transformed measure does not by itself guarantee matching argmin decisions on the original regret for finite samples or arbitrary cost distributions. The manuscript currently stops at consistency and excess-risk bounds on the surrogate. In revision we will add a direct regret-transfer lemma (under the paper's stated assumptions on the cost distribution) that bounds the original decision regret in terms of the surrogate excess risk, together with a brief counter-example analysis showing when the link would fail without the lemma. This addition will make the equivalence explicit rather than implicit.

Revision: yes

Referee: [Measure transformation principle and its derivation] The excess-risk bound on the surrogate is presented as translating to decision quality, yet the reweighting induced by the measure transformation can alter the relative frequency of cost vectors near decision boundaries; without an additional condition on the cost distribution or a regret-transfer lemma, the bound does not guarantee low original regret.

Authors: The referee is correct that reweighting can change the mass near decision boundaries and that the current excess-risk bound on the surrogate therefore does not automatically imply a bound on original regret. We will revise the theoretical section to state an explicit condition on the cost distribution (or prove the bound holds without it) and insert the regret-transfer lemma mentioned above. The revised manuscript will therefore contain both the additional condition (if needed) and the lemma that converts surrogate excess risk into original regret.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

The provided abstract and context describe a measure transformation principle yielding a solver-free surrogate loss, with asserted Fisher consistency and excess risk bounds. No equations, self-citations, or derivations are visible that reduce any prediction or result to fitted inputs by construction, self-definition, or load-bearing self-citation chains. The central claim rests on an independently stated principle rather than renaming known results or smuggling ansatzes via prior author work. This matches the reader's assessment that no visible reductions exist, making the derivation self-contained against external benchmarks. No load-bearing steps qualify under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the measure transformation yielding a valid surrogate; no free parameters or invented entities are specified in the abstract.

axioms (1)

Domain assumption: the measure transformation principle can be applied to produce a Fisher-consistent surrogate for decision regret in linear optimization.
Invoked to establish the theoretical guarantees mentioned in the abstract.

invented entities (1)

Measure-transformation-based surrogate loss (no independent evidence)
Purpose: enable solver-free training while approximating decision quality.
New loss function introduced by the paper.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Decision-Focused Learning: When and Why Traditional Prediction Models Fail
cs.LG · 2026-06 · unverdicted · novelty 2.0

A tutorial reviewing why traditional prediction models often fail to improve decision quality in stochastic optimization and summarizing key properties and tools of decision-focused learning.

Reference graph

Works this paper leans on

63 extracted references · cited by 1 Pith paper



arXiv

[35] [36]

End to End Learning and Optimization on Graphs , booktitle =

Wilder, Bryan and Ewing, Eric and Dilkina, Bistra and Tambe, Milind , date =. End to End Learning and Optimization on Graphs , booktitle =

[36] [37]

Proceedings of the AAAI conference on artificial intelligence , volume=

Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

[37] [38]

2009 , publisher=

Directional statistics , author=. 2009 , publisher=

2009

[38] [39]

Biometrika , volume=

Spherical regression , author=. Biometrika , volume=. 2003 , publisher=

2003

[39] [40]

1998 , publisher=

Applied regression analysis , author=. 1998 , publisher=

1998

[40] [41]

arXiv preprint arXiv:2505.11360 , year=

Efficient End-to-End Learning for Decision-Making: A Meta-Optimization Approach , author=. arXiv preprint arXiv:2505.11360 , year=

arXiv

[41] [42]

Advances in Neural Information Processing Systems , volume=

Decision-focused learning without decision-making: Learning locally optimized decision losses , author=. Advances in Neural Information Processing Systems , volume=

[42] [43]

Advances in Neural Information Processing Systems , volume=

Landscape surrogate: Learning decision losses for mathematical optimization under partial information , author=. Advances in Neural Information Processing Systems , volume=

[43] [44]

International Conference on Machine Learning , pages=

Satnet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver , author=. International Conference on Machine Learning , pages=. 2019 , organization=

2019

[44] [45]

Advances in neural information processing systems , volume=

Differentiable convex optimization layers , author=. Advances in neural information processing systems , volume=

[45] [46]

Advances in Neural Information Processing Systems , volume=

Automatically learning compact quality-aware surrogates for optimization problems , author=. Advances in Neural Information Processing Systems , volume=

[46] [47]

1999 , publisher=

Integer and combinatorial optimization , author=. 1999 , publisher=

1999

[47] [48]

Management Science , volume=

Risk guarantees for end-to-end prediction and optimization processes , author=. Management Science , volume=. 2022 , publisher=

2022

[48] [49]

arXiv preprint arXiv:2305.06584 , year=

Active learning in the predict-then-optimize framework: A margin-based approach , author=. arXiv preprint arXiv:2305.06584 , year=

arXiv

[49] [50]

arXiv preprint arXiv:2602.05340 , year=

Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach , author=. arXiv preprint arXiv:2602.05340 , year=

arXiv

[50] [51]

arXiv preprint arXiv:2602.02800 , year=

Decision-Focused Optimal Transport , author=. arXiv preprint arXiv:2602.02800 , year=

arXiv

[51] [52]

arXiv preprint arXiv:2512.15726 , year=

Decision-focused bias correction for fluid approximation , author=. arXiv preprint arXiv:2512.15726 , year=

arXiv

[52] [53]

Management Science , volume=

Small-data, large-scale linear optimization with uncertain objectives , author=. Management Science , volume=. 2021 , publisher=

2021

[53] [54]

Operations Research , volume=

Debiasing in-sample policy performance for small-data, large-scale optimization , author=. Operations Research , volume=. 2024 , publisher=

2024

[54] [55]

Journal of Artificial Intelligence Research , volume=

Decision-focused learning: Foundations, state of the art, benchmark and future opportunities , author=. Journal of Artificial Intelligence Research , volume=

[55] [56]

European Journal of Operational Research , volume=

A survey of contextual optimization methods for decision-making under uncertainty , author=. European Journal of Operational Research , volume=. 2025 , publisher=

2025

[56] [57]

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages=

Cave: A cone-aligned approach for fast predict-then-optimize with binary linear programs , author=. International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages=. 2024 , organization=

2024

[57] [58]

Advances in Neural Information Processing Systems , volume=

Solver-free decision-focused learning for linear optimization problems , author=. Advances in Neural Information Processing Systems , volume=

[58] [59]

From Inverse Optimization to Feasibility to

Mishra, Saurabh Kumar and Raj, Anant and Vaswani, Sharan , booktitle =. From Inverse Optimization to Feasibility to. 2024 , editor =

2024

[59] [60]

Journal of Machine Learning Research , year =

Fast Rates in Statistical and Online Learning , author =. Journal of Machine Learning Research , year =

[60] [61]

2016 , eprint =

Fast rates with high probability in exp-concave statistical learning , author =. 2016 , eprint =

2016

[61] [62]

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) , series =

Fast rates with high probability in exp-concave statistical learning , author =. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) , series =. 2017 , editor =

2017

[62] [63]

Sphere packing numbers for subsets of the Boolean n -cube with bounded

Haussler, David , journal =. Sphere packing numbers for subsets of the Boolean n -cube with bounded. 1995 , volume =

1995

[63] [64]

Available at SSRN 4487888 , year=

Value of one data point: Active label acquisition in assortment optimization , author=. Available at SSRN 4487888 , year=