Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings

Jayant Kalagnanam; Markus Ettl; Shivaram Subramanian; Yingdong Lu; Zhengliang Xue

arXiv: 2605.06941 · v1 · submitted 2026-05-07 · cs.LG · math.OC


Pith reviewed 2026-05-11 01:05 UTC · model grok-4.3


keywords: bilevel optimization, discrete choice models, foundation models, pricing optimization, in-context learning, imitation learning, price elasticity, revenue management


The pith

A foundation model trained on simulated discrete choice data learns to set prices and assortments in new environments by retrieving elasticity priors and respecting constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a causal-aware foundation-model approach for real-time bilevel pricing decisions where a provider picks an assortment and prices while heterogeneous customers accept or reject based on their own preferences. It trains a constrained triple-head network called C3PO on simulated data generated from classical discrete choice models, combining imitation learning for prices, multi-task revenue prediction, and in-context retrieval of elasticity information from economics literature. The resulting model produces recommendations for entirely new choice settings without seeing the underlying preference structure. Gains appear in simulated tests and real deployments across healthcare, tender pricing, airline ancillaries, and other domains, and the improvements grow larger when customers are more price-sensitive. This matters for any setting that needs fast, constraint-aware pricing without full knowledge of customer utilities.
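
To make the bilevel structure concrete, one way to write it down is sketched below. The notation ($S$, $p$, $\mathcal{C}$, $v_j$, $\beta$, $\varepsilon_j$) and the choice of expected revenue as the objective are assumptions made for illustration, not the paper's own formulation. The last line specializes to multinomial logit, one of the classical discrete choice families.

```latex
% Upper level (provider): choose assortment S and prices p inside the business constraints C
\max_{S \subseteq \mathcal{N},\; p \in \mathcal{C}} \;\; \sum_{i \in S} p_i \, \Pr(i \mid S, p)

% Lower level (customer): pick the utility-maximizing option; option 0 means "reject everything"
\Pr(i \mid S, p) = \Pr\!\Big( i \in \arg\max_{j \in S \cup \{0\}} \big( v_j - \beta p_j + \varepsilon_j \big) \Big)

% Multinomial logit special case (i.i.d. Gumbel noise, v_0 = p_0 = 0)
\Pr(i \mid S, p) = \frac{\exp(v_i - \beta p_i)}{1 + \sum_{j \in S} \exp(v_j - \beta p_j)}
```

Here $\beta > 0$ plays the role of customer price sensitivity, the quantity against which the reported gains are said to grow.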

Core claim

The C3PO network solves the bilevel problem by integrating imitation learning of prices, multi-task learning of revenue responses, and in-context learning of price elasticity, all while enforcing business constraints. Trained solely on simulated customer segments and counterfactual pairs drawn from multiple classical discrete choice models, the network produces effective pricing recommendations for randomly generated choice environments that provide no access to the true preference structure. It consistently raises pricing KPIs, with larger gains as customer price sensitivity increases, and the tuned model yields substantial improvements when deployed on real-world problems in healthcare, t

What carries the argument

The constrained triple-head price optimization (C3PO) network, which performs simultaneous price imitation, revenue response prediction, and in-context elasticity prior retrieval while projecting outputs onto feasible business constraints.
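
A rough sketch of how the three signals could be combined into one training objective for the price head is given below. Everything in it is hypothetical: the function names, the weights, the hinge-style penalty, and the toy revenue head are assumptions for illustration, since the page does not give the paper's actual loss.

```python
import numpy as np

def toy_revenue_head(prices):
    """Stand-in for a frozen revenue head (hypothetical closed form, not the paper's)."""
    return float(np.sum(prices * np.exp(-prices)))

def price_head_objective(pred_prices, expert_prices, implied_elasticity,
                         elasticity_range, revenue_head=toy_revenue_head,
                         w_revenue=1.0, w_prior=1.0):
    """Composite loss to minimize: imitation - revenue reward + prior-range penalty."""
    # Imitation term: squared distance to the expert (teacher) prices.
    imitation = float(np.mean((pred_prices - expert_prices) ** 2))
    # Reward from the frozen revenue head; higher predicted revenue lowers the loss.
    reward = revenue_head(pred_prices)
    # Hinge penalty when the implied elasticity leaves the retrieved prior range.
    lo, hi = elasticity_range
    prior_penalty = max(lo - implied_elasticity, 0.0) + max(implied_elasticity - hi, 0.0)
    return imitation - w_revenue * reward + w_prior * prior_penalty

expert = np.array([1.0, 1.0])
inside = price_head_objective(expert, expert, -1.0, (-2.0, -0.5))
outside = price_head_objective(expert, expert, -3.0, (-2.0, -0.5))
print(inside, outside)
```

In this toy setup the only difference between the two calls is whether the implied elasticity falls inside the assumed prior range, so the second value is larger by exactly the penalty.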

If this is right

Pricing KPIs rise consistently, and the size of the rise increases with measured customer price sensitivity.
The same trained network produces usable recommendations for previously unseen products and choice environments.
Real deployments in healthcare, tender pricing, and airline ancillary services deliver measurable revenue or margin gains across products and markets.
Business constraints are satisfied by construction because the network projects its outputs onto the feasible set at inference time.
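
The projection step mentioned in the last item can be illustrated with a small sketch. It assumes the feasible set is a price-ordering constraint plus a common lower and upper bound; the page mentions a price ordering constraint but does not give the exact constraint set, and the function name `project_ordered_box` is made up for this example. The ordering part is handled with the pool-adjacent-violators algorithm, followed by clipping to the bounds.

```python
import numpy as np

def project_ordered_box(prices, lo, hi):
    """Euclidean projection onto {p : lo <= p[0] <= p[1] <= ... <= p[-1] <= hi}."""
    # Pool-adjacent-violators: merge neighbouring blocks until block means are non-decreasing.
    blocks = []  # each block is [sum, count]
    for value in np.asarray(prices, dtype=float):
        blocks.append([value, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = np.concatenate([np.full(count, total / count) for total, count in blocks])
    # Clipping the isotonic fit to [lo, hi] keeps the ordering and enforces the bounds.
    return np.clip(fitted, lo, hi)

raw = np.array([1.4, 0.9, 1.1, 2.6, 2.2])  # hypothetical raw network output
feasible = project_ordered_box(raw, lo=0.0, hi=2.0)
print(feasible)
```

Because the projection runs after the network, any raw output is mapped to a feasible price vector. Other constraint types would need a different projection.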

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same simulation-plus-in-context pattern could be tested on other bilevel problems such as dynamic inventory allocation or personalized recommendation with capacity limits.
Performance may degrade if real customer responses contain systematic deviations from all classical discrete choice families that were not captured in the training simulations.
Adding more recent or domain-specific elasticity sources beyond the static literature corpus could further widen the observed gains.
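
One textbook way to see why a tighter elasticity range could matter is the constant-elasticity pricing rule sketched below. This is a standard microeconomics result used here as an illustrative assumption; the page does not say that C3PO uses this formula. The symbols $A$, $c$, and $\varepsilon$ are introduced only for this example.

```latex
% Constant-elasticity demand with elasticity \varepsilon < -1 and marginal cost c > 0
q(p) = A\, p^{\varepsilon}, \qquad \pi(p) = (p - c)\, q(p)

% First-order condition: p + \varepsilon (p - c) = 0
p^{*} = \frac{\varepsilon}{1 + \varepsilon}\, c = \frac{|\varepsilon|}{|\varepsilon| - 1}\, c

% An elasticity range |\varepsilon| \in [e_{\min}, e_{\max}] with e_{\min} > 1 brackets the price
\frac{e_{\max}}{e_{\max} - 1}\, c \;\le\; p^{*} \;\le\; \frac{e_{\min}}{e_{\min} - 1}\, c
```

Under these assumptions, narrowing the elasticity range narrows the bracket around the profit-maximizing price, which is one reading of how a prior can anchor price predictions.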

Load-bearing premise

That data simulated from classical discrete choice models plus in-context retrieval of elasticity priors from behavioral economics literature is sufficient for the network to generalize to real customer behavior in new choice environments without access to the underlying preference structure.

What would settle it

Run the deployed model on a live market whose observed acceptance rates deviate sharply from predictions of any classical discrete choice model (for example, strong herding or reference-point effects) and check whether the KPI lift disappears or reverses.

Figures

Figures reproduced from arXiv: 2605.06941 by Jayant Kalagnanam, Markus Ettl, Shivaram Subramanian, Yingdong Lu, Zhengliang Xue.

**Figure 1.** High-level C3PO model architecture. elasticity prior provides a plausible elasticity range which anchors the price predictions; and (iii) Multi-task learning of the price-revenue curve: an auxiliary revenue head learns to predict revenue for any price vector. This head is frozen and used as a reward signal to direct the price head towards higher revenue regions. The imitation learning module learns a repr…

**Figure 2.** Revenue-loss term as a function of dataset count, including a log-scaled y-axis and a …

**Figure 3.** Training loss as a function of dataset count, including a log-scaled y-axis and a 50-point …

**Figure 4.** Average price ordering constraint violation as a function of dataset count, including a linear …

Original abstract

We introduce a causal aware foundation-model framework for real time optimal decision making in discrete choice environments. We propose a constrained triple-head price optimization (C3PO) network to solve a bilevel decision problem in which a service provider selects an optimal assortment while heterogeneous users make personalized acceptance or rejection choices optimizing their own personalized preferences. C3PO integrates imitation learning of prices, multi-task learning of revenue responses, and in context learning of price elasticity to generate pricing recommendations while adhering to business constraints. During inference, frontier model prompting retrieves an enhanced elasticity prior for new products from behavioral economics literature, improving pricing effectiveness. We demonstrate strong in context learning performance using simulated, synthetic, and real-world datasets. C3PO is trained on simulated data generated from multiple classical discrete choice models in economics. The model is trained on data comprising simulated customer segments and counterfactual action and outcome pairs and evaluated on randomly generated choice environments with no access to the underlying preference structure. The trained model consistently improves the pricing KPIs, with gains increasing as customer price sensitivity increases. We also deploy the tuned foundation model for optimal pricing in real-world applications such as healthcare, tender pricing, airline ancillary pricing, and other domains, achieving substantial gains across multiple products, markets, and divisions.
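
The simulated training data described in the abstract can be pictured with a minimal sketch. It assumes a single multinomial logit segment with a no-purchase option; the utility and price-sensitivity distributions and all function names are hypothetical. The price sampling (normal with mean and standard deviation 1.0, clipped to [0, 2]) follows the appendix excerpt of the paper reproduced on this page.

```python
import numpy as np

def mnl_choice_probs(utilities, prices, beta):
    """MNL purchase probabilities with a no-purchase option of utility 0."""
    v = utilities - beta * prices
    m = max(float(v.max()), 0.0)           # shift exponents for numerical stability
    expv = np.exp(v - m)
    return expv / (np.exp(-m) + expv.sum())

def expected_revenue(utilities, prices, beta):
    """Expected revenue per customer for one price vector."""
    return float(prices @ mnl_choice_probs(utilities, prices, beta))

def simulate_what_if(rng, n_products=3, n_price_vectors=5):
    """One simulated segment with counterfactual (price vector, revenue) pairs."""
    utilities = rng.normal(0.0, 1.0, size=n_products)  # hypothetical taste parameters
    beta = rng.uniform(0.5, 3.0)                       # hypothetical price sensitivity
    # Normalized prices: N(1, 1) clipped to [0, 2], as in the appendix excerpt.
    prices = np.clip(rng.normal(1.0, 1.0, size=(n_price_vectors, n_products)), 0.0, 2.0)
    revenue = np.array([expected_revenue(utilities, p, beta) for p in prices])
    return prices, revenue

rng = np.random.default_rng(0)
prices, revenue = simulate_what_if(rng)
print(prices.shape, revenue.shape)
```

Each row of `prices` with its entry in `revenue` is one counterfactual action and outcome pair. A trained model would see such pairs but not `utilities` or `beta`.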

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

C3PO fuses imitation learning, multi-task revenue heads, and literature elasticity retrieval into a constrained network for bilevel discrete-choice pricing, but stays inside simulated classical models with no numbers or external checks.


The main point is a triple-head architecture called C3PO that trains on counterfactual pairs from logit, probit, and similar models, then uses in-context prompting to pull elasticity priors from behavioral economics literature for new products. It adds business constraints at inference time and claims better KPIs as price sensitivity rises, plus deployments in healthcare, airlines, and tender pricing.

Referee Report

4 major / 1 minor

Summary. The paper introduces a causal-aware foundation-model framework called C3PO for real-time bilevel optimization in discrete choice settings. A service provider optimizes assortments and prices subject to constraints while heterogeneous users make personalized acceptance/rejection decisions; the network combines imitation learning of prices, multi-task revenue prediction, and in-context retrieval of price-elasticity priors from behavioral-economics literature via frontier-model prompting. The model is trained exclusively on counterfactual pairs simulated from classical discrete-choice models (logit, probit, etc.) and evaluated on randomly generated environments drawn from the same model families, with no access to true preference parameters. The authors claim consistent KPI improvements that increase with customer price sensitivity and report substantial gains after deploying the tuned model in healthcare, tender, and airline ancillary pricing.

Significance. If the empirical claims can be substantiated with quantitative metrics and proper validation, the approach would represent a practical bridge between simulation-based training and real-time constrained pricing, leveraging foundation-model in-context learning to incorporate external economic priors. The training on multiple classical choice models and the explicit handling of business constraints are constructive elements that could scale to other bilevel decision problems if generalization beyond the training distribution is demonstrated.

major comments (4)

[Abstract] Abstract: the central empirical claims ('consistently improves the pricing KPIs, with gains increasing as customer price sensitivity increases' and 'achieving substantial gains across multiple products, markets, and divisions') are stated without any numerical results, baselines, error bars, ablation studies, or validation protocol, making it impossible to evaluate the magnitude or statistical reliability of the reported benefits.
[Training and Evaluation] Training and Evaluation sections: data are generated from classical discrete-choice models and evaluation environments are 'randomly generated choice environments' drawn from the same model families; this protocol does not constitute an out-of-distribution test and therefore cannot substantiate the claim that the network generalizes to real customer behavior when the true utility parameters are inaccessible.
[Deployment] Deployment claims: statements of successful real-world application in healthcare, tender pricing, and airline ancillary pricing are presented without any quantitative before/after KPIs, comparison to incumbent methods, or confirmation against observed acceptance rates, which is load-bearing for the assertion of practical utility.
[Methodology] Methodology: the in-context elasticity prior retrieved via frontier-model prompting is described as improving pricing effectiveness, yet no ablation isolating the contribution of these priors, no sensitivity analysis to literature selection, and no causal identification strategy are reported, leaving the 'causal-aware' component unverified.

minor comments (1)

The description of the constrained triple-head architecture (C3PO) would benefit from an explicit diagram or pseudocode showing how the three heads interact with the bilevel constraints during inference.

Simulated Author's Rebuttal

4 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to improve clarity, substantiation, and transparency while preserving the core contributions of the work.


Referee: [Abstract] Abstract: the central empirical claims ('consistently improves the pricing KPIs, with gains increasing as customer price sensitivity increases' and 'achieving substantial gains across multiple products, markets, and divisions') are stated without any numerical results, baselines, error bars, ablation studies, or validation protocol, making it impossible to evaluate the magnitude or statistical reliability of the reported benefits.

Authors: We agree that the abstract would be strengthened by including concrete quantitative details. In the revised version we will expand the abstract to summarize key results from the experiments, including average revenue uplifts (with ranges across sensitivity levels), comparisons against baselines such as myopic pricing and classical optimization, references to ablation studies, and a brief mention of the validation protocol and error bars from multiple simulation runs. These details are already present in Sections 4 and 5; the abstract will now foreground them.

revision: yes

Referee: [Training and Evaluation] Training and Evaluation sections: data are generated from classical discrete-choice models and evaluation environments are 'randomly generated choice environments' drawn from the same model families; this protocol does not constitute an out-of-distribution test and therefore cannot substantiate the claim that the network generalizes to real customer behavior when the true utility parameters are inaccessible.

Authors: The referee correctly notes that the evaluation remains within the family of classical discrete-choice models rather than testing true out-of-distribution real-world behavior. Our protocol is designed to evaluate performance when true preference parameters are unknown, which matches the practical setting. We will revise the manuscript to explicitly state this limitation, add a dedicated discussion of potential domain shift to real customer data, and include any anonymized real-world hold-out checks from the deployment cases. We believe the current results still demonstrate the value of the approach under the stated assumptions.

revision: partial

Referee: [Deployment] Deployment claims: statements of successful real-world application in healthcare, tender pricing, and airline ancillary pricing are presented without any quantitative before/after KPIs, comparison to incumbent methods, or confirmation against observed acceptance rates, which is load-bearing for the assertion of practical utility.

Authors: We acknowledge that the deployment section currently lacks the quantitative detail needed for full evaluation. Because of confidentiality agreements, we cannot release specific before/after KPIs or direct numerical comparisons. In the revision we will expand the text to describe the validation process against observed acceptance rates, the constraint-handling outcomes, and high-level (non-proprietary) performance indicators. If this remains insufficient we are prepared to move the deployment claims to a supplementary note or qualify them more cautiously.

revision: partial

Referee: [Methodology] Methodology: the in-context elasticity prior retrieved via frontier-model prompting is described as improving pricing effectiveness, yet no ablation isolating the contribution of these priors, no sensitivity analysis to literature selection, and no causal identification strategy are reported, leaving the 'causal-aware' component unverified.

Authors: We agree that the contribution of the in-context priors requires explicit verification. We will add a new ablation subsection comparing model performance with and without the retrieved elasticity priors, including sensitivity tests across different literature selections and prompting variations. We will also clarify that the causal-aware framing derives from training exclusively on counterfactual pairs generated by classical causal discrete-choice models together with the bilevel optimization structure; a short discussion of identification assumptions will be included.

revision: yes

standing simulated objections not resolved

Specific numerical before/after KPIs and direct incumbent comparisons from the confidential real-world deployments cannot be provided.

Circularity Check

0 steps flagged

No significant circularity detected


The paper describes training C3PO on simulated customer segments and counterfactual pairs generated from classical discrete choice models, then evaluating on randomly generated environments from the same model families without access to underlying preferences. This is a standard supervised setup for testing imitation and in-context learning rather than a derivation that reduces by construction to its inputs. No equations, self-citations, or uniqueness theorems are invoked in the provided text to force the central claims; reported KPI improvements and real-world deployments are presented as empirical outcomes. The derivation chain remains self-contained against external benchmarks of simulated performance.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted or audited. The framework implicitly rests on the fidelity of classical discrete-choice simulations and the relevance of retrieved economics literature, but these cannot be quantified here.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

C3PO is trained on simulated data generated from multiple classical discrete choice models... evaluated on randomly generated choice environments with no access to the underlying preference structure.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In NeurIPS, First Table Representation Workshop, 2022.

[2] Daniel McFadden. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics, 1974.

[3] Daniel McFadden. The measurement of urban travel demand. Journal of Public Economics, 3(4):303–328, 1974.

[4] Simon P Anderson, Andre De Palma, and Jacques-Francois Thisse. Discrete choice theory of product differentiation. MIT Press, 1992.

[5] Guillermo Gallego and Ruxian Wang. Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Operations Research, 62(2):450–461, 2014.

[6] Heng Zhang, Paat Rusmevichientong, and Huseyin Topaloglu. Multiproduct pricing under the generalized extreme value models with homogeneous price sensitivity parameters. Operations Research, 66(6):1559–1570, 2018.

[7] Vivek Farias, Srikanth Jagabathula, and Devavrat Shah. A data-driven approach to modeling choice. Advances in Neural Information Processing Systems, 22, 2009.

[8] Vivek F Farias, Srikanth Jagabathula, and Devavrat Shah. A nonparametric approach to modeling choice with limited data. Management Science, 59(2):305–322, 2013.

[9] Jose Blanchet, Guillermo Gallego, and Vineet Goyal. A Markov chain approximation to choice modeling. Operations Research, 64(4):886–905, 2016.

[10] Laurie A Garrow. Discrete choice modelling and air travel demand: theory and applications. Routledge, 2016.

[11] Zhengliang Xue, Zizhuo Wang, and Markus Ettl. Pricing personalized bundles: A new approach and an empirical study. Manufacturing & Service Operations Management, 18(1):51–68, 2016.

[12] Shivaram Subramanian, Wei Sun, Youssef Drissi, and Markus Ettl. Constrained prescriptive trees via column generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36(4), pages 4602–4610, 2022.

[13] Guillermo Gallego and Gerardo Berbeglia. Bounds and heuristics for multiproduct pricing. Management Science, 70(6):4132–4144, 2024.

[14] Baichuan Mo, Qingyi Wang, Xiaotong Guo, Matthias Winkenbach, and Jinhua Zhao. Predicting drivers' route trajectories in last-mile delivery using a pair-wise attention-based pointer neural network. Transportation Research Part E: Logistics and Transportation Review, 175:103168, 2023.

[15] Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, 2021.

[16] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. StarCoder: may the source be with you! Transactions on Machine Learning Research, 2023.

[17] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025.

[18] Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do Bayesian inference. In International Conference on Learning Representations, 2022.

[19] Ali Aouad and Antoine Désir. Representing random utility choice models with neural networks. Management Science, 2026.

[20] Yang Yuan. On the power of foundation models. In International Conference on Machine Learning, pages 40519–40530. PMLR, 2023.

[21] Brandon Amos and J Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pages 136–145. PMLR, 2017.

[22] Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, and Younes Bennani. Advances in domain adaptation theory. Elsevier, 2019.

[23] Charles A Rohde et al. Introductory statistical inference with the likelihood function. Springer, 2014.

[24] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1):151–175, 2010.

[25] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations, 2017.

[26] Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, and Samy Bengio. Fantastic generalization measures and where to find them. In International Conference on Learning Representations, 2020.

[27] Sidak Pal Singh, Hossein Mobahi, Atish Agarwala, and Yann Dauphin. Avoiding spurious sharpness minimization broadens applicability of SAM. In International Conference on Machine Learning, pages 55702–55719. PMLR, 2025.

[28] Gintare Karolina Dziugaite and Daniel Roy. Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors. In International Conference on Machine Learning, pages 1377–1386. PMLR, 2018.

[29] Jeffrey P Newman, Mark E Ferguson, Laurie A Garrow, and Timothy L Jacobs. Estimation of choice-based models using sales data from a single firm. Manufacturing & Service Operations Management, 16(2):184–197, 2014.

[30] Michel Bierlaire, Kay W Axhausen, and Georg Abay. The acceptance of modal innovation: The case of Swissmetro. Swiss Transport Research Conference, 2001.

[31] Dipak C Jain, Naufel J Vilcassim, and Pradeep K Chintagunta. A random-coefficients logit brand-choice model applied to panel data. Journal of Business & Economic Statistics, 12(3):317–328, 1994.

[32] Ante Babić. Microeconomics by Robert S. Pindyck and Daniel L. Rubinfeld. Financial Theory and Practice, 29(4):385–386, 2005.

[33] N Gregory Mankiw. Principles of microeconomics, volume 1. Elsevier, 1998.

[34] Peng Ye, Julian Qian, Jieying Chen, Chen-hung Wu, Yitong Zhou, Spencer De Mars, Frank Yang, and Li Zhang. Customized regression model for Airbnb dynamic pricing. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 932–940, 2018.

[35] Davide Proserpio, Meng Xu, and Georgios Zervas. Pricing frictions and platform remedies: The case of Airbnb. Marketing Science, 41(6):1085–1108, 2022.

[36] Federal Trade Commission. FTC surveillance pricing study indicates wide range of personal data used to set individualized consumer prices. https://www.ftc.gov/news-events/news/press-releases/2025/01/ftc-surveillance-pricing-study-indicates-wide-range-personal-data-used-set-individualized-consumer, 2025.

[37] P. Langley. Crafting papers on machine learning. In Pat Langley, editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pages 1207–1216, Stanford, CA, 2000.

[38] M. Parkin, M. Powell, and K. Matthews. Economics. Addison-Wesley, 2008.

[39] Vivek S Borkar. Stochastic approximation with two time scales. Systems & Control Letters, 29(5):291–294, 1997.

[40] Yihua Zhang, Prashant Khanduri, Ioannis C Tsaknakis, Yuguang Yao, Mingyi Hong, and Sijia Liu. An introduction to bilevel optimization: Foundations and applications in signal processing and machine learning. IEEE Signal Process. Mag., 2024.

[41] Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processin...

[42] Instacart. The Instacart online grocery shopping dataset 2017. https://www.instacart.com/datasets/grocery-shopping-2017, 2017.

[43] Brllrb. Uber and Lyft dataset Boston, MA. https://www.kaggle.com/datasets/brllrb/uber-and-lyft-dataset-boston-ma, 2019.

[44] Junwei Ma, Apoorv Dankar, George Stein, Guangwei Yu, and Anthony Caterini. TabPFGen – tabular data generation with TabPFN. In NeurIPS, Second Table Representation Learning Workshop, 2023.

[45] Paul A. Samuelson and William D. Nordhaus. Economics. McGraw-Hill, 19th edition, 2009.

[46] Paul Milgrom. Putting Auction Theory to Work. Cambridge University Press, 2004.

A Theoretical Development

A.1 Methodologies for Decision Model under Discrete Choice

A.1.1 Normalization of the Decision Variables

Normalization is important both for the optimization procedure and for obtaining meaningful insights into the resulting optimal decisions, and i...

[47] +K columns. We generate prices p for the 'what-if' data by sampling from a normal distribution with mean and standard deviation equal to 1.0, and clip the sampled values to lie within the interval [0, 2]. This design restricts prices to a narrow range around the mean, reflecting real-world pricing distributions

Table 8: ICL-OFF ablation results. PDR/PIR...
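The 'what-if' price sampling described above (normal draws with mean and standard deviation 1.0, clipped to [0, 2]) can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `sample_what_if_prices`, its parameters `n_rows`, `n_products`, and `seed`, and the choice of one price per product per row are all assumptions made for the example.

```python
import random


def sample_what_if_prices(n_rows, n_products, seed=0):
    """Hypothetical helper (not from the paper's code base).

    Draws normalized prices from N(mean=1.0, std=1.0) and clips each
    value to the interval [0, 2], as described in the text.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    return [
        # Clip each Gaussian draw into [0.0, 2.0].
        [min(2.0, max(0.0, rng.gauss(1.0, 1.0))) for _ in range(n_products)]
        for _ in range(n_rows)
    ]


# Assumed shape: 1000 what-if rows, 3 products per row.
prices = sample_what_if_prices(n_rows=1000, n_products=3)
flat = [p for row in prices for p in row]
print(min(flat), max(flat))
```

Because the standard deviation equals the distance from the mean to each bound, roughly 16% of draws fall below 0 and roughly 16% above 2 before clipping, so a noticeable share of the sampled prices sits exactly on the bounds.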