Embedding Foundation Model Predictions in Discrete-Choice Models with Structural Guarantees
Pith reviewed 2026-06-26 01:16 UTC · model grok-4.3
The pith
A two-stage adapter embeds foundation model predictions inside a multinomial logit while exactly preserving its marginal rate of substitution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The composition of a multinomial logit utility with a neural correction term applied to foundation model predicted probabilities exactly preserves the multinomial logit's marginal rate of substitution.
What carries the argument
Two-stage adapter that fits and freezes multinomial logit structural coefficients before adding a neural correction operating on foundation model predictions.
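A minimal NumPy sketch makes the mechanism concrete. It assumes a linear-in-attributes MNL part plus a correction function that receives only the precomputed foundation-model probabilities. The names `adapter_utility` and `adapter_probs`, and the log-probability stand-in for the neural correction, are hypothetical illustrations and not the paper's code.

```python
import numpy as np


def softmax(v):
    # Numerically stable softmax over the last axis (the alternatives).
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def adapter_utility(x, beta, p_fm, correction):
    """Hypothetical adapter utility V = x @ beta + g(p_fm).

    x:          (n, J, K) observed attributes, e.g. cost and travel time
    beta:       (K,) Stage 1 structural coefficients, frozen in Stage 2
    p_fm:       (n, J) precomputed foundation-model choice probabilities
    correction: callable (n, J) -> (n, J); stand-in for the neural term
    """
    structural = x @ beta                  # linear MNL utility, shape (n, J)
    return structural + correction(p_fm)   # the correction never sees x


def adapter_probs(x, beta, p_fm, correction):
    # Choice probabilities are the softmax of the combined utility.
    return softmax(adapter_utility(x, beta, p_fm, correction))


# Toy usage with random attributes and a log-probability stand-in correction.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 2))            # 4 choices, 3 alternatives, [cost, time]
beta = np.array([-1.0, -0.5])             # sign-constrained: both negative
p_fm = softmax(rng.normal(size=(4, 3)))   # pretend foundation-model output
g = lambda p: 0.5 * np.log(p + 1e-12)     # hypothetical correction
probs = adapter_probs(x, beta, p_fm, g)
```

The design choice that matters is in `adapter_utility`: the attributes `x` reach the utility only through `x @ beta`, so whatever the correction learns, it cannot alter how utility responds to cost or time.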
If this is right
- Test accuracy rises by 6.4 percentage points on average over the plain multinomial logit and by as much as 12.8 points.
- Cost monotonicity holds in 100 percent of cases.
- Derived values of time on transportation data fall inside the range reported in published economics studies.
- Accuracy gains remain at least 6 points even when the foundation model is restricted to 10 percent of its original context.
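One plausible way to operationalize the cost-monotonicity figure is to raise each alternative's cost and check that its own predicted probability does not rise. The sketch below assumes the adapter form V = x·β + g(p_FM) with a sign-constrained cost coefficient. The function name, the perturbation size `delta`, and the stand-in correction are hypothetical, and the paper's exact protocol may differ.

```python
import numpy as np


def softmax(v):
    # Numerically stable softmax over the last axis (the alternatives).
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cost_monotonicity_rate(x, beta, p_fm, correction, cost_idx=0, delta=1.0):
    """Share of (observation, alternative) pairs where raising that
    alternative's cost by `delta` does not raise its own probability."""
    n, n_alt, _ = x.shape
    base = softmax(x @ beta + correction(p_fm))
    ok = 0
    for j in range(n_alt):
        x_up = x.copy()
        x_up[:, j, cost_idx] += delta          # bump only alternative j's cost
        bumped = softmax(x_up @ beta + correction(p_fm))
        ok += np.sum(bumped[:, j] <= base[:, j])
    return ok / (n * n_alt)


rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3, 2))               # [cost, time] per alternative
beta = np.array([-1.2, -0.4])                 # beta_cost < 0 via the sign constraint
p_fm = softmax(rng.normal(size=(50, 3)))      # pretend foundation-model output
g = lambda p: 2.0 * np.log(p + 1e-12)         # hypothetical stand-in correction
rate = cost_monotonicity_rate(x, beta, p_fm, g)
```

Because the correction never receives the cost attribute, any negative cost coefficient yields a rate of exactly 1 under this construction, which is what a 100% monotonicity figure would be expected to reflect.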
Where Pith is reading between the lines
- The same two-stage structure could be applied to other discrete choice specifications that rely on preserved substitution patterns.
- Larger foundation models could be swapped in without retraining the structural coefficients, potentially increasing accuracy further.
- The approach might transfer to non-transport domains where choice models must respect cost or price monotonicity.
Load-bearing premise
The neural correction is added to the utility in a form that leaves the partial derivatives with respect to observed attributes unchanged.
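The premise can be written out as a short derivation. The notation is assumed for illustration: x_nj collects the observed attributes of alternative j for decision-maker n, β the frozen Stage 1 coefficients, p̂_n the precomputed foundation-model probabilities, and g_φ the neural correction. The key step is that p̂_n enters as a fixed input, so its derivative with respect to x_nj is zero by construction.

```latex
\begin{aligned}
V_{nj} &= \boldsymbol{\beta}^{\top}\mathbf{x}_{nj} + g_{\phi}(\hat{\mathbf{p}}_n)_j, \\
\frac{\partial V_{nj}}{\partial x_{nj,k}}
  &= \beta_k + \underbrace{\frac{\partial g_{\phi}(\hat{\mathbf{p}}_n)_j}{\partial x_{nj,k}}}_{=\,0\ \text{($\hat{\mathbf{p}}_n$ is a fixed input)}}
   = \beta_k, \\
\mathrm{MRS}_{k,l}
  &= \frac{\partial V_{nj}/\partial x_{nj,k}}{\partial V_{nj}/\partial x_{nj,l}}
   = \frac{\beta_k}{\beta_l},
  \qquad
  \mathrm{VOT} = \frac{\beta_{\text{time}}}{\beta_{\text{cost}}}, \\
\frac{\partial P_{nj}}{\partial x_{nj,\text{cost}}}
  &= \beta_{\text{cost}}\, P_{nj}\,(1 - P_{nj}) < 0
  \quad \text{whenever } \beta_{\text{cost}} < 0 .
\end{aligned}
```

The last two lines show why the Stage 1 sign constraints matter. With negative time and cost coefficients the implied value of time is positive. The standard MNL own-attribute derivative then makes each alternative's probability strictly decreasing in its own cost.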
What would settle it
Directly computing the marginal rate of substitution on the fitted model with and without the neural correction term: any nonzero difference would falsify the claim.
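A numerical version of this test can be sketched with finite differences. It computes ∂V/∂time divided by ∂V/∂cost with and without the correction and reports the largest gap. The attribute ordering [cost, time], the `tanh` stand-in correction, the coefficient values, and the step size `h` are assumptions for illustration. For a fitted network, automatic differentiation would be the more precise choice.

```python
import numpy as np


def utility(x, beta, p_fm, correction=None):
    # Structural MNL utility, plus the correction term when one is supplied.
    v = x @ beta
    if correction is not None:
        v = v + correction(p_fm)
    return v


def mrs_fd(x, beta, p_fm, correction=None, k=1, l=0, h=1e-6):
    """Forward-difference MRS (dV/dx_k) / (dV/dx_l) for every
    (observation, alternative) pair. With [cost, time] ordering, k=1 and
    l=0 give the value-of-time ratio beta_time / beta_cost."""
    v0 = utility(x, beta, p_fm, correction)

    def dv(idx):
        x_up = x.copy()
        x_up[..., idx] += h
        return (utility(x_up, beta, p_fm, correction) - v0) / h

    return dv(k) / dv(l)


rng = np.random.default_rng(2)
x = rng.normal(size=(20, 3, 2))            # [cost, time] per alternative
beta = np.array([-0.8, -0.2])              # hypothetical fitted Stage 1 values
p_fm = rng.dirichlet(np.ones(3), size=20)  # stand-in foundation-model output
g = lambda p: np.tanh(3.0 * p)             # hypothetical correction term

mrs_plain = mrs_fd(x, beta, p_fm)
mrs_adapted = mrs_fd(x, beta, p_fm, g)
max_gap = np.max(np.abs(mrs_adapted - mrs_plain))
```

For this architecture the gap is zero up to floating-point rounding. A correction that also took the attributes as input would generally produce a nonzero gap, which is exactly the failure the test is meant to detect.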
Original abstract
Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those tasks require: raising a price can increase predicted demand, implied willingness-to-pay estimates are frequently negative or implausible, and unavailable alternatives receive nonzero probability. We propose a two-stage adapter that takes a foundation model's predicted choice probabilities as a precomputed feature and embeds them inside a multinomial logit's utility. In Stage 1, we fit the multinomial logit's structural coefficients by maximum likelihood with sign constraints; in Stage 2, we freeze those coefficients and fit a small neural correction operating on the foundation model's predictions. We prove that this composition exactly preserves the multinomial logit's marginal rate of substitution, so analytically computable value-of-time becomes a mathematical guarantee rather than an empirical accident. Across three datasets and two foundation models, the adapter gains 6.4 percentage points (pp) of test accuracy on average over the multinomial logit and up to 12.8 pp, maintains 100% cost monotonicity, and produces values of time within the published transportation-economics range on the transportation datasets. Performance degrades gracefully under foundation-model context restriction, retaining at least 6 pp of accuracy gain even at 10% of the original foundation-model context.
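The two stages described in the abstract can be sketched with SciPy under several simplifying assumptions. Sign constraints are encoded as box bounds in L-BFGS-B, and the data are synthetic. To keep the example short, the "small neural correction" is replaced by a linear map on log foundation-model probabilities (`W @ log p`). The function names and the stand-in foundation-model probabilities are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_softmax


def nll(v, y):
    # Negative log-likelihood of observed choices y under utilities v (n, J).
    return -log_softmax(v, axis=1)[np.arange(len(y)), y].sum()


def fit_stage1(x, y, bounds):
    """Stage 1: MNL maximum likelihood, sign constraints as box bounds."""
    k = x.shape[2]
    res = minimize(lambda b: nll(x @ b, y), x0=np.full(k, -0.1),
                   method="L-BFGS-B", bounds=bounds)
    return res.x


def fit_stage2(x, y, beta, p_fm):
    """Stage 2: beta stays frozen; fit a stand-in correction W @ log(p_fm)."""
    j = p_fm.shape[1]
    logp = np.log(np.clip(p_fm, 1e-12, 1.0))
    base = x @ beta  # frozen structural utility, never re-fitted here

    def objective(w):
        return nll(base + logp @ w.reshape(j, j).T, y)

    res = minimize(objective, x0=np.zeros(j * j), method="L-BFGS-B")
    return res.x.reshape(j, j)


# Synthetic data: 2 attributes [cost, time], 3 alternatives, Gumbel errors.
rng = np.random.default_rng(3)
n, n_alt = 500, 3
x = rng.normal(size=(n, n_alt, 2))
true_beta = np.array([-1.0, -0.5])
y = (x @ true_beta + rng.gumbel(size=(n, n_alt))).argmax(axis=1)
noisy = x @ true_beta + 0.3 * rng.normal(size=(n, n_alt))
p_fm = np.exp(log_softmax(noisy, axis=1))   # pretend foundation-model output

beta_hat = fit_stage1(x, y, bounds=[(None, 0.0), (None, 0.0)])
W_hat = fit_stage2(x, y, beta_hat, p_fm)
```

Starting Stage 2 from W = 0 reproduces the Stage 1 fit exactly, so in this setup the adapter's training likelihood should not end up worse than the plain MNL's. Stage 2 leaves the structural coefficients, and hence the MRS, untouched.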
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-stage adapter embedding tabular foundation model choice probabilities as fixed features into a multinomial logit (MNL) utility function. Stage 1 fits sign-constrained MNL structural coefficients by MLE; Stage 2 freezes those coefficients and fits a small neural correction on the foundation-model outputs. The central claim is a mathematical proof that the composition exactly preserves the MNL marginal rate of substitution (hence cost monotonicity and analytically valid value-of-time). Empirically, across three datasets and two foundation models, the adapter delivers 6.4 pp average test-accuracy gains (up to 12.8 pp) over plain MNL and 100% cost monotonicity, and on the transportation datasets it yields value-of-time estimates inside published ranges.
Significance. If the structural preservation result holds, the work supplies a practical route to combine the predictive strength of foundation models with the economic interpretability and theoretical guarantees required for policy use in discrete-choice settings. The explicit proof (rather than post-hoc empirical checks) and the reported graceful degradation under context restriction are concrete strengths that differentiate the contribution from purely data-driven hybrids.
minor comments (3)
- [Abstract] The three datasets and two foundation models are not named; adding their identities (or at least a one-sentence description) would improve immediate readability without lengthening the abstract.
- [Proof section] Proof of MRS preservation: while the architecture description (frozen structural coefficients, correction operating only on precomputed FM features) is internally consistent with independence from the structural attributes, an explicit statement or short derivation showing that the neural term has zero partial derivative w.r.t. those attributes would make the guarantee easier to verify at a glance.
- [Empirical results] Results tables: the 6.4 pp average and 12.8 pp maximum accuracy gains are reported; including the per-dataset, per-model breakdown (with standard errors) would allow readers to assess whether the gains are driven by particular combinations.
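As a sketch of the breakdown the referee requests, the accuracy gain for one dataset-model pair and its standard error can be computed from paired per-example correctness indicators on a shared test set. This assumes both models are scored on identical test examples. The function name and the toy inputs are illustrative only and are not results from the paper.

```python
import numpy as np


def paired_accuracy_gain(correct_adapter, correct_mnl):
    """Accuracy gain (percentage points) of the adapter over MNL on the same
    test examples, with a standard error from paired per-example differences."""
    a = np.asarray(correct_adapter, dtype=float)
    b = np.asarray(correct_mnl, dtype=float)
    d = a - b                                    # per-example difference in {-1, 0, 1}
    gain = 100.0 * d.mean()
    se = 100.0 * d.std(ddof=1) / np.sqrt(len(d))
    return gain, se


# Toy correctness indicators (1 = correct prediction), not real results.
adapter_hits = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
mnl_hits = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
gain, se = paired_accuracy_gain(adapter_hits, mnl_hits)
```

Pairing the differences, rather than treating the two accuracies as independent, accounts for the fact that both models are evaluated on the same examples.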
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript, the recognition of the structural preservation proof and empirical results as differentiating strengths, and the recommendation for minor revision. We are pleased that the work is viewed as supplying a practical route to combine foundation-model predictive power with economic interpretability.
Circularity Check
No significant circularity identified
full rationale
The paper's core claim is a mathematical proof that the two-stage adapter (sign-constrained MLE on structural MNL coefficients in Stage 1, then frozen coefficients with neural correction on fixed precomputed FM probabilities in Stage 2) exactly preserves marginal rates of substitution. This follows directly from the architecture: the correction term is independent of the structural attributes, so partial derivatives of total utility w.r.t. those attributes equal the MNL coefficients alone. No equations reduce to fitted inputs by construction, no self-citation chains are load-bearing for the proof, and the guarantee is asserted as a property of the composition rather than an empirical outcome. The derivation is self-contained against the stated model structure.
Axiom & Free-Parameter Ledger
free parameters (2)
- MNL structural coefficients
- Neural correction network parameters
axioms (2)
- Domain assumption: Choice probabilities follow the multinomial logit form with linear utility.
- Domain assumption: Sign constraints on price and other coefficients are appropriate and sufficient.