pith. sign in

arxiv: 2604.19008 · v1 · submitted 2026-04-21 · 🧮 math.OC

Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing

Pith reviewed 2026-05-10 02:30 UTC · model grok-4.3

classification 🧮 math.OC
keywords assortmentpricingjointofflineonlineoptimaloptimizationsetting
0
0 comments X

The pith

New algorithms for joint contextual MNL assortment and pricing deliver improved online regret bounds of order W sqrt(d T log N)/L0 and local suboptimality guarantees offline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Retailers must choose which products to display and what prices to set, but customer choices depend on both product features and the prices themselves. The work models this with a contextual multinomial logit where features incorporate prices. It first builds a confidence region around the estimated demand parameters that accounts for price dependence. Using this region, the offline algorithm selects an assortment-price pair that is pessimistic about the uncertainty, giving a suboptimality bound based on local information near the optimum rather than requiring exact coverage. For the online setting, a SupCB-type algorithm learns over time and achieves a regret that scales with the square root of time, dimension, and log of the number of products, divided by a model parameter L0. A simpler Thompson sampling variant is also given. When the problem reduces to pure assortment or pure pricing, the bounds match the best known rates from those separate literatures.

Core claim

Our SupCB-type algorithm improves the best previously known regret bound to O~(W sqrt(d T log N)/L0), and when specialized to assortment-only or pricing-only it recovers the near-minimax-optimal rates.

Load-bearing premise

The new confidence region for demand estimation under price-dependent features is valid and the model parameters satisfy the boundedness conditions (including L0) needed for the regret and suboptimality analyses to hold.

read the original abstract

Selecting which products to display and at what prices is a central decision in retail and e-commerce operations. In many applications, these two choices must be made jointly under limited display capacity and uncertain customer demand. In this paper, we study the joint assortment and pricing problem under a price-based contextual multinomial logit model, where customer preferences depend on both product features and selling prices. Our analysis begins with the construction of a new confidence region for demand estimation under price-dependent features. Building on this result, we develop a pessimistic offline algorithm and SupCB-type online algorithms for joint assortment and pricing optimization. In the offline setting, we establish a suboptimality guarantee governed by local information around the optimal assortment-price pair, rather than by exact coverage of the optimal action. In the online setting, our SupCB-type algorithm improves the best previously known regret bound to $\widetilde{O}(W\sqrt{dT\log N}/L_0)$, and we also provide a computationally simpler Thompson-sampling alternative. When specialized to the assortment-only or pricing-only setting, our bounds recover the near-minimax-optimal rates established in those respective domains, thereby bridging the gap between the mature study on assortment optimization or dynamic pricing and the limited literature on their joint optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper addresses the joint assortment and pricing problem under a price-based contextual multinomial logit model. It introduces a new confidence region for estimating demand parameters when features depend on prices. This enables a pessimistic offline algorithm with a suboptimality guarantee based on local information around the optimum, and SupCB-type online algorithms with regret bound ~O(W sqrt(d T log N)/L0), as well as a Thompson sampling alternative. Specializations to assortment-only and pricing-only settings recover near-minimax optimal rates.

Significance. Assuming the new confidence region is valid and the boundedness conditions hold, the results provide an improved regret bound for the joint problem compared to prior work and unify it with the more developed separate literatures on assortment optimization and dynamic pricing. The local suboptimality approach in the offline case is innovative, and recovering optimal rates in special cases strengthens the contribution. The dual provision of online and offline methods, including a computationally simpler Thompson sampling option, enhances the practical relevance. The work is significant for operations research and revenue management applications in retail.

major comments (3)
  1. [Abstract and confidence region construction] The validity of the novel confidence region for demand estimation under price-dependent features is central to all main results. The construction must rigorously handle the dependence between prices and features in the MNL probabilities to ensure correct coverage and the claimed tightness; without detailed proof steps, it is unclear whether the radius supports the regret bound ~O(W sqrt(d T log N)/L0) and the local suboptimality guarantee.
  2. [Online regret analysis] The SupCB-type algorithm improves the regret to ~O(W sqrt(d T log N)/L0). However, this bound relies on the parameter L0 (a lower bound presumably on the minimum eigenvalue or sensitivity); the paper needs to explicitly state the assumptions ensuring L0 is bounded away from zero and how it relates to the model parameters, as the bound becomes vacuous otherwise.
  3. [Offline algorithm] The pessimistic offline algorithm's suboptimality guarantee is based on local information around the optimal assortment-price pair rather than exact coverage. The derivation from the confidence region to this local guarantee should be detailed to confirm it is load-bearing and not relying on post-hoc assumptions.
minor comments (3)
  1. [Notation] The parameters W, d, L0, N should be clearly defined at the beginning, including their interpretations and any boundedness assumptions.
  2. [Abstract] The abstract mentions recovering near-minimax-optimal rates but does not cite the specific prior works establishing those rates for comparison.
  3. [Thompson sampling] Details on the Thompson sampling alternative, such as its regret bound or computational advantages, should be expanded beyond the abstract mention.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive evaluation of our contribution and for the constructive major comments. We address each point below with clarifications and commit to revisions that strengthen the exposition without altering the core results.

read point-by-point responses
  1. Referee: The validity of the novel confidence region for demand estimation under price-dependent features is central to all main results. The construction must rigorously handle the dependence between prices and features in the MNL probabilities to ensure correct coverage and the claimed tightness; without detailed proof steps, it is unclear whether the radius supports the regret bound ~O(W sqrt(d T log N)/L0) and the local suboptimality guarantee.

    Authors: We appreciate this observation. The confidence region is derived via a self-normalized martingale bound that explicitly accounts for the price-dependent feature vectors inside the MNL probabilities; the radius is obtained by solving a fixed-point equation that incorporates the worst-case Lipschitz constant of the logit link. The full proof appears in Appendix A. To address the concern, we will insert a concise proof sketch (including the key martingale step and the resulting radius expression) into Section 3.1 of the main text, confirming that the radius is tight enough to support both the online regret and the local offline guarantee. revision: yes

  2. Referee: The SupCB-type algorithm improves the regret to ~O(W sqrt(d T log N)/L0). However, this bound relies on the parameter L0 (a lower bound presumably on the minimum eigenvalue or sensitivity); the paper needs to explicitly state the assumptions ensuring L0 is bounded away from zero and how it relates to the model parameters, as the bound becomes vacuous otherwise.

    Authors: We agree that the dependence on L0 must be stated explicitly. L0 is defined as the smallest eigenvalue of the expected outer-product matrix of the (price-augmented) feature vectors under the optimal policy; it is bounded away from zero under the standard assumptions that feature vectors are bounded, the price grid is finite, and the optimal policy places positive mass on a set of actions whose features span R^d. We will add a new paragraph in Section 2.2 that lists these regularity conditions, relates L0 to the minimal eigenvalue of the design matrix, and notes that the bound is vacuous only when the problem is information-theoretically degenerate. A short discussion of how L0 can be estimated from data will also be included. revision: yes

  3. Referee: The pessimistic offline algorithm's suboptimality guarantee is based on local information around the optimal assortment-price pair rather than exact coverage. The derivation from the confidence region to this local guarantee should be detailed to confirm it is load-bearing and not relying on post-hoc assumptions.

    Authors: The local guarantee follows directly from the fact that the pessimistic objective is a uniform upper bound on the true revenue function inside a neighborhood whose radius is controlled by the confidence-region width; inside this neighborhood the revenue function satisfies a quadratic growth condition (by twice-continuous differentiability of the MNL revenue and positive definiteness of the Hessian at the optimum). The derivation therefore uses only the coverage property of the confidence region and the local curvature assumption already stated in Section 4. We will expand the proof of Theorem 4.2 with an explicit two-page sketch that isolates each step, making clear that no additional post-hoc assumptions are introduced. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bound and suboptimality guarantees depend on exogenous parameters

full rationale

The paper's central derivation begins with an explicit construction of a new confidence region for demand parameters under price-dependent features, then applies this region to obtain a pessimistic offline suboptimality bound (governed by local information around the optimum) and an online SupCB regret of O~(W sqrt(d T log N)/L0). Both the regret expression and the recovery of near-minimax rates in assortment-only or pricing-only special cases are stated directly in terms of the exogenous model parameters W, d, L0 and the validity of the constructed region; none of these quantities are fitted inside the derivation or renamed as predictions. No self-citation is load-bearing for the uniqueness or validity of the confidence region, and no step reduces by construction to its own inputs. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of a newly constructed confidence region for price-dependent features and on standard boundedness assumptions for the contextual MNL parameters.

free parameters (2)
  • L0
    Appears as a denominator in the regret bound; likely a lower bound on a model parameter such as minimum probability or Lipschitz constant.
  • W
    Width parameter appearing in the regret expression; its origin is not detailed in the abstract.
axioms (1)
  • domain assumption Customer preferences follow a price-based contextual multinomial logit model.
    Foundation for the demand estimation and optimization formulations.

pith-pipeline@v0.9.0 · 5530 in / 1305 out tokens · 34538 ms · 2026-05-10T02:30:18.837417+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages

  1. [1]

    arXiv preprint arXiv:2012.06961 , year=

    Online stochastic optimization with wasserstein based non-stationarity , author=. arXiv preprint arXiv:2012.06961 , year=

  2. [2]

    Operations Research Letters , volume=

    Capacitated assortment and price optimization under the multinomial logit model , author=. Operations Research Letters , volume=. 2012 , publisher=

  3. [3]

    Manufacturing & Service Operations Management , volume=

    Dynamic joint assortment and pricing optimization with demand learning , author=. Manufacturing & Service Operations Management , volume=. 2021 , publisher=

  4. [4]

    Sequential batch learning in finite-action linear contextual bandits,

    Sequential batch learning in finite-action linear contextual bandits , author=. arXiv preprint arXiv:2004.06321 , year=

  5. [5]

    2023 , eprint=

    Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs , author=. 2023 , eprint=

  6. [6]

    44th Annual IEEE Symposium on Foundations of Computer Science, 2003

    The value of knowing a demand curve: Bounds on regret for online posted-price auctions , author=. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. , pages=. 2003 , organization=

  7. [7]

    Operations Research , volume=

    Dynamic pricing under a general parametric choice model , author=. Operations Research , volume=. 2012 , publisher=

  8. [8]

    Advances in Neural Information Processing Systems , volume=

    Logarithmic regret in feature-based dynamic pricing , author=. Advances in Neural Information Processing Systems , volume=

  9. [9]

    Journal of Machine Learning Research , volume=

    Perishability of data: Dynamic pricing under varying-coefficient models , author=. Journal of Machine Learning Research , volume=

  10. [10]

    Journal of Machine Learning Research , volume=

    Multi-scale online learning: Theory and applications to online auctions and pricing , author=. Journal of Machine Learning Research , volume=

  11. [11]

    Manufacturing & Service Operations Management , volume=

    Product line selection and pricing with modularity in design , author=. Manufacturing & Service Operations Management , volume=. 2005 , publisher=

  12. [12]

    Manufacturing & Service Operations Management , volume =

    Li, Hongmin and Huh, Woonghee Tim , title =. Manufacturing & Service Operations Management , volume =

  13. [13]

    2021 , publisher=

    Demand management and inventory control for substitutable products , author=. 2021 , publisher=

  14. [14]

    Production and Operations Management , volume =

    Besbes, Omar and Sauré, Denis , title =. Production and Operations Management , volume =

  15. [15]

    arXiv preprint arXiv:2504.02324 , year=

    Dynamic Assortment Selection and Pricing with Censored Preference Feedback , author=. arXiv preprint arXiv:2504.02324 , year=

  16. [16]

    Manufacturing & Service Operations Management , volume=

    Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications , author=. Manufacturing & Service Operations Management , volume=. 2011 , publisher=

  17. [17]

    Production and Operations Management , volume=

    Product assortment and price competition under multinomial logit demand , author=. Production and Operations Management , volume=. 2016 , publisher=

  18. [18]

    Management Science , volume=

    A nonparametric joint assortment and price choice model , author=. Management Science , volume=. 2017 , publisher=

  19. [19]

    Proceedings of the 41st International Conference on Machine Learning , series =

    Xu, Jianyu and Wang, Yu-Xiang , title =. Proceedings of the 41st International Conference on Machine Learning , series =. 2024 , publisher =

  20. [20]

    Management Science , volume=

    The impact of consumer search cost on assortment planning and pricing , author=. Management Science , volume=. 2018 , publisher=

  21. [21]

    Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear

    Han, Yuxuan and Blanchet, Jose and Zhou, Zhengyuan , booktitle=. Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear

  22. [22]

    Operations Research , volume=

    Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection , author=. Operations Research , volume=. 2021 , publisher=

  23. [23]

    Operations Research , volume=

    Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms , author=. Operations Research , volume=. 2009 , publisher=

  24. [24]

    2020 IEEE International Symposium on Information Theory (ISIT) , pages=

    Multi-product dynamic pricing in high-dimensions with heterogeneous price sensitivity , author=. 2020 IEEE International Symposium on Information Theory (ISIT) , pages=. 2020 , organization=

  25. [25]

    A unified confidence sequence for generalized linear models, with applications to bandits

    A unified confidence sequence for generalized linear models, with applications to bandits , author=. arXiv preprint arXiv:2407.13977 , year=

  26. [26]

    2012 , issn =

    Capacitated assortment and price optimization under the multinomial logit model , journal =. 2012 , issn =. doi:https://doi.org/10.1016/j.orl.2012.08.003 , author =

  27. [27]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Thompson Sampling for Multinomial Logit Contextual Bandits , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  28. [28]

    arXiv preprint arXiv:2503.11819 , year=

    Online Assortment and Price Optimization Under Contextual Choice Models , author=. arXiv preprint arXiv:2503.11819 , year=

  29. [29]

    The 22nd International Conference on Artificial Intelligence and Statistics , pages=

    Active ranking with subset-wise preferences , author=. The 22nd International Conference on Artificial Intelligence and Statistics , pages=. 2019 , organization=

  30. [30]

    Advances in Neural Information Processing Systems , volume=

    Eluder dimension and the sample complexity of optimistic exploration , author=. Advances in Neural Information Processing Systems , volume=

  31. [31]

    International Conference on Algorithmic Learning Theory , pages=

    Efficient local planning with linear function approximation , author=. International Conference on Algorithmic Learning Theory , pages=. 2022 , organization=

  32. [32]

    Management Science , volume=

    Revenue management under a general discrete choice model of consumer behavior , author=. Management Science , volume=. 2004 , publisher=

  33. [33]

    Lee and M.-h

    Improved Online Confidence Bounds for Multinomial Logistic Bandits , author=. arXiv preprint arXiv:2502.10020 , year=

  34. [34]

    Advances in Neural Information Processing Systems , volume=

    Online (multinomial) logistic bandit: Improved regret and constant computation cost , author=. Advances in Neural Information Processing Systems , volume=

  35. [35]

    Advances in Neural Information Processing Systems , volume=

    Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage , author=. Advances in Neural Information Processing Systems , volume=

  36. [36]

    Advances in neural information processing systems , volume=

    Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=

  37. [37]

    arXiv preprint arXiv:2502.06777

    Learning an Optimal Assortment Policy under Observational Data , author=. arXiv preprint arXiv:2502.06777 , year=

  38. [38]

    Foundations and Trends

    Modeling, Prediction, Assortment and Price Optimization Under Consumer Choice Behavior , author=. Foundations and Trends. 2025 , publisher=

  39. [39]

    Foundations and Trends

    Introduction to multi-armed bandits , author=. Foundations and Trends. 2019 , publisher=

  40. [40]

    2020 , publisher=

    Bandit algorithms , author=. 2020 , publisher=

  41. [41]

    , author=

    Stochastic Linear Optimization under Bandit Feedback. , author=. COLT , volume=

  42. [42]

    2006 , publisher=

    Theory of point estimation , author=. 2006 , publisher=

  43. [43]

    Operations Research , volume=

    MNL-bandit: A dynamic learning approach to assortment selection , author=. Operations Research , volume=. 2019 , publisher=

  44. [44]

    Mathematics of Operations Research , volume=

    Robust Markov decision processes , author=. Mathematics of Operations Research , volume=. 2013 , publisher=

  45. [45]

    Management Science , volume=

    A comparative empirical study of discrete choice models in retail operations , author=. Management Science , volume=. 2022 , publisher=

  46. [46]

    Manufacturing & Service Operations Management , volume=

    Optimal dynamic assortment planning with demand learning , author=. Manufacturing & Service Operations Management , volume=. 2013 , publisher=

  47. [47]

    Conference on learning theory , pages=

    Thompson sampling for the mnl-bandit , author=. Conference on learning theory , pages=. 2017 , organization=

  48. [48]

    Operations Research , year=

    Robust assortment optimization under the markov chain choice model , author=. Operations Research , year=

  49. [49]

    Near-optimal Regret Bounds for Reinforcement Learning , author=. J. Mach. Learn. Res. , year=

  50. [50]

    Operations Research , year=

    Randomized assortment optimization , author=. Operations Research , year=

  51. [51]

    Advances in Neural Information Processing Systems , volume=

    Simple and fast algorithm for binary integer and online linear programming , author=. Advances in Neural Information Processing Systems , volume=

  52. [52]

    Management science , volume=

    Dynamic assortment with demand learning for seasonal consumer goods , author=. Management science , volume=. 2007 , publisher=

  53. [53]

    Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , pages=

    Contextual bandits with linear payoff functions , author=. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , pages=. 2011 , organization=

  54. [54]

    International Conference on Machine Learning , pages=

    Provably optimal algorithms for generalized linear contextual bandits , author=. International Conference on Machine Learning , pages=. 2017 , organization=

  55. [55]

    Decoupling Learning and Decision-Making: Breaking the O(

    Gao, Wenzhi and Sun, Chunlin and Xue, Chenyu and Ge, Dongdong and Ye, Yinyu , journal=. Decoupling Learning and Decision-Making: Breaking the O(

  56. [56]

    Canadian Journal of Mathematics , volume=

    The equivalence of two extremum problems , author=. Canadian Journal of Mathematics , volume=. 1960 , publisher=

  57. [57]

    Mathematics of Operations Research , volume=

    Delay-adaptive learning in generalized linear contextual bandits , author=. Mathematics of Operations Research , volume=. 2024 , publisher=

  58. [58]

    Theoretical Computer Science , volume=

    Parameter identification in Markov chain choice models , author=. Theoretical Computer Science , volume=. 2020 , publisher=

  59. [59]

    Advances in Neural Information Processing Systems , volume=

    Dynamic pricing and assortment under a contextual MNL demand , author=. Advances in Neural Information Processing Systems , volume=

  60. [60]

    Operations Research Letters , volume=

    A note on a tight lower bound for capacitated MNL-bandit assortment selection models , author=. Operations Research Letters , volume=. 2018 , publisher=

  61. [61]

    Advances in Neural Information Processing Systems , volume=

    Thompson sampling for multinomial logit contextual bandits , author=. Advances in Neural Information Processing Systems , volume=

  62. [62]

    2007 , publisher=

    UCI machine learning repository , author=. 2007 , publisher=

  63. [63]

    1959 , publisher=

    Individual choice behavior , author=. 1959 , publisher=

  64. [64]

    2014 , publisher=

    Multinomial probit: the theory and its application to demand forecasting , author=. 2014 , publisher=

  65. [65]

    Operations Research , volume=

    The exponomial choice model: A new alternative for assortment and price optimization , author=. Operations Research , volume=. 2016 , publisher=

  66. [66]

    Journal of applied Econometrics , volume=

    Mixed MNL models for discrete response , author=. Journal of applied Econometrics , volume=. 2000 , publisher=

  67. [67]

    Operations Research , volume=

    Assortment optimization under variants of the nested logit model , author=. Operations Research , volume=. 2014 , publisher=

  68. [68]

    2009 , publisher=

    Discrete choice methods with simulation , author=. 2009 , publisher=

  69. [69]

    Proceedings of the AAAI conference on artificial intelligence , volume=

    Multinomial logit contextual bandits: Provable optimality and practicality , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

  70. [70]

    Operations Research , volume=

    An expectation-maximization algorithm to estimate the parameters of the Markov chain choice model , author=. Operations Research , volume=. 2018 , publisher=

  71. [71]

    International Conference on Artificial Intelligence and Statistics , pages=

    Instance-wise minimax-optimal algorithms for logistic bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

  72. [72]

    International Conference on Machine Learning , pages=

    Improved optimistic algorithms for logistic bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  73. [73]

    Work in Progress , year=

    Assortment planning under the multinomial logit model with totally unimodular constraint structures , author=. Work in Progress , year=

  74. [74]

    Advances in Neural Information Processing Systems , volume=

    Bridging offline reinforcement learning and imitation learning: A tale of pessimism , author=. Advances in Neural Information Processing Systems , volume=

  75. [75]

    Advances in Neural Information Processing Systems , volume=

    Pessimism for Offline Linear Contextual Bandits using ell\_p Confidence Sets , author=. Advances in Neural Information Processing Systems , volume=

  76. [76]

    Operations Research Letters , volume=

    On the tightness of an LP relaxation for rational optimization and its applications , author=. Operations Research Letters , volume=. 2016 , publisher=

  77. [77]

    Operations research , volume=

    Dynamic assortment optimization with a multinomial logit choice model and capacity constraint , author=. Operations research , volume=. 2010 , publisher=

  78. [78]

    Mathematics of Operations Research , volume=

    Optimal policy for dynamic assortment planning under multinomial logit models , author=. Mathematics of Operations Research , volume=. 2021 , publisher=

  79. [79]

    Production and Operations Management , volume=

    Dynamic assortment planning under nested logit models , author=. Production and Operations Management , volume=. 2021 , publisher=

  80. [80]

    Journal of machine learning research , volume=

    Dynamic assortment optimization with changing contextual information , author=. Journal of machine learning research , volume=

Showing first 80 references.