Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

Jiameng Lyu; Shibo Dai; Xi Chen; Yuan Zhou

arxiv: 2605.17238 · v1 · pith:V3WSJ5E5new · submitted 2026-05-17 · 💻 cs.LG · stat.ML

Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

Xi Chen , Shibo Dai , Jiameng Lyu , Yuan Zhou This is my paper

Pith reviewed 2026-05-20 14:32 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords position-aware MNL banditsassortment optimizationregret boundsmultiplicative position effectsgeneral position effectsonline learningmaximum likelihood estimation

0 comments

The pith

The paper gives the first regret-optimal algorithms for jointly learning assortments and display positions under position-aware multinomial logit choice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work addresses dynamic assortment selection where each product's attraction depends on its display position. For the multiplicative position-effects model it introduces a cross-position pairwise maximum-likelihood estimator with clipping and proves that the resulting P2MLE-UCB algorithm achieves Õ(√(NT)) regret, matching the information-theoretic lower bound and removing the extra √K factor left by earlier epoch-based methods. For the fully general position-effects model it supplies both a minimax lower bound and a matching upper bound via the GP2-UCB algorithm. In both cases the algorithms update after every single customer feedback and solve the per-round joint optimization problem via Dinkelbach's fractional-programming method combined with maximum-weight bipartite matching.

Core claim

For the multiplicative model, a cross-position pairwise maximum-likelihood estimator with clipping yields an algorithm whose regret is Õ(√(NT)), matching the lower bound; for the general model, a new algorithm attains the minimax rate characterized by a matching lower bound.

What carries the argument

Cross-position pairwise maximum-likelihood estimator (with clipping) for the multiplicative case; GP2-UCB for the general case; per-round optimization via Dinkelbach's method plus maximum-weight bipartite matching.

If this is right

Real-time platforms can now update assortment and position decisions after every single purchase observation rather than waiting for epochs.
The √K gap that previously separated upper and lower bounds for the multiplicative model is eliminated.
The same algorithmic template yields matching upper and lower bounds once position effects become fully general.
An efficient exact solver for the per-round combinatorial optimization problem is supplied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The pairwise estimator may extend to other choice models that admit a similar cross-position likelihood structure.
The results suggest that position bias can be learned at essentially the same cost as item-level learning when the model is correctly specified.
The optimization subroutine could be reused inside other revenue-management settings that require fractional programming over assignments.

Load-bearing premise

Customer choices are generated exactly according to the multinomial logit model whose position-dependent attraction parameters are fixed over time.

What would settle it

A sequence of experiments in which the observed cumulative regret grows faster than Õ(√(NT)) under the multiplicative model, or faster than the minimax rate under the general model, would falsify the claimed bounds.

read the original abstract

We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model assigning independent attraction parameters to every product--position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single feedback, and establish the first regret-optimal characterization. Besides, our round-based algorithms provide the prompt operations needed by modern platforms. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, we establish a minimax lower bound and propose GP2-UCB with a matching upper bound. Moreover, we design an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that our algorithms consistently outperform state-of-the-art benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper closes the √K regret gap for multiplicative position-aware MNL bandits by switching to round-based updates and a new cross-position pairwise MLE with clipping, while also delivering matching bounds for the general case.

read the letter

The punchline is that this paper closes the √K regret gap for the multiplicative position effects model in MNL bandits by moving to round-based updates and a new cross-position pairwise MLE estimator with clipping, while also giving the first matching bounds for the general position model. What is new is the development of P2MLE-UCB and GP2-UCB algorithms that update after every single feedback rather than in epochs, along with an efficient subroutine for the per-round optimization problem using Dinkelbach's method and maximum-weight bipartite matching. The experiments on synthetic data and Expedia dataset demonstrate better performance than prior benchmarks. The paper does well in addressing a practical need for prompt operations on platforms and in providing both upper and lower bounds that match for these settings. The regret analysis builds on standard stochastic bandit techniques applied to the new estimators. Soft spots are minor here. The central claims rest on the MNL model assumptions holding with fixed parameters, which is standard but could be sensitive to model misspecification in real data. The clipping mechanism helps with estimation but its impact on the exact regret bound might need close checking in the proofs. No major internal inconsistencies show up in the summary. This paper is for researchers in online learning and assortment optimization, particularly those dealing with position effects in recommendations or e-commerce. Readers who want to see theoretical improvements translated into more practical algorithms will get value from it. It deserves a serious referee given the specific gap it closes and the combination of theory and experiments. I recommend sending this to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript studies the dynamic joint assortment selection and positioning problem under the Multinomial Logit (MNL) choice model with position-dependent product attractions. It considers the multiplicative position effects model, where attractions are scaled by position-specific factors, and the general position effects model with independent attraction parameters for each product-position pair. For the multiplicative case, the authors develop the P2MLE-UCB algorithm using a cross-position pairwise maximum likelihood estimator with a clipping mechanism and prove a regret bound of Õ(√(NT)) that matches the lower bound, closing the √K gap from prior epoch-based work. For the general model, they establish a minimax lower bound and propose GP2-UCB with a matching upper bound. The paper also provides an efficient per-round optimization subroutine based on Dinkelbach's method and maximum-weight bipartite matching, along with numerical experiments on synthetic data and the Expedia dataset.

Significance. If the central regret bounds hold, this work makes a substantial contribution by delivering the first regret-optimal characterization for position-aware MNL bandits in both the multiplicative and general settings. The Õ(√(NT)) bound for the multiplicative model resolves a concrete gap left by earlier analyses, while the matching minimax results for the general model are notable. The round-based design with single-feedback updates and the efficient Dinkelbach+bipartite-matching subroutine are practical strengths for deployment on modern platforms. The empirical validation on real data further supports the claims.

minor comments (3)

[Algorithm and estimator description] The description of the clipping mechanism in the cross-position pairwise MLE (mentioned in the abstract and likely detailed in the algorithm section) would benefit from an explicit statement of how the clipping threshold is chosen and its precise effect on the estimation error term in the regret analysis.
[Numerical experiments] In the numerical experiments section, additional details on how position effects are instantiated or estimated from the Expedia dataset would help readers assess the realism of the simulation setup.
[Lower bound discussion] The paper could add a short remark confirming that the lower-bound construction for the multiplicative model incorporates the joint assortment-positioning decision space, to make the matching claim fully transparent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, as well as for recommending minor revision. We appreciate the recognition that our round-based algorithms close the √K gap in the multiplicative setting and deliver matching minimax bounds in the general position-effects model. We will incorporate all suggested minor improvements in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces new estimators (cross-position pairwise MLE with clipping for the multiplicative model) and algorithms (P2MLE-UCB and GP2-UCB) under standard fixed-parameter MNL assumptions. Regret bounds are derived via standard stochastic bandit tools (UCB-style exploration, concentration inequalities, and per-round optimization via Dinkelbach + bipartite matching), with matching lower bounds established separately. These steps do not reduce by construction to fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations; the analysis remains independent of the target regret quantities and is externally falsifiable through the stated model and concentration arguments. No circular patterns from the enumerated list are present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that choices follow the multinomial logit model with the stated position effects, plus standard technical assumptions required for regret analysis in stochastic bandits such as bounded parameters and independent observations.

axioms (1)

domain assumption Customer choices are generated according to the multinomial logit model with position-dependent attractions that are fixed but unknown.
Invoked throughout the problem formulation for both multiplicative and general models.

pith-pipeline@v0.9.0 · 5775 in / 1497 out tokens · 50907 ms · 2026-05-20T14:32:18.666537+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of Õ(√(NT))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

4OR 14(1):57--75

Abeliuk A, Berbeglia G, Cebrian M, Van Hentenryck P (2016) Assortment optimization under a multinomial logit model with position bias and social influence. 4OR 14(1):57--75

work page 2016
[2]

Conference on learning theory, 76--78 (PMLR)

Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the mnl-bandit. Conference on learning theory, 76--78 (PMLR)

work page 2017
[3]

Operations Research 67(5):1453--1485

Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) Mnl-bandit: A dynamic learning approach to assortment selection. Operations Research 67(5):1453--1485

work page 2019
[4]

Management Science 67(6):3519--3550

Aouad A, Segev D (2021) Display optimization for vertically differentiated locations under multinomial logit preferences. Management Science 67(6):3519--3550

work page 2021
[5]

Communications of the ACM 14(12):802--804

Bourgeois F, Lassalle JC (1971) An extension of the munkres algorithm for the assignment problem to rectangular matrices. Communications of the ACM 14(12):802--804

work page 1971
[6]

Management science 53(2):276--292

Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management science 53(2):276--292

work page 2007
[7]

Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games (Cambridge university press)

work page 2006
[8]

ACM Transactions on Information Systems 41(3):1--39

Chen J, Dong H, Wang X, Feng F, Wang M, He X (2023 a ) Bias and debias in recommender system: A survey and future directions. ACM Transactions on Information Systems 41(3):1--39

work page 2023
[9]

Available at SSRN 4526247

Chen N, Gao P, Wang C, Wang Y (2023 b ) Assortment optimization for the multinomial logit model with repeated customer interactions. Available at SSRN 4526247

work page 2023
[10]

Production and Operations Management 30(1):85--102

Chen X, Shi C, Wang Y, Zhou Y (2021 a ) Dynamic assortment planning under nested logit models. Production and Operations Management 30(1):85--102

work page 2021
[11]

Operations Research Letters 46(5):534--537

Chen X, Wang Y (2018) A note on a tight lower bound for capacitated mnl-bandit assortment selection models. Operations Research Letters 46(5):534--537

work page 2018
[12]

Mathematics of Operations Research 46(4):1639--1657

Chen X, Wang Y, Zhou Y (2021 b ) Optimal policy for dynamic assortment planning under multinomial logit models. Mathematics of Operations Research 46(4):1639--1657

work page 2021
[13]

The 22nd International Conference on Artificial Intelligence and Statistics, 438--447 (PMLR)

Cheung WC, Tan V, Zhong Z (2019) A thompson sampling algorithm for cascading bandits. The 22nd International Conference on Artificial Intelligence and Statistics, 438--447 (PMLR)

work page 2019
[14]

Proceedings of the 2008 international conference on web search and data mining, 87--94

Craswell N, Zoeter O, Taylor M, Ramsey B (2008) An experimental comparison of click position-bias models. Proceedings of the 2008 international conference on web search and data mining, 87--94

work page 2008
[15]

Management science 13(7):492--498

Dinkelbach W (1967) On nonlinear fractional programming. Management science 13(7):492--498

work page 1967
[16]

arXiv preprint arXiv:2510.01693

Dong J, Mo W, Qi Z, Shi C, Fang EX, Tarokh V (2025) Pasta: A unified framework for offline assortment learning. arXiv preprint arXiv:2510.01693

work page arXiv 2025
[17]

the Annals of Probability 100--118

Freedman DA (1975) On tail probabilities for martingales. the Annals of Probability 100--118

work page 1975
[18]

Operations Research 68(1):134--160

Gallego G, Li A, Truong VA, Wang X (2020) Approximation algorithms for product framing and pricing. Operations Research 68(1):134--160

work page 2020
[19]

Operations research 69(5):1509--1532

Gao P, Ma Y, Chen N, Gallego G, Li A, Rusmevichientong P, Topaloglu H (2021) Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection. Operations research 69(5):1509--1532

work page 2021
[20]

The Thirty-ninth Annual Conference on Neural Information Processing Systems

Han Y, Blanchet J, Zhou Z (2025 a ) Improved confidence regions and optimal algorithms for online and offline linear mnl bandits. The Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025
[21]

arXiv preprint arXiv:2502.06777

Han Y, Zhong H, Lu M, Blanchet J, Zhou Z (2025 b ) Learning an optimal assortment policy under observational data. arXiv preprint arXiv:2502.06777

work page arXiv 2025
[22]

Manufacturing & Service Operations Management 26(1):215--232

Jasin S, Lyu C, Najafi S, Zhang H (2024) Assortment optimization with multi-item basket purchase under multivariate mnl model. Manufacturing & Service Operations Management 26(1):215--232

work page 2024
[23]

Production and Operations Management 35(3):1133--1150

Jiang B, Wang Z, Xue C, Zhang N (2026) Assortment optimization in the presence of focal effect: Operational insights and efficient algorithms. Production and Operations Management 35(3):1133--1150

work page 2026
[24]

Proceedings of the tenth ACM international conference on web search and data mining, 781--789

Joachims T, Swaminathan A, Schnabel T (2017) Unbiased learning-to-rank with biased feedback. Proceedings of the tenth ACM international conference on web search and data mining, 781--789

work page 2017
[25]

Computing 38(4):325--340

Jonker R, Volgenant A (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325--340

work page 1987
[26]

International conference on machine learning, 767--776 (PMLR)

Kveton B, Szepesvari C, Wen Z, Ashkan A (2015) Cascading bandits: Learning to rank in the cascade model. International conference on machine learning, 767--776 (PMLR)

work page 2015
[27]

Operations research 73(1):109--138

Li S, Luo Q, Huang Z, Shi C (2025) Online learning for constrained assortment optimization under markov chain choice model. Operations research 73(1):109--138

work page 2025
[28]

International conference on machine learning, 1245--1253 (PMLR)

Li S, Wang B, Zhang S, Chen W (2016) Contextual combinatorial cascading bandits. International conference on machine learning, 1245--1253 (PMLR)

work page 2016
[29]

Operations Research

Liang Y, Mao X, Wang S (2026) Online joint assortment-inventory optimization under mnl choices. Operations Research

work page 2026
[30]

Operations Research

Luo Y, Sun WW, Liu Y (2025) Rate-optimal online learning for dynamic assortment selection with positioning. Operations Research

work page 2025
[31]

Manufacturing & Service Operations Management 23(2):525--545

Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing & Service Operations Management 23(2):525--545

work page 2021
[32]

Operations Research

Najafi S, Jasin S, Uichanco J, Zhao J (2026) Assortment and price optimization under a multiattribute (contextual) choice model. Operations Research

work page 2026
[33]

Operations research 58(6):1666--1680

Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations research 58(6):1666--1680

work page 2010
[34]

arXiv preprint arXiv:2402.18917

Saha A, Gaillard P (2024) Stop relying on no-choice and do not repeat the moves: Optimal, efficient and practical algorithms for assortment optimization. arXiv preprint arXiv:2402.18917

work page arXiv 2024
[35]

Manufacturing & Service Operations Management 15(3):387--404

Saur \'e D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management 15(3):387--404

work page 2013
[36]

ii, on dinkelbach's algorithm

Schaible S (1976) Fractional programming. ii, on dinkelbach's algorithm. Management science 22(8):868--873

work page 1976
[37]

Proceedings of the eleventh ACM international conference on web search and data mining, 610--618

Wang X, Golbandi N, Bendersky M, Metzler D, Najork M (2018) Position bias estimation for unbiased learning to rank in personal search. Proceedings of the eleventh ACM international conference on web search and data mining, 610--618

work page 2018
[38]

Manufacturing & Service Operations Management 25(5):1748--1764

Xu Y, Wang Z (2023) Assortment optimization for a multistage choice model. Manufacturing & Service Operations Management 25(5):1748--1764

work page 2023

[1] [1]

4OR 14(1):57--75

Abeliuk A, Berbeglia G, Cebrian M, Van Hentenryck P (2016) Assortment optimization under a multinomial logit model with position bias and social influence. 4OR 14(1):57--75

work page 2016

[2] [2]

Conference on learning theory, 76--78 (PMLR)

Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the mnl-bandit. Conference on learning theory, 76--78 (PMLR)

work page 2017

[3] [3]

Operations Research 67(5):1453--1485

Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) Mnl-bandit: A dynamic learning approach to assortment selection. Operations Research 67(5):1453--1485

work page 2019

[4] [4]

Management Science 67(6):3519--3550

Aouad A, Segev D (2021) Display optimization for vertically differentiated locations under multinomial logit preferences. Management Science 67(6):3519--3550

work page 2021

[5] [5]

Communications of the ACM 14(12):802--804

Bourgeois F, Lassalle JC (1971) An extension of the munkres algorithm for the assignment problem to rectangular matrices. Communications of the ACM 14(12):802--804

work page 1971

[6] [6]

Management science 53(2):276--292

Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management science 53(2):276--292

work page 2007

[7] [7]

Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games (Cambridge university press)

work page 2006

[8] [8]

ACM Transactions on Information Systems 41(3):1--39

Chen J, Dong H, Wang X, Feng F, Wang M, He X (2023 a ) Bias and debias in recommender system: A survey and future directions. ACM Transactions on Information Systems 41(3):1--39

work page 2023

[9] [9]

Available at SSRN 4526247

Chen N, Gao P, Wang C, Wang Y (2023 b ) Assortment optimization for the multinomial logit model with repeated customer interactions. Available at SSRN 4526247

work page 2023

[10] [10]

Production and Operations Management 30(1):85--102

Chen X, Shi C, Wang Y, Zhou Y (2021 a ) Dynamic assortment planning under nested logit models. Production and Operations Management 30(1):85--102

work page 2021

[11] [11]

Operations Research Letters 46(5):534--537

Chen X, Wang Y (2018) A note on a tight lower bound for capacitated mnl-bandit assortment selection models. Operations Research Letters 46(5):534--537

work page 2018

[12] [12]

Mathematics of Operations Research 46(4):1639--1657

Chen X, Wang Y, Zhou Y (2021 b ) Optimal policy for dynamic assortment planning under multinomial logit models. Mathematics of Operations Research 46(4):1639--1657

work page 2021

[13] [13]

The 22nd International Conference on Artificial Intelligence and Statistics, 438--447 (PMLR)

Cheung WC, Tan V, Zhong Z (2019) A thompson sampling algorithm for cascading bandits. The 22nd International Conference on Artificial Intelligence and Statistics, 438--447 (PMLR)

work page 2019

[14] [14]

Proceedings of the 2008 international conference on web search and data mining, 87--94

Craswell N, Zoeter O, Taylor M, Ramsey B (2008) An experimental comparison of click position-bias models. Proceedings of the 2008 international conference on web search and data mining, 87--94

work page 2008

[15] [15]

Management science 13(7):492--498

Dinkelbach W (1967) On nonlinear fractional programming. Management science 13(7):492--498

work page 1967

[16] [16]

arXiv preprint arXiv:2510.01693

Dong J, Mo W, Qi Z, Shi C, Fang EX, Tarokh V (2025) Pasta: A unified framework for offline assortment learning. arXiv preprint arXiv:2510.01693

work page arXiv 2025

[17] [17]

the Annals of Probability 100--118

Freedman DA (1975) On tail probabilities for martingales. the Annals of Probability 100--118

work page 1975

[18] [18]

Operations Research 68(1):134--160

Gallego G, Li A, Truong VA, Wang X (2020) Approximation algorithms for product framing and pricing. Operations Research 68(1):134--160

work page 2020

[19] [19]

Operations research 69(5):1509--1532

Gao P, Ma Y, Chen N, Gallego G, Li A, Rusmevichientong P, Topaloglu H (2021) Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection. Operations research 69(5):1509--1532

work page 2021

[20] [20]

The Thirty-ninth Annual Conference on Neural Information Processing Systems

Han Y, Blanchet J, Zhou Z (2025 a ) Improved confidence regions and optimal algorithms for online and offline linear mnl bandits. The Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025

[21] [21]

arXiv preprint arXiv:2502.06777

Han Y, Zhong H, Lu M, Blanchet J, Zhou Z (2025 b ) Learning an optimal assortment policy under observational data. arXiv preprint arXiv:2502.06777

work page arXiv 2025

[22] [22]

Manufacturing & Service Operations Management 26(1):215--232

Jasin S, Lyu C, Najafi S, Zhang H (2024) Assortment optimization with multi-item basket purchase under multivariate mnl model. Manufacturing & Service Operations Management 26(1):215--232

work page 2024

[23] [23]

Production and Operations Management 35(3):1133--1150

Jiang B, Wang Z, Xue C, Zhang N (2026) Assortment optimization in the presence of focal effect: Operational insights and efficient algorithms. Production and Operations Management 35(3):1133--1150

work page 2026

[24] [24]

Proceedings of the tenth ACM international conference on web search and data mining, 781--789

Joachims T, Swaminathan A, Schnabel T (2017) Unbiased learning-to-rank with biased feedback. Proceedings of the tenth ACM international conference on web search and data mining, 781--789

work page 2017

[25] [25]

Computing 38(4):325--340

Jonker R, Volgenant A (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325--340

work page 1987

[26] [26]

International conference on machine learning, 767--776 (PMLR)

Kveton B, Szepesvari C, Wen Z, Ashkan A (2015) Cascading bandits: Learning to rank in the cascade model. International conference on machine learning, 767--776 (PMLR)

work page 2015

[27] [27]

Operations research 73(1):109--138

Li S, Luo Q, Huang Z, Shi C (2025) Online learning for constrained assortment optimization under markov chain choice model. Operations research 73(1):109--138

work page 2025

[28] [28]

International conference on machine learning, 1245--1253 (PMLR)

Li S, Wang B, Zhang S, Chen W (2016) Contextual combinatorial cascading bandits. International conference on machine learning, 1245--1253 (PMLR)

work page 2016

[29] [29]

Operations Research

Liang Y, Mao X, Wang S (2026) Online joint assortment-inventory optimization under mnl choices. Operations Research

work page 2026

[30] [30]

Operations Research

Luo Y, Sun WW, Liu Y (2025) Rate-optimal online learning for dynamic assortment selection with positioning. Operations Research

work page 2025

[31] [31]

Manufacturing & Service Operations Management 23(2):525--545

Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing & Service Operations Management 23(2):525--545

work page 2021

[32] [32]

Operations Research

Najafi S, Jasin S, Uichanco J, Zhao J (2026) Assortment and price optimization under a multiattribute (contextual) choice model. Operations Research

work page 2026

[33] [33]

Operations research 58(6):1666--1680

Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations research 58(6):1666--1680

work page 2010

[34] [34]

arXiv preprint arXiv:2402.18917

Saha A, Gaillard P (2024) Stop relying on no-choice and do not repeat the moves: Optimal, efficient and practical algorithms for assortment optimization. arXiv preprint arXiv:2402.18917

work page arXiv 2024

[35] [35]

Manufacturing & Service Operations Management 15(3):387--404

Saur \'e D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management 15(3):387--404

work page 2013

[36] [36]

ii, on dinkelbach's algorithm

Schaible S (1976) Fractional programming. ii, on dinkelbach's algorithm. Management science 22(8):868--873

work page 1976

[37] [37]

Proceedings of the eleventh ACM international conference on web search and data mining, 610--618

Wang X, Golbandi N, Bendersky M, Metzler D, Najork M (2018) Position bias estimation for unbiased learning to rank in personal search. Proceedings of the eleventh ACM international conference on web search and data mining, 610--618

work page 2018

[38] [38]

Manufacturing & Service Operations Management 25(5):1748--1764

Xu Y, Wang Z (2023) Assortment optimization for a multistage choice model. Manufacturing & Service Operations Management 25(5):1748--1764

work page 2023