Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

Jinchi Lv; Xiaocong Xu; Yingying Fan; Yuxuan Han; Zhengyuan Zhou

arxiv: 2605.15411 · v1 · pith:4UBO4OFWnew · submitted 2026-05-14 · stat.ML · cs.LG · math.OC

Pith reviewed 2026-05-19 15:15 UTC · model grok-4.3

classification stat.ML · cs.LG · math.OC

keywords contextual pricing · semiparametric model · oracle price map · dynamic pricing · bandit convex optimization · regret bounds · unimodality · scalar index

The pith

In semiparametric contextual pricing, a scalar-index pilot reduces the problem to learning a one-dimensional smooth oracle price map whose nonparametric cost is minimax sharp.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers contextual dynamic pricing where customer values follow a scalar-index model with unknown utility map and additive noise. The optimal price as a function of the scalar index forms an oracle price map that becomes smoother than the noise tail itself under a revenue-geometry condition ensuring unique interior revenue maximizers. ORBIT exploits this by first obtaining a scalar pilot index, then using local polynomial approximations inside trust regions solved via bandit convex optimization to learn the map. The resulting policy attains regret that separates a nonparametric term in the smoothness parameter from the usual parametric term in context dimension. A matching lower bound for fixed dimension shows the nonparametric term cannot be improved without stronger assumptions on the noise or geometry.

Core claim

Under β-Hölder smoothness of the tail function for β ≥ 2 and the revenue-geometry condition, the oracle price map u ↦ p^*(u) is itself (β-1)-smooth. The ORBIT policy takes a scalar pilot index, localizes benchmark prices per bin, and learns local polynomial approximations of the map inside trust regions via bandit convex optimization. For the linear utility model, an adaptive elliptical exploration scheme constructs the pilot online. This yields regret Õ(T^{(2β-1)/(4β-3)} + √(dT)), with a matching lower bound in the horizon dependence for fixed d that establishes minimax sharpness of the nonparametric term. The same scalar-pilot interface extends to sparse high-dimensional linear and fully H
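To make the exponent in the stated rate concrete, here is a short worked calculation. It uses only the rate quoted above; the shorthand $a(\beta)$ is introduced here for convenience and is not notation from the paper.

```latex
\[
a(\beta) = \frac{2\beta-1}{4\beta-3}, \qquad
a(2) = \frac{3}{5}, \quad a(3) = \frac{5}{9}, \quad
\lim_{\beta\to\infty} a(\beta) = \frac{1}{2}.
\]
\[
a(\beta) - \frac{1}{2}
= \frac{2(2\beta-1) - (4\beta-3)}{2(4\beta-3)}
= \frac{1}{2(4\beta-3)} > 0
\quad \text{for } \beta \ge 2 .
\]
```

So the nonparametric term grows strictly faster than √T in the horizon for every finite β, and the gap 1/(2(4β−3)) shrinks as the tail becomes smoother.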

What carries the argument

The one-dimensional oracle price map u ↦ p^*(u) induced by the scalar index and noise tail, which carries the reduction from high-dimensional contextual pricing to univariate nonparametric learning while preserving unimodality of the revenue function.
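A minimal numerical sketch of such a map may help. It assumes that a sale occurs whenever the latent value is at least the posted price, and it assumes standard logistic noise purely for illustration (the paper treats the noise distribution as unknown). The names `tail`, `revenue`, `oracle_price`, and the price cap `p_max = 10` are hypothetical, and grid search stands in for any real optimizer.

```python
import numpy as np

def tail(x):
    """Assumed noise tail S(x) = P(xi >= x) for standard logistic noise."""
    return 1.0 / (1.0 + np.exp(x))

def revenue(u, p):
    """Expected revenue r(u, p) = p * P(u + xi >= p) = p * S(p - u)."""
    return p * tail(p - u)

def oracle_price(u, p_max=10.0, n_grid=100_001):
    """Grid-search approximation of p*(u) = argmax over [0, p_max] of r(u, p)."""
    grid = np.linspace(0.0, p_max, n_grid)
    return grid[np.argmax(revenue(u, grid))]

# Evaluate the map on a few index values to see its one-dimensional shape.
indices = np.linspace(-1.0, 3.0, 9)
prices = np.array([oracle_price(u) for u in indices])
for u, p in zip(indices, prices):
    print(f"u = {u:+.2f}  ->  p*(u) ~ {p:.4f}")
```

Under these assumptions each revenue curve has a single interior peak, and the resulting map is increasing in u. For logistic noise the first-order condition reduces to p = 1 + exp(u − p), which gives a quick sanity check on the grid search.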

If this is right

The policy achieves the stated regret bound for linear utility models via adaptive elliptical exploration without context distributional assumptions.
A matching lower bound confirms the nonparametric oracle-map term is minimax sharp in the horizon for fixed dimension.
The scalar-pilot interface carries over directly to sparse high-dimensional linear utility and nonparametric H

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same index-reduction approach could simplify other semiparametric decision problems that possess revenue-like unimodal structure.
Trust-region local polynomial learning inside a coarse pilot might transfer to unimodal nonparametric bandits outside pricing.
Estimating the smoothness parameter β from observed revenue curves could allow the policy to adapt its local approximation order in practice.

Load-bearing premise

The revenue-geometry condition that produces a unique, stable, interior maximizer of expected revenue for each scalar index value u.
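A heuristic sketch of why this premise carries so much weight, under assumptions not confirmed by the source: a sale occurs when $v \ge p$, $S(x) = \mathbb{P}(\xi \ge x)$ denotes the noise tail, the derivatives below exist, and $F$ is shorthand introduced here. The paper's actual argument may differ.

```latex
\[
r(u,p) = p\,S(p-u), \qquad
F(u,p) := \partial_p r(u,p) = S(p-u) + p\,S'(p-u), \qquad
F\big(u, p^*(u)\big) = 0 .
\]
\[
\frac{d p^*}{du}(u)
= -\,\frac{\partial_u F\big(u, p^*(u)\big)}{\partial_p F\big(u, p^*(u)\big)}
= -\,\frac{\partial_u F\big(u, p^*(u)\big)}{\partial_p^2 r\big(u, p^*(u)\big)} .
\]
```

Because F contains S′, it has one order less smoothness than S, which is consistent with the (β−1)-smoothness claimed for p*. The denominator is the curvature of the revenue curve at its maximizer, so the implicit-function step needs that curvature bounded away from zero; a unique, stable, interior maximizer is what would supply it.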

What would settle it

An experiment that measures the empirical regret exponent for large T with fixed d and varying known β, checking whether it tracks (2β-1)/(4β-3) or deviates from it.
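One way such a check could be set up is a log-log regression of cumulative regret against the horizon. The sketch below is illustrative only: the function names are hypothetical, and the regret values are synthetic placeholders generated from the predicted rate, not results from the paper.

```python
import numpy as np

def regret_exponent(horizons, regrets):
    """Least-squares slope of log(regret) against log(T)."""
    slope, _intercept = np.polyfit(np.log(horizons), np.log(regrets), deg=1)
    return slope

def predicted_exponent(beta):
    """Exponent of the nonparametric term in the stated regret bound."""
    return (2.0 * beta - 1.0) / (4.0 * beta - 3.0)

# Synthetic placeholder data (NOT from the paper): regret = C * T^a(beta).
beta = 2.0
horizons = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
regrets = 5.0 * horizons ** predicted_exponent(beta)

estimated = regret_exponent(horizons, regrets)
print(f"predicted {predicted_exponent(beta):.4f}, estimated {estimated:.4f}")
```

With real simulation output, the bound's hidden logarithmic factors and the √(dT) term would bias finite-horizon slopes, so the comparison is most informative at large T with d held small and fixed.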

Original abstract

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under the $\beta$-H\"older smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit such structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, unveiling that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric H\"older utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a new regret rate for semiparametric contextual pricing by reducing the problem to learning a smooth one-dimensional oracle price map, but the claimed (β-1) smoothness rests on a revenue-geometry condition whose details are thin in the abstract.

The main takeaway is that they get regret Õ(T^{(2β-1)/(4β-3)} + √(dT)) with a matching lower bound on the nonparametric term.

The work introduces ORBIT, a coarse-to-fine policy that takes a scalar pilot index, bins the contexts, and fits a local polynomial to the oracle price map inside a trust region using bandit convex optimization. For the linear utility case they add an adaptive elliptical exploration step to build the pilot online. That modular scalar-pilot interface and the specific exponent look new relative to earlier semiparametric pricing results. They also sketch extensions to sparse high-dimensional and fully nonparametric utilities, which is useful. The analysis appears self-contained and does not lean on circular fitted quantities, which is a plus. Having both upper and lower bounds in the same paper strengthens the claim that the nonparametric term is sharp for fixed d.

The revenue-geometry condition is the soft spot. The abstract invokes it to guarantee a unique stable interior maximizer so that p*(u) inherits (β-1)-Hölder smoothness from the β-Hölder tail. But it does not spell out a uniform curvature lower bound or margin away from the boundary. If that margin can vanish in places, the local polynomial bias inside the trust region could degrade and the localization step would not deliver the stated rate. Without the full proofs it is hard to tell whether the condition is stated strongly enough to close the argument.

This paper is for people working on regret bounds in contextual bandits and dynamic pricing who already care about semiparametric structure and smoothness. A reader who follows the literature on oracle maps or trust-region methods in bandits would find the architecture and the exponent worth seeing. I would send it to peer review. The core idea is coherent and the rate is interesting enough that referees should check the geometry step and the BCO localization details.

Referee Report

3 major / 3 minor

Summary. The paper studies contextual dynamic pricing under a semiparametric scalar-index model v_t = μ_*(c_t) + ξ_t. It defines the one-dimensional oracle price map p^*(u) induced by the index u = μ_*(c) and the noise tail. Under β-Hölder smoothness (β ≥ 2) of the tail and a revenue-geometry condition guaranteeing a unique stable interior maximizer for each expected-revenue curve r_u(p), the map p^*(u) is (β-1)-smooth. The ORBIT policy uses a scalar pilot index, coarse-to-fine binning, and local polynomial approximation of p^* inside trust regions via bandit convex optimization (BCO). For linear μ_*(c) = c^⊤ θ_*, an adaptive elliptical exploration constructs the pilot online. The policy attains regret Õ(T^{(2β-1)/(4β-3)} + √(dT)); a matching lower bound in the T-exponent is shown for fixed d, establishing minimax sharpness of the nonparametric term. Extensions to sparse high-d linear and nonparametric Hölder utility are outlined.

Significance. If the revenue-geometry condition is shown to deliver uniform (β-1)-Hölder smoothness of p^*(u) with explicit curvature margins, the result supplies a sharp rate that cleanly separates the nonparametric oracle-map learning cost from the parametric pilot cost. The matching lower bound and the modular scalar-pilot interface (enabling extensions) are notable strengths. The work advances semiparametric contextual pricing by exploiting unimodality of revenue curves without requiring full nonparametric estimation of the value function.
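To see how the two terms in the rate trade off, one can compute the dimension at which they balance. Setting √(dT) = T^{a(β)} with a(β) = (2β−1)/(4β−3) gives d = T^{2a(β)−1} = T^{1/(4β−3)}. The sketch below ignores constants and logarithmic factors, and its function names are hypothetical.

```python
import math

def nonparametric_exponent(beta):
    """Exponent a(beta) of the oracle-map term T^a(beta) in the stated bound."""
    return (2.0 * beta - 1.0) / (4.0 * beta - 3.0)

def crossover_dimension(horizon, beta):
    """Dimension d at which sqrt(d * T) equals T^a(beta), constants and logs ignored."""
    return horizon ** (1.0 / (4.0 * beta - 3.0))

horizon = 1e5
for beta in (2.0, 3.0, 5.0):
    d_star = crossover_dimension(horizon, beta)
    # Both terms evaluated at the crossover dimension should agree.
    parametric = math.sqrt(d_star * horizon)
    nonparametric = horizon ** nonparametric_exponent(beta)
    print(f"beta = {beta}: d* ~ {d_star:.2f}, "
          f"sqrt(d*T) = {parametric:.1f}, T^a = {nonparametric:.1f}")
```

Below the crossover dimension the oracle-map term dominates the bound; above it the parametric √(dT) term does. Smoother tails push the crossover toward smaller d.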

major comments (3)

[Assumption 2] Assumption 2 (Revenue-Geometry Condition): The statement guarantees a unique stable interior maximizer but does not supply an explicit uniform lower bound on the second derivative of r_u(p) away from the boundary or a margin condition on the maximizer location. Without this, the derivation that p^*(u) is exactly (β-1)-Hölder (invoked for bias control of the local polynomial inside the trust region) may only hold pointwise rather than uniformly over the relevant range of u; this directly affects the localization error in the coarse-to-fine binning step and the claimed regret exponent.
[§4.2] §4.2 (Local Polynomial Approximation via BCO): The trust-region analysis assumes that the pilot index localizes the active bin tightly enough for the BCO subroutine to achieve the nonparametric rate. If the curvature margin in Assumption 2 is not uniform, the bias term in the local polynomial fit can degrade from O(h^β) to a slower rate, undermining the overall Õ(T^{(2β-1)/(4β-3)}) bound; the proof sketch does not quantify the propagation of this bias through the coarse-to-fine schedule.
[Theorem 4.1] Theorem 4.1 (Regret Upper Bound): The upper bound is stated under the (β-1)-smoothness of p^*; however, the localization argument for the pilot index (elliptical exploration) and the subsequent trust-region radius selection appear to rely on the same smoothness without an intermediate lemma establishing uniform Hölder continuity from the geometry condition. A gap here would make the rate non-sharp even if the lower bound holds.

minor comments (3)

[§3] Notation: the pilot index is sometimes denoted û_t and sometimes ũ_t; consistent use would improve readability.
[Theorem 5.1] The abstract claims the lower bound is 'matching in the horizon dependence'; the precise statement in Theorem 5.1 should clarify whether the constant factors and the d-dependence are also matched or only the T-exponent.
[Figure 1] Figure 1 (schematic of ORBIT): the trust-region visualization would benefit from an explicit annotation of the bin width h and the local polynomial degree.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The comments correctly identify the need for greater explicitness regarding uniformity in the revenue-geometry condition and for additional intermediate steps in the analysis to support the claimed rates. We address each major comment below and will incorporate the suggested clarifications and lemmas in the revision.

Referee: [Assumption 2] Assumption 2 (Revenue-Geometry Condition): The statement guarantees a unique stable interior maximizer but does not supply an explicit uniform lower bound on the second derivative of r_u(p) away from the boundary or a margin condition on the maximizer location. Without this, the derivation that p^*(u) is exactly (β-1)-Hölder (invoked for bias control of the local polynomial inside the trust region) may only hold pointwise rather than uniformly over the relevant range of u; this directly affects the localization error in the coarse-to-fine binning step and the claimed regret exponent.

Authors: We agree that the current statement of Assumption 2 would benefit from an explicit uniform curvature margin to guarantee uniformity. In the revision we will augment Assumption 2 with a uniform lower bound on |r_u''(p^*(u))| (derived from the existing revenue-geometry condition and the interior-maximizer requirement) that holds over the compact range of u relevant to the problem. A new supporting lemma will then establish that this margin, together with the β-Hölder smoothness of the tail, implies uniform (β-1)-Hölder continuity of p^*(u). This directly controls the localization error in the coarse-to-fine schedule and removes any ambiguity between pointwise and uniform smoothness. revision: yes
Referee: [§4.2] §4.2 (Local Polynomial Approximation via BCO): The trust-region analysis assumes that the pilot index localizes the active bin tightly enough for the BCO subroutine to achieve the nonparametric rate. If the curvature margin in Assumption 2 is not uniform, the bias term in the local polynomial fit can degrade from O(h^β) to a slower rate, undermining the overall Õ(T^{(2β-1)/(4β-3)}) bound; the proof sketch does not quantify the propagation of this bias through the coarse-to-fine schedule.

Authors: We thank the referee for noting the need to quantify bias propagation. With the uniform curvature margin added to Assumption 2, the bias of the local polynomial estimator remains O(h^β) uniformly inside each trust region. In the revised §4.2 we will expand the error decomposition to track how the pilot localization error determines the trust-region radius and how this radius interacts with the binning schedule. The resulting bounds will show that the nonparametric term stays Õ(T^{(2β-1)/(4β-3)}) without degradation, provided the pilot index satisfies the localization rate already established for the linear case. revision: yes
Referee: [Theorem 4.1] Theorem 4.1 (Regret Upper Bound): The upper bound is stated under the (β-1)-smoothness of p^*; however, the localization argument for the pilot index (elliptical exploration) and the subsequent trust-region radius selection appear to rely on the same smoothness without an intermediate lemma establishing uniform Hölder continuity from the geometry condition. A gap here would make the rate non-sharp even if the lower bound holds.

Authors: We acknowledge that an explicit intermediate step is desirable for clarity. We will insert a new lemma (placed after the definition of p^* and before the policy description) that derives uniform (β-1)-Hölder continuity of p^*(u) directly from the (revised) revenue-geometry condition and the β-Hölder tail smoothness. The proof of Theorem 4.1 will then invoke this lemma to justify both the elliptical-exploration localization and the trust-region radius choice. With this addition the upper-bound argument is self-contained and the matching lower bound continues to establish minimax sharpness of the nonparametric term. revision: yes

Circularity Check

0 steps flagged

No circularity: theoretical derivation is self-contained

The paper derives the oracle price map smoothness from the stated β-Hölder tail and revenue-geometry condition, then builds the ORBIT policy and regret bound from that smoothness via local polynomial approximation and bandit convex optimization. No step reduces the target regret expression to a fitted quantity or self-citation by construction; the pilot index, binning, and trust-region localization are constructed from external assumptions rather than from the final bound. The matching lower bound is established separately for fixed d. This is a standard non-circular theoretical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions (Hölder smoothness of the noise tail and the revenue-geometry condition) plus the standard technical machinery of local polynomial regression and bandit convex optimization; no free parameters are fitted inside the regret bound itself and no new entities are postulated.

axioms (2)

domain assumption The tail function of the additive noise is β-Hölder smooth for β ≥ 2.
Stated in the abstract as the condition that makes the oracle price map itself (β-1)-smooth.
domain assumption Revenue-geometry condition ensuring a unique, stable, interior maximizer for each scalar index u.
Invoked to guarantee that the oracle map is well-defined and inherits the stated smoothness.

pith-pipeline@v0.9.0 · 5841 in / 1559 out tokens · 38049 ms · 2026-05-19T15:15:31.851902+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Under the β-Hölder smoothness of the tail function for β ≥ 2 and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself (β−1)-smooth.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes

the global quadratic growth bounds hold that for all p ∈ [0, p_max], (σ_r/2) |p − p^*(u)|² ≤ r(u, p^*(u)) − r(u, p) ≤ (L_r/2) |p − p^*(u)|²

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page arXiv

[19] [19]

Tsybakov , title =

Alexandre B. Tsybakov , title =

work page

[20] [20]

, title =

den Boer, Arnoud V. , title =. Surveys in Operations Research and Management Science , volume =. 2015 , doi =

work page 2015

[21] [21]

Management Science , volume =

Lobel, Ilan , title =. Management Science , volume =. 2020 , doi =

work page 2020

[22] [22]

Service Science , volume =

Chen, Ningyuan and Hu, Ming , title =. Service Science , volume =. 2023 , doi =

work page 2023

[23] [23]

Operations Research , volume =

Broder, Josef and Rusmevichientong, Paat , title =. Operations Research , volume =. 2012 , doi =

work page 2012

[24] [24]

Journal of Machine Learning Research , volume =

Javanmard, Adel and Nazerzadeh, Hamid , title =. Journal of Machine Learning Research , volume =. 2019 , url =

work page 2019

[25] [25]

and Lobel, Ilan and Paes Leme, Renato , title =

Cohen, Maxime C. and Lobel, Ilan and Paes Leme, Renato , title =. Management Science , volume =. 2020 , doi =

work page 2020

[26] [26]

Bora , title =

Ban, Gah-Yi and Keskin, N. Bora , title =. Management Science , volume =. 2021 , doi =

work page 2021

[27] [27]

Operations Research , volume =

Chen, Ningyuan and Gallego, Guillermo , title =. Operations Research , volume =. 2021 , doi =

work page 2021

[28] [28]

Operations Research , volume =

Gong, Xueping and You, Wei and Zhang, Jiheng , title =. Operations Research , volume =. 2025 , doi =

work page 2025

[29] [29]

Advances in Neural Information Processing Systems , volume =

Xu, Jianyu and Wang, Yu-Xiang , title =. Advances in Neural Information Processing Systems , volume =. 2021 , url =

work page 2021

[30] [30]

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics , series =

Xu, Jianyu and Wang, Yu-Xiang , title =. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics , series =. 2022 , publisher =

work page 2022

[31] [31]

Proceedings of the 40th International Conference on Machine Learning , series =

Choi, Young-Geun and Kim, Gi-Soo and Choi, Yunseo and Cho, Wooseong and Paik, Myunghee Cho and Oh, Min-Hwan , title =. Proceedings of the 40th International Conference on Machine Learning , series =. 2023 , publisher =

work page 2023

[32] [32]

Mathematics of Operations Research , volume =

Luo, Yiyun and Sun, Will Wei and Liu, Yufeng , title =. Mathematics of Operations Research , volume =. 2023 , doi =

work page 2023

[33] [33]

Advances in Neural Information Processing Systems , volume =

Luo, Yiyun and Sun, Will Wei and Liu, Yufeng , title =. Advances in Neural Information Processing Systems , volume =. 2022 , url =

work page 2022

[34] [34]

Advances in Neural Information Processing Systems , volume =

Tullii, Matilde and Gaucher, Solenne and Merlis, Nadav and Perchet, Vianney , title =. Advances in Neural Information Processing Systems , volume =. 2024 , doi =

work page 2024

[35] [35]

Journal of the American Statistical Association , volume =

Fan, Jianqing and Guo, Yongyi and Yu, Mengxin , title =. Journal of the American Statistical Association , volume =. 2024 , doi =

work page 2024

[36] [36]

2025 , note =

Wang, Yining and Chen, Boxiao , title =. 2025 , note =. doi:10.2139/ssrn.5133677 , url =

work page doi:10.2139/ssrn.5133677 2025

[37] [37]

International Conference on Learning Representations , year =

Han, Yuxuan and Xu, Xiaocong and Wen, Yuxiao and Han, Yanjun and Lobel, Ilan and Zhou, Zhengyuan , title =. International Conference on Learning Representations , year =

work page

[38] [38]

Management Science , volume =

Chen, Xi and Liu, Quanquan and Wang, Yining , title =. Management Science , volume =. 2022 , doi =

work page 2022

[39] [39]

Proceedings of Thirty Fourth Conference on Learning Theory , series =

Lattimore, Tor and Gyorgy, Andras , title =. Proceedings of Thirty Fourth Conference on Learning Theory , series =. 2021 , publisher =

work page 2021

[40] [40]

Advances in Neural Information Processing Systems , volume =

Shah, Virag and Johari, Ramesh and Blanchet, Jose , title =. Advances in Neural Information Processing Systems , volume =. 2019 , url =

work page 2019

[41] [41]

2020 , publisher=

Bandit algorithms , author=. 2020 , publisher=

work page 2020

[42] [42]

Advances in Neural Information Processing Systems , volume=

Eluder dimension and the sample complexity of optimistic exploration , author=. Advances in Neural Information Processing Systems , volume=

work page

[43] [43]

Operations Research , volume=

Close the gaps: A learning-while-doing algorithm for single-product revenue management problems , author=. Operations Research , volume=. 2014 , publisher=

work page 2014

[44] [44]

44th Annual IEEE Symposium on Foundations of Computer Science, 2003

The value of knowing a demand curve: Bounds on regret for online posted-price auctions , author=. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. , pages=. 2003 , organization=

work page 2003

[45] [45]

International Conference on Algorithmic Learning Theory , pages=

Efficient local planning with linear function approximation , author=. International Conference on Algorithmic Learning Theory , pages=. 2022 , organization=

work page 2022

[46] [46]

International Conference on Machine Learning , pages=

Provably optimal algorithms for generalized linear contextual bandits , author=. International Conference on Machine Learning , pages=. 2017 , organization=

work page 2017

[47] [47]

Management Science , volume=

Multimodal dynamic pricing , author=. Management Science , volume=. 2021 , publisher=

work page 2021

[48] [48]

Mathematics of Operations Research , volume=

Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity , author=. Mathematics of Operations Research , volume=. 2019 , publisher=

work page 2019

[49] [49]

International Conference on Artificial Intelligence and Statistics , pages=

Smooth bandit optimization: generalization to holder space , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021

[50] [50]

Operations Research , volume=

Smoothness-adaptive contextual bandits , author=. Operations Research , volume=. 2022 , publisher=

work page 2022

[51] [51]

Mathematics of Operations Research , volume=

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability , author=. Mathematics of Operations Research , volume=. 2022 , publisher=

work page 2022

[52] [52]

International Conference on Machine Learning , pages=

Practical contextual bandits with regression oracles , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018

[53] [53]

Advances in Neural Information Processing Systems , volume=

Stochastic convex optimization with bandit feedback , author=. Advances in Neural Information Processing Systems , volume=

work page

[54] [54]

Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=

Improved regret guarantees for online smooth convex optimization with bandit feedback , author=. Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=. 2011 , organization=

work page 2011

[55] [55]

Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms , pages=

Online convex optimization in the bandit setting: Gradient descent without a gradient , author=. Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms , pages=

work page

[56] [56]

arXiv preprint arXiv:2502.05776 , year=

Dynamic pricing in the linear valuation model using shape constraints , author=. arXiv preprint arXiv:2502.05776 , year=

work page arXiv