arXiv: 1907.05381 · v1 · pith:EELGPZWHnew · submitted 2019-07-02 · econ.EM · math.OC · q-fin.MF · q-fin.RM · stat.ML

Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches

Pith reviewed 2026-05-25 11:00 UTC · model grok-4.3

classification econ.EM · math.OC · q-fin.MF · q-fin.RM · stat.ML
keywords adaptive pricing · insurance · generalized linear models · gaussian process regression · dynamic pricing · maximum quasi-likelihood estimation · revenue management · online learning

The pith

If insurance prices are chosen with suitably decreasing variability, maximum quasi-likelihood estimates converge to true parameter values and prices converge to the revenue-maximizing level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats insurance pricing as an online revenue management task in which an insurer must set prices for a new product to maximize long-run revenue. Demand and claims are modeled by generalized linear models, the standard approach in insurance. An adaptive GLM policy uses maximum quasi-likelihood estimation to learn unknown parameters while choosing prices that trade off learning against immediate revenue. The central result is that when price variability is reduced at a suitable rate, the estimates exist and converge to the correct values, which forces the chosen prices themselves to converge to the optimum. A parallel adaptive Gaussian process policy samples demand and claims from Gaussian processes and selects prices via upper confidence bounds; both policies are also examined when claim information arrives with delay.
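
The adaptive GLM policy described above can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: demand is a single Bernoulli purchase with a logistic link, the expected claim cost per policy is a known constant `cost` (the paper models claims with a GLM as well), and exploration is a deterministic perturbation of size t^(-1/4) around the greedy price. All function names and parameter values are hypothetical. For the canonical logistic link the quasi-score coincides with the likelihood score, so a few Newton steps stand in for the MQLE.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function (only exponentiates non-positive values).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(prices, sales, prev, iters=15):
    """Newton steps on the quasi-score sum_i x_i (y_i - sigmoid(x_i . theta)) = 0,
    with x_i = (1, p_i). Falls back to `prev` when the information matrix is
    numerically singular or the iterates diverge."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for p, y in zip(prices, sales):
            mu = sigmoid(a + b * p)
            w = mu * (1.0 - mu)
            g0 += y - mu
            g1 += (y - mu) * p
            h00 += w
            h01 += w * p
            h11 += w * p * p
        det = h00 * h11 - h01 * h01
        if det < 1e-10:
            return prev
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
        if abs(a) > 50 or abs(b) > 50:
            return prev
    return (a, b)

def greedy_price(theta, cost, grid):
    # Price on the grid that maximizes estimated expected revenue per quote.
    a, b = theta
    return max(grid, key=lambda p: (p - cost) * sigmoid(a + b * p))

def run(T=300, seed=0, true_theta=(4.0, -1.0), cost=1.0, p_min=1.0, p_max=8.0):
    rng = random.Random(seed)
    grid = [p_min + i * (p_max - p_min) / 200 for i in range(201)]
    theta = (0.0, -0.1)  # arbitrary starting guess (assumption)
    prices, sales = [], []
    for t in range(1, T + 1):
        delta = t ** -0.25               # exploration size shrinks with t
        sign = 1.0 if t % 2 else -1.0    # alternate above and below the greedy price
        p = greedy_price(theta, cost, grid) + sign * delta
        p = min(p_max, max(p_min, p))
        buy = rng.random() < sigmoid(true_theta[0] + true_theta[1] * p)
        prices.append(p)
        sales.append(1 if buy else 0)
        theta = fit_logistic(prices, sales, theta)
    return prices, theta
```

Calling `run()` returns the price path and the final estimate. The fallback to the previous estimate is a crude stand-in for the "eventually exist" qualifier in the convergence result; the sketch makes no claim about which decay rate the paper requires.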

Core claim

In the adaptive GLM design, if prices are chosen with suitably decreasing variability, the MQLE parameters eventually exist and converge to the correct values, which in turn implies that the sequence of chosen prices will also converge to the optimal price. The adaptive GP regression model samples demand and claims from Gaussian Processes and selects selling prices by the upper confidence bound rule. Both algorithms are analyzed under delayed claims.
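
One way to picture the delayed-claims setting is a buffer that withholds each observation for some number of rounds before the estimator may use it. The sketch below assumes a fixed, known delay; this summary does not state the paper's delay model, and the class and method names are hypothetical.

```python
from collections import deque

class DelayedFeedback:
    """Releases each observation a fixed number of rounds after it was made."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()  # entries are (release_round, observation)

    def push(self, t, obs):
        # Record an observation made in round t; it becomes usable at t + delay.
        self.pending.append((t + self.delay, obs))

    def pop_ready(self, t):
        # Return, in arrival order, every observation whose release round has passed.
        ready = []
        while self.pending and self.pending[0][0] <= t:
            ready.append(self.pending.popleft()[1])
        return ready
```

In a pricing loop, the policy would call `push` after each sale and refit its model only on what `pop_ready` returns, so estimates lag behind the prices already charged.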

What carries the argument

Maximum quasi-likelihood estimation (MQLE) inside an adaptive pricing loop whose price sequence is constructed with suitably decreasing variability.
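
For orientation, the generic quasi-likelihood estimating equation (in the sense of Wedderburn's quasi-likelihood) can be written as below. The notation is illustrative and may differ from the paper's: x_i is the covariate vector for observation i (for example, an intercept and the price), h is the inverse link, and V is the variance function.

```latex
\[
  \sum_{i=1}^{t} \frac{\partial \mu_i(\beta)}{\partial \beta}\,
  \frac{y_i - \mu_i(\beta)}{V\bigl(\mu_i(\beta)\bigr)} = 0,
  \qquad \mu_i(\beta) = h\bigl(x_i^{\top}\beta\bigr).
\]
```

With a canonical link the weight \(\partial\mu_i/\partial\beta \,/\, V(\mu_i)\) reduces to \(x_i\), so the equation becomes \(\sum_i x_i\,(y_i - \mu_i(\beta)) = 0\). A solution need not exist for small samples, which is why the convergence statement speaks of estimates that "eventually exist".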

If this is right

  • The chosen prices converge to the revenue-maximizing price.
  • The same convergence holds when claim information is received with delay.
  • An alternative Gaussian process policy achieves a comparable balance between learning and exploitation without parametric assumptions.
  • Regret, measured as expected revenue loss relative to the optimal price, is controlled through the exploration schedule.
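
To make the regret notion in the last bullet concrete, the following sketch accumulates the expected revenue shortfall of a price path against the best price on a grid. The revenue curve (margin times a logistic purchase probability, with made-up coefficients and a constant claim cost) is a hypothetical stand-in, not the paper's model.

```python
import math

def expected_revenue(p, a=4.0, b=-1.0, cost=1.0):
    # Hypothetical revenue curve: margin times logistic purchase probability.
    return (p - cost) / (1.0 + math.exp(-(a + b * p)))

def cumulative_regret(prices, grid):
    """Running sum of r(p*) - r(p_t), where p* is the best price on `grid`."""
    best = max(expected_revenue(p) for p in grid)
    total, path = 0.0, []
    for p in prices:
        total += best - expected_revenue(p)
        path.append(total)
    return path
```

A policy whose prices converge to the optimum produces a regret path that flattens out, whereas a policy stuck at a suboptimal price produces one that grows linearly in the number of rounds.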

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The convergence argument may extend to other revenue-management domains that already use GLM demand models.
  • Real-world insurance data could be used to test whether the required rate of variability reduction produces acceptable short-term revenue loss.
  • Hybrid policies that switch from GP exploration to GLM exploitation once parameters stabilize could reduce long-run regret.

Load-bearing premise

Demand and claims follow generalized linear models and the pricing rule reduces variability at a rate sufficient for the estimates to exist and converge.

What would settle it

A sequence of observed demands and claims generated under a pricing policy whose variability does not decrease sufficiently, such that the MQLE estimates fail to converge or the prices fail to approach the revenue optimum.

Figures

Figures reproduced from arXiv: 1907.05381 by Neil Walton, Yuqing Zhang.

Figure 1: Price dispersion and convergence of parameter estimates.
Figure 2: Cumulative regret and convergence rate for GLM algorithm. GLM denotes the non-delayed case and D-GLM
Figure 3: Cumulative regret and convergence rate for GP algorithm. GP denotes the non-delayed case and D-GP denotes

Original abstract

We study the application of dynamic pricing to insurance. We view this as an online revenue management problem where the insurance company looks to set prices to optimize the long-run revenue from selling a new insurance product. We develop two pricing models: an adaptive Generalized Linear Model (GLM) and an adaptive Gaussian Process (GP) regression model. Both balance between exploration, where we choose prices in order to learn the distribution of demands & claims for the insurance product, and exploitation, where we myopically choose the best price from the information gathered so far. The performance of the pricing policies is measured in terms of regret: the expected revenue loss caused by not using the optimal price. As is commonplace in insurance, we model demand and claims by GLMs. In our adaptive GLM design, we use the maximum quasi-likelihood estimation (MQLE) to estimate the unknown parameters. We show that, if prices are chosen with suitably decreasing variability, the MQLE parameters eventually exist and converge to the correct values, which in turn implies that the sequence of chosen prices will also converge to the optimal price. In the adaptive GP regression model, we sample demand and claims from Gaussian Processes and then choose selling prices by the upper confidence bound rule. We also analyze these GLM and GP pricing algorithms with delayed claims. Although similar results exist in other domains, this is among the first works to consider dynamic pricing problems in the field of insurance. We also believe this is the first work to consider Gaussian Process regression in the context of insurance pricing. These initial findings suggest that online machine learning algorithms could be a fruitful area of future investigation and application in insurance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops two adaptive pricing policies for insurance products where demand and claims follow GLMs. The first uses maximum quasi-likelihood estimation (MQLE) under a pricing policy with suitably decreasing variability, proving eventual existence and consistency of the MQLE estimates and consequent convergence of prices to the revenue-maximizing price. The second employs Gaussian process regression with an upper-confidence-bound rule for price selection. Both are analyzed for regret, and the GLM and GP approaches are extended to the case of delayed claims. The work positions these as among the first applications of online learning methods to insurance pricing.

Significance. If the convergence and regret results are rigorously established, the paper makes a useful contribution by importing standard adaptive-design techniques for GLMs and introducing GP regression into insurance pricing, a domain where such dynamic, data-driven policies have received limited attention. The explicit link between controlled exploration (decreasing variability) and consistency of MQLE is a standard but practically relevant technical step; the GP-UCB analysis and delayed-claims extension broaden the scope.

major comments (2)
  1. [theoretical results on MQLE convergence] The central claim (abstract and theoretical results section) that a policy with suitably decreasing variability guarantees eventual existence and consistency of the MQLE, which in turn implies price convergence, rests on standard GLM adaptive-design conditions. The manuscript should state the precise rate condition on price variability (e.g., the summability or decay rate required for the design matrix to accumulate full rank) and supply the key steps or error bounds that close the implication from parameter consistency to price convergence.
  2. [Gaussian process regression model and regret analysis] For the GP regression approach, the regret analysis and UCB rule require explicit assumptions on the kernel, the sub-Gaussian noise, and the resulting regret bound (e.g., O(sqrt(T log T)) or similar). Without these, it is difficult to compare the GP policy's performance guarantees with the GLM policy or with existing UCB results in the literature.
minor comments (2)
  1. Notation for the decreasing-variability condition and for the delayed-claims observation process should be introduced once and used consistently; a short table summarizing the two policies (assumptions, estimation method, exploration mechanism, regret order) would aid readability.
  2. [Introduction] The abstract states that 'similar results exist in other domains'; adding two or three key references from the adaptive-design or online-learning literature would strengthen the positioning without lengthening the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The suggestions help clarify the theoretical conditions and assumptions. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [theoretical results on MQLE convergence] The central claim (abstract and theoretical results section) that a policy with suitably decreasing variability guarantees eventual existence and consistency of the MQLE, which in turn implies price convergence, rests on standard GLM adaptive-design conditions. The manuscript should state the precise rate condition on price variability (e.g., the summability or decay rate required for the design matrix to accumulate full rank) and supply the key steps or error bounds that close the implication from parameter consistency to price convergence.

    Authors: We agree that the precise rate condition and key steps should be stated explicitly for rigor. In the revised manuscript, we will add that the price variability sequence must satisfy sum_t var(p_t) = infinity (ensuring the design matrix accumulates full rank asymptotically, per standard adaptive GLM results such as those in Lai and Wei (1982)). We will include the key steps: (i) the MQLE exists eventually under this condition by showing the information matrix diverges; (ii) consistency follows from the martingale convergence theorem applied to the quasi-score; (iii) price convergence to the optimum follows from continuous mapping and a uniform bound on the price deviation |p_t - p*| <= C ||theta_hat_t - theta*||, with the error bound O(1/sqrt(t)) in probability. These additions close the implication without altering the main results. revision: yes

  2. Referee: [Gaussian process regression model and regret analysis] For the GP regression approach, the regret analysis and UCB rule require explicit assumptions on the kernel, the sub-Gaussian noise, and the resulting regret bound (e.g., O(sqrt(T log T)) or similar). Without these, it is difficult to compare the GP policy's performance guarantees with the GLM policy or with existing UCB results in the literature.

    Authors: We agree that explicit assumptions and the regret bound should be stated to facilitate comparison. In the revised manuscript, we will specify: the kernel is continuous and positive definite (e.g., squared exponential with length-scale l); observations are sub-Gaussian with variance proxy sigma^2; and under these, the GP-UCB policy achieves regret O(sqrt(T log T)) (following standard bounds from Srinivas et al. (2010) adapted to the insurance setting with delayed claims). This allows direct comparison to the GLM policy's O(T^{2/3}) regret and to the broader UCB literature. The analysis for delayed claims remains unchanged. revision: yes
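
The ingredients named in the second response (a squared exponential kernel, a noise variance, and an upper-confidence-bound rule) can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's algorithm: it places a single zero-mean GP directly on revenue instead of modeling demand and claims separately, uses a fixed confidence multiplier `beta` instead of a theoretically derived schedule, fixes the kernel hyperparameters, and ignores delay. All names and defaults are hypothetical.

```python
import math
import random

def sq_exp(x, y, length=1.0):
    # Squared exponential kernel with unit prior variance.
    return math.exp(-0.5 * ((x - y) / length) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def posterior(xs, ys, x, noise=0.05, length=1.0):
    """GP posterior mean and variance at x; `noise` is the observation variance."""
    if not xs:
        return 0.0, 1.0
    K = [[sq_exp(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [sq_exp(a, x, length) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def gp_ucb(revenue, grid, T=15, beta=2.0, noise=0.05, seed=0):
    """Pick the grid price with the highest mean + beta * std, observe, repeat."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(T):
        def ucb(p):
            m, v = posterior(xs, ys, p, noise)
            return m + beta * math.sqrt(v)
        p = max(grid, key=ucb)
        xs.append(p)
        ys.append(revenue(p) + rng.gauss(0.0, math.sqrt(noise)))
    return xs
```

The dense solve is recomputed for every candidate price, which is acceptable only for the short horizons and coarse grids of a toy example; a practical implementation would factor the kernel matrix once per round.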

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's central derivation is a standard consistency result for MQLE under adaptive GLM designs with controlled exploration (decreasing price variability). The abstract states the implication directly as a theorem to be shown, notes that similar results exist in other domains, and relies on external GLM modeling conventions plus regret definitions rather than any self-referential fit, ansatz, or self-citation chain. No load-bearing step reduces by construction to the paper's own inputs or prior self-work; the argument is self-contained against external benchmarks in adaptive design literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that demand and claims follow GLMs and on the modeling choice of decreasing price variability to guarantee MQLE convergence.

axioms (1)
  • domain assumption Demand and claims are modeled by generalized linear models
    Described as commonplace in insurance and used as the basis for both pricing models.

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 3 internal anchors

  1. [1]

    R. L. Phillips, Pricing and revenue optimization, Stanford Business Books, Stanford, Calif., 2005

  2. [2]

    K. Talluri, G. van Ryzin, The theory and practice of revenue management, Springer, Boston, MA, 2005. URL https://doi.org/10.1007/b139000

  3. [3]

    J. A. Nelder, R. W. M. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society 135 (3) (1972) 370–384. doi:10.2307/2344614. URL http://www.jstor.org/stable/2344614

  4. [4]

    P. McCullagh, J. A. Nelder, Generalized Linear Models, 2nd Edition, Chapman and Hall, London, 1989. URL http://www.utstat.toronto.edu/~brunner/oldclass/2201s11/readings/glmbook.pdf

  5. [5]

    T. L. Lai, H. Robbins, Adaptive design and stochastic approximation, The Annals of Statistics 7 (6) (1979) 1196–1221. doi:10.1214/aos/1176344840

  6. [6]

    T. Lai, H. Robbins, Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Z. Wahrscheinlichkeitstheorie verw. Gebiete 56 (3) (1981) 329–360. URL https://doi.org/10.1007/BF00536178

  7. [7]

    T. Lai, H. Robbins, Iterated least squares in multiperiod control, Advances in Applied Mathematics 3 (1) (1982) 50–73. URL https://doi.org/10.1016/S0196-8858(82)80005-5

  8. [8]

    T. L. Lai, C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics 10 (1) (1982) 154–166. doi:10.1214/aos/1176345697

  9. [9]

    T. Lai, Stochastic approximation: invited paper, The Annals of Statistics 31 (2) (2003) 391–406. doi:10.1214/aos/1051027873

  10. [10]

    A. Den Boer, B. Zwart, Simultaneously learning and optimizing using controlled variance pricing, Management Science 60 (3) (2013) 770–783. URL https://doi.org/10.1287/mnsc.2013.1788

  11. [11]

    J. Močkus, Bayesian Approach to Global Optimization, Mathematics and its Applications, Springer, Netherlands, 1989. doi:10.1007/978-94-009-0909-0

  12. [12]

    J. Močkus, Bayesian approach to global optimization and application to multiobjective and constrained problems, Journal of Global Optimization 4 (4) (1994) 347–365. doi:10.1007/BF01099263

  13. [14]

    N. Srinivas, A. Krause, S. M. Kakade, M. Seeger, Information-theoretic regret bounds for gaussian process optimization in the bandit setting, IEEE Transactions on Information Theory 58 (5) (2012) 389–434. doi:10.1109/TIT.2011.2182033. URL https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6138914

  14. [15]

    C. E. Rasmussen, C. K. I. Williams, Gaussian processes for machine learning, Adaptive computation and machine learning, MIT Press, Cambridge, Mass, 2006

  15. [16]

    E. Brochu, V. M. Cora, N. de Freitas, A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, CoRR abs/1012.2599. URL http://arxiv.org/abs/1012.2599

  16. [17]

    SwissRe, Life insurance in the digital age: fundamental transformation ahead, Swiss Re Sigma Report. URL http://www.biztositasiszemle.hu/files/201512/sigma6_2015_en.pdf

  17. [18]

    H. Bühlmann, Mathematical Methods in Risk Theory, Grundlehren der mathematischen Wissenschaften, Springer-Verlag Berlin Heidelberg, 2005. doi:10.1007/978-3-540-30711-2

  18. [19]

    C. McClenahan, Ratemaking, 4th Edition, Casualty Actuarial Society, 1984, Ch. 3, pp. 75–148

  19. [20]

    P. de Jong, G. Z. Heller, Generalized Linear Models for Insurance Data, Cambridge University Press, Cambridge, 2008. URL https://feb.kuleuven.be/public/u0017833/boek.pdf

  20. [21]

    L. A. Baxter, S. M. Coutts, G. A. F. Ross, Applications of linear models in motor insurance, in: 21st International Congress of Actuaries, Vol. 2, Elsevier, 1980, pp. 11–29

  21. [22]

    S. M. Coutts, Motor insurance rating, an actuarial approach, Journal of the Institute of Actuaries 111 (1) (1984) 87–148. URL https://www.jstor.org/stable/41140673

  22. [23]

    R. A. Bailey, L. J. Simon, Two studies in automobile insurance ratemaking, ASTIN Bulletin: The Journal of the International Actuarial Association 1 (4) (1960) 192–217. doi:10.1017/S0515036100009569

  23. [24]

    M. David, Auto insurance premium calculation using generalized linear models, Procedia Economics and Finance 20 (2015) 147–156. doi:10.1016/S2212-5671(15)00059-3

  24. [25]

    E. Ohlsson, J. B., Non-Life Insurance Pricing with Generalized Linear Models, Springer, Berlin, Heidelberg, 2010. URL https://link.springer.com/book/10.1007/978-3-642-10791-7

  25. [26]

    S. Haberman, A. E. Renshaw, Generalized linear models and actuarial science, Journal of the Royal Statistical Society 45 (4) (1996) 407–436. doi:10.2307/2988543

  26. [27]

    R. Kaas, M. Goovaerts, J. Dhaene, M. Denuit, Modern actuarial risk theory : using R, 2nd Edition, Springer, Berlin, 2009. doi:10.1007/978-3-540-70998-5

  27. [28]

    E. W. Frees, Regression modeling with actuarial and financial applications, International series on actuarial science, Cambridge University Press, Cambridge, 2010

  28. [29]

    M. V. Wüthrich, C. Buser, Data analytics for non-life insurance pricing, Swiss Finance Institute Research Paper No. 16-68. URL https://ssrn.com/abstract=2870308

  29. [30]

    G. C. Evans, The dynamics of monopoly, The American Mathematical Monthly 31 (2) (1924) 77–83. doi:10.2307/2300113. URL http://www.jstor.org/stable/2300113

  30. [31]

    G. C. Evans, Mathematical introduction to economics, McGraw-Hill, New York, 1930. URL http://hdl.handle.net/2027/uc1.b3427705

  31. [32]

    E. A. Greenleaf, The impact of reference price effects on the profitability of price promotions, Marketing Science 14 (1) (1995) 82–104. doi:10.1287/mksc.14.1.82. URL https://pubsonline.informs.org/doi/pdf/10.1287/mksc.14.1.82

  32. [33]

    P. Kopalle, A. Rao, J. Assuncao, Asymmetric reference price effects and dynamic pricing policies, Marketing Science 15 (1) (1996) 60–85. URL http://www.jstor.org/stable/184184

  33. [34]

    G. Fibich, A. Gavious, Explicit solutions of optimization models and differential games with nonsmooth (asymmetric) reference-price effects, Operations Research 51 (5) (2003) 721–734. URL http://www.jstor.org/stable/4132433

  34. [35]

    Y. Aviv, G. Vulcano, Dynamic list pricing, in: The Oxford handbook of pricing management, Oxford University Press, UK, 2012, Ch. 23, pp. 522–58. doi:10.1093/oxfordhb/9780199543175.013.0023

  35. [36]

    A. Den Boer, Dynamic pricing and learning: Historical origins, current research, and new directions, Surveys in operations research and management science 20 (1) (2015) 1–18. doi:10.1016/j.sorms.2015.03.001

  36. [37]

    M. V. Wüthrich, Non-life insurance: Mathematics & statistics (2017). URL http://dx.doi.org/10.2139/ssrn.2319328

  37. [38]

    R. W. M. Wedderburn, Quasi-likelihood functions, generalized linear models, and the gauss-newton method, Biometrika 61 (3) (1974) 439–447. doi:10.2307/2334725

  38. [39]

    T. W. Anderson, J. B. Taylor, Strong consistency of least squares estimates in normal linear regression, The Annals of Statistics 4 (4) (1976) 788–790. doi:10.1214/aos/1176343552

  39. [40]

    T. L. Lai, H. Robbins, C. Z. Wei, Strong consistency of least squares estimates in multiple regression, Proceedings of the National Academy of Sciences of the United States of America 75 (7) (1978) 343–

  40. [41]

    doi:10.1016/0047-259X(79)90093-9

  41. [42]

    K. Chen, I. Hu, Z. Ying, Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs, The Annals of Statistics 27 (4) (1999) 1155–1163. doi:10.1214/aos/1017938919

  42. [43]

    S. Bubeck, N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning 5 (1) (2012) 1–122. doi:10.1561/2200000024. URL http://sbubeck.com/SurveyBCB12.pdf

  43. [44]

    P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2-3) (2002) 235–256. doi:10.1023/A:1013689704352

  44. [45]

    T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22. URL http://dx.doi.org/10.1016/0196-8858(85)90002-8

  45. [46]

    V. Dani, T. P. Hayes, S. M. Kakade, Stochastic linear optimization under bandit feedback, in: 21st Annual Conference on Learning Theory (COLT), 2008, pp. 355–366. URL http://colt2008.cs.helsinki.fi/papers/80-Dani.pdf

  46. [47]

    P. Rusmevichientong, J. N. Tsitsiklis, Linearly parameterized bandits, Mathematics of Operations Research 35 (2) (2010) 395–411. URL https://pubsonline.informs.org/doi/pdf/10.1287/moor.1100.0446

  47. [48]

    S. Filippi, O. Cappe, A. Garivier, C. Szepesvári, Parametric bandits: The generalized linear case, in: Advances in Neural Information Processing Systems 23 (NIPS 2010), 2010, pp. 586–594. URL https://sites.ualberta.ca/~szepesva/papers/GenLinBandits-NIPS2010.pdf

  48. [49]

    S. Bubeck, R. Munos, G. Stoltz, N. Cesa-Bianchi, X-armed bandits, Journal of Machine Learning Research 12 (2011) 1655–1695. URL http://www.jmlr.org/papers/volume12/bubeck11a/bubeck11a.pdf

  49. [50]

    R. Kleinberg, A. Slivkins, E. Upfal, Multi-armed bandits in metric spaces, in: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, ACM, 2008, pp. 681–690. doi:10.1145/1374376.1374475. URL http://doi.acm.org/10.1145/1374376.1374475

  50. [51]

    S. Agrawal, N. Goyal, Analysis of thompson sampling for the multi-armed bandit problem, in: 25th Annual Conference on Learning Theory, Vol. 23, 2012, pp. 39.1–39.26. URL http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

  51. [52]

    S. Agrawal, N. Goyal, Further optimal regret bounds for thompson sampling, in: 16th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 31, 2013, pp. 90–107. URL http://proceedings.mlr.press/v31/agrawal13a.pdf

  52. [53]

    D. Russo, B. Van Roy, Learning to optimize via posterior sampling, Mathematics of Operations Research 39 (4) (2013) 1221–1243. URL https://pubsonline.informs.org/doi/pdf/10.1287/moor.2014.0650

  53. [54]

    J. Močkus, V. Tiesis, A. Zilinskas, On bayesian methods for seeking the extremum, in: Towards Global Optimization, 2nd Edition, Vol. 2, Elsevier Science Ltd, North Holland, Amsterdam, 1978, pp. 117–129

  54. [55]

    J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimization of machine learning algorithms, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 2951–2959. URL http://papers.nips.cc/paper/4522-practical-bayesian-optimization

  55. [56]

    G. Gallego, G. Van Ryzin, Optimal dynamic pricing of inventories with stochastic demand over finite horizons, Management Science 40 (8) (1994) 999–1020. URL http://www.jstor.org.manchester.idm.oclc.org/stable/2633090

  56. [57]

    Y. Aviv, A. Pazgal, A partially observed markov decision process for dynamic pricing, Management Science 51 (9) (2005) 1400–1416. URL http://www.jstor.org.manchester.idm.oclc.org/stable/20110429

  57. [58]

    J. Harrison, N. Keskin, A. Zeevi, Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution, Management Science 58 (3) (2012) 570–586. URL https://doi.org/10.1287/mnsc.1110.1426

  58. [59]

    J. Broder, P. Rusmevichientong, Dynamic pricing under a general parametric choice model, Operations Research 60 (4) (2012) 965–980. URL http://search.proquest.com/docview/1041256005/

  59. [60]

    R. Kleinberg, T. Leighton, The value of knowing a demand curve: bounds on regret for online posted-price auctions, in: Proceedings of the 44th IEEE Symposium on Foundations of Computer Science, IEEE, USA, 2003, pp. 594–605. doi:10.1109/SFCS.2003.1238232

  60. [61]

    E. Cope, Bayesian strategies for dynamic pricing in e-commerce, Naval Research Logistics (NRL) 54 (3) (2007) 265–281. doi:10.1002/nav.20204

  61. [62]

    P. Rusmevichientong, B. Van Roy, P. W. Glynn, A nonparametric approach to multiproduct pricing, Operations Research 54 (1) (2006) 82–98. doi:10.1287/opre.1050.0252

  62. [63]

    O. Besbes, A. Zeevi, Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms, Operations Research 57 (6) (2009) 1407–1420. URL http://www.jstor.org/stable/25614853

  63. [64]

    O. Besbes, A. Zeevi, Blind network revenue management, Operations Research 60 (6) (2012) 1537–1550. URL https://pubsonline.informs.org/doi/pdf/10.1287/opre.1120.1103

  64. [65]

    M. Dudík, D. J. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, T. Zhang, Efficient optimal learning for contextual bandits, in: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), 2011, pp. 1–20. URL http://www.cs.columbia.edu/~djhsu/papers/amo.pdf

  65. [66]

    O. Chapelle, L. Li, An empirical evaluation of thompson sampling, in: J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., 2012, pp. 2249–2257. URL https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/thompson.pdf

  66. [67]

    N. Cesa-Bianchi, C. Gentile, Y. Mansour, A. Minora, Delay and cooperation in nonstochastic bandits, Journal of Machine Learning Research 20 (17) (2016) 1–38. URL http://www.jmlr.org/papers/volume20/17-631/17-631.pdf

  67. [68]

    C. Pike-Burke, S. Agrawal, C. Szepesvari, S. Grunewalder, Bandits with delayed, aggregated anonymous feedback, in: Proceedings of Machine Learning (ICML), Vol. 80, 2018, pp. 4105–4113. URL http://proceedings.mlr.press/v80/pike-burke18a/pike-burke18a.pdf

  68. [69]

    A. Agarwal, J. C. Duchi, Distributed delayed stochastic optimization, in: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), NIPS’11, Curran Associates Inc., USA, 2011, pp. 2312–2320. URL https://papers.nips.cc/paper/4247-distributed-delayed-stochastic-optimization

  69. [70]

    T. Desautels, A. Krause, J. W. Burdick, Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization, Journal of Machine Learning Research 15 (2014) 4053–4103. URL http://jmlr.org/papers/volume15/desautels14a/desautels14a.pdf

  70. [71]

    C. Vernade, O. Cappé, V. Perchet, Stochastic bandit models for delayed conversions, arXiv preprint abs/1706.09186. arXiv:1706.09186. URL http://arxiv.org/abs/1706.09186

  71. [72]

    P. Joulani, A. György, C. Szepesvári, Online learning under delayed feedback, in: Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, Georgia, USA, 2013, pp. 1453–1461. URL http://proceedings.mlr.press/v28/joulani13.pdf

  72. [73]

    T. W. Anderson, J. B. Taylor, Strong consistency of least squares estimates in dynamic models, The Annals of Statistics 7 (3) (1979) 484–489. doi:10.1214/aos/1176344670

  73. [74]

    N. B. Keskin, A. Zeevi, Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies, Operations Research 62 (5) (2014) 1142–1167. URL https://doi.org/10.1287/opre.2014.1294

  74. [75]

    P. Joulani, A. György, C. Szepesvári, Delay-tolerant online convex optimization: Unified analysis and adaptive-gradient algorithms, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, USA, 2016, pp. 1744–1750. URL https://sites.ualberta.ca/~pooria/publications/AAAI16-Extended.pdf

  75. [76]

    M. S. Bartlett, An inverse matrix adjustment arising in discriminant analysis, Annals of Mathematical Statistics 22 (1) (1951) 107–111. doi:10.1214/aoms/1177729698

  76. [77]

    J. J. Duistermaat, J. A. C. Kolk, Multidimensional Real Analysis I: Differentiation, Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, Mass, 2004. doi:10.1017/CBO978051161671

  77. [78]

    J. Dugundji, Topology, Series in advanced mathematics, Allyn & Bacon, Boston, 1966

  78. [79]

    Y. S. Chow, Local convergence of martingales and the law of large numbers, The Annals of Mathematical Statistics 36 (2) (1965) 552–558. doi:10.1214/aoms/1177700166

  79. [80]

    D. Freedman, Another note on the borel-cantelli lemma and the strong law, with the poisson approximation as a by-product, Annals of Probability 1 (6) (1973) 910–925. doi:10.1214/aop/1176996800