Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches

Neil Walton; Yuqing Zhang

arxiv: 1907.05381 · v1 · pith:EELGPZWHnew · submitted 2019-07-02 · econ.EM · math.OC · q-fin.MF · q-fin.RM · stat.ML

Pith reviewed 2026-05-25 11:00 UTC · model grok-4.3

classification: econ.EM · math.OC · q-fin.MF · q-fin.RM · stat.ML

keywords: adaptive pricing · insurance · generalized linear models · gaussian process regression · dynamic pricing · maximum quasi-likelihood estimation · revenue management · online learning

The pith

If insurance prices are chosen with suitably decreasing variability, maximum quasi-likelihood estimates converge to true parameter values and prices converge to the revenue-maximizing level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats insurance pricing as an online revenue management task in which an insurer must set prices for a new product to maximize long-run revenue. Demand and claims are modeled by generalized linear models, the standard approach in insurance. An adaptive GLM policy uses maximum quasi-likelihood estimation to learn unknown parameters while choosing prices that trade off learning against immediate revenue. The central result is that when price variability is reduced at a suitable rate, the estimates exist and converge to the correct values, which forces the chosen prices themselves to converge to the optimum. A parallel adaptive Gaussian process policy samples demand and claims from Gaussian processes and selects prices via upper confidence bounds; both policies are also examined when claim information arrives with delay.

Core claim

In the adaptive GLM design, if prices are chosen with suitably decreasing variability, the MQLE parameters eventually exist and converge to the correct values, which in turn implies that the sequence of chosen prices will also converge to the optimal price. The adaptive GP regression model samples demand and claims from Gaussian Processes and selects selling prices by the upper confidence bound rule. Both algorithms are analyzed under delayed claims.

What carries the argument

maximum quasi-likelihood estimation (MQLE) inside an adaptive pricing loop whose price sequence is constructed with suitably decreasing variability
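
The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes binary purchase decisions with a logistic demand curve, gamma-distributed claims with a constant mean, an initial block of uniformly random prices, and a price perturbation of size t^(-1/4) around the estimated optimum. All names and parameter values (`run_adaptive_glm`, `true_beta`, `n_explore`, and so on) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_demand_mqle(prices, sales, beta, n_iter):
    """Newton steps on the quasi-score sum_i x_i * (y_i - h(x_i' beta)) = 0."""
    X = np.column_stack([np.ones(len(prices)), prices])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        score = X.T @ (sales - mu)
        info = (X * (mu * (1.0 - mu))[:, None]).T @ X + 1e-8 * np.eye(2)
        beta = beta + np.linalg.solve(info, score)
    return beta

def greedy_price(beta, claim_mean, grid):
    """Grid price maximizing estimated revenue (price - claims) * demand."""
    revenue = (grid - claim_mean) * sigmoid(beta[0] + beta[1] * grid)
    return grid[np.argmax(revenue)]

def run_adaptive_glm(T=2000, n_explore=200, seed=0,
                     true_beta=(4.0, -0.5), true_claim_mean=2.0,
                     p_min=2.0, p_max=12.0):
    # true_beta and true_claim_mean are assumed values used only to simulate data.
    rng = np.random.default_rng(seed)
    grid = np.linspace(p_min, p_max, 1001)
    prices, sales, claims = np.empty(T), np.empty(T), []
    beta, claim_mean = np.zeros(2), 0.0
    for t in range(T):
        if t < n_explore:
            # Initial exploration: random prices so the estimate exists.
            p = rng.uniform(4.0, 10.0)
        else:
            # Perturb the greedy price; the perturbation shrinks over time.
            delta = (t + 1) ** -0.25
            p = greedy_price(beta, claim_mean, grid) + delta * rng.choice([-1.0, 1.0])
            p = float(np.clip(p, p_min, p_max))
        sold = rng.random() < sigmoid(true_beta[0] + true_beta[1] * p)
        prices[t], sales[t] = p, float(sold)
        if sold:
            claims.append(rng.gamma(shape=2.0, scale=true_claim_mean / 2.0))
        if t + 1 >= n_explore:
            n_iter = 25 if t + 1 == n_explore else 5  # warm start after first fit
            beta = fit_demand_mqle(prices[:t + 1], sales[:t + 1], beta, n_iter)
            # Intercept-only claims model: the estimate is the sample mean.
            claim_mean = float(np.mean(claims))
    return prices, beta, claim_mean
```

Calling `run_adaptive_glm()` returns the price path and the final demand and claims estimates. The exponent -1/4 is one choice for which the perturbations shrink while their squares are not summable; the paper's exact rate condition may differ.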

If this is right

The chosen prices converge to the revenue-maximizing price.
The same convergence holds when claim information is received with delay.
An alternative Gaussian process policy achieves comparable learning and exploitation balance without parametric assumptions.
Regret, measured as expected revenue loss relative to the optimal price, is controlled through the exploration schedule.
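
As a rough illustration of the regret measure in the last item, the sketch below computes cumulative regret for a given price sequence. The logistic demand curve, the constant expected claim cost, and the names `expected_revenue` and `cumulative_regret` are assumptions made for the example, not taken from the paper.

```python
import numpy as np

def expected_revenue(p, beta=(4.0, -0.5), claim_mean=2.0):
    """Expected revenue per customer: margin times purchase probability.

    The logistic demand curve and parameter values are assumed for illustration.
    """
    demand = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * p)))
    return (p - claim_mean) * demand

def cumulative_regret(prices, grid):
    """Cumulative expected revenue loss relative to the best price on the grid."""
    best = expected_revenue(grid).max()
    return np.cumsum(best - expected_revenue(np.asarray(prices, dtype=float)))

grid = np.linspace(2.0, 12.0, 1001)
p_star = grid[np.argmax(expected_revenue(grid))]
# Three suboptimal prices followed by the optimal one.
regret = cumulative_regret([5.0, 6.0, 9.0, p_star], grid)
```

Each suboptimal price adds a positive increment, while a period priced at `p_star` adds none, so a policy whose prices converge to the optimum has a regret curve that flattens over time.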

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The convergence argument may extend to other revenue-management domains that already use GLM demand models.
Real-world insurance data could be used to test whether the required rate of variability reduction produces acceptable short-term revenue loss.
Hybrid policies that switch from GP exploration to GLM exploitation once parameters stabilize could reduce long-run regret.

Load-bearing premise

Demand and claims follow generalized linear models and the pricing rule reduces variability at a rate sufficient for the estimates to exist and converge.

What would settle it

A sequence of observed demands and claims generated under a pricing policy whose variability does not decrease sufficiently, such that the MQLE estimates fail to converge or the prices fail to approach the revenue optimum.

Figures

Figures reproduced from arXiv: 1907.05381 by Neil Walton, Yuqing Zhang.

**Figure 2.** Cumulative regret and convergence rate for GLM algorithm. GLM denotes the non-delayed case and D-GLM …

**Figure 3.** Cumulative regret and convergence rate for GP algorithm. GP denotes the non-delayed case and D-GP denotes …

Original abstract

We study the application of dynamic pricing to insurance. We view this as an online revenue management problem where the insurance company looks to set prices to optimize the long-run revenue from selling a new insurance product. We develop two pricing models: an adaptive Generalized Linear Model (GLM) and an adaptive Gaussian Process (GP) regression model. Both balance between exploration, where we choose prices in order to learn the distribution of demands & claims for the insurance product, and exploitation, where we myopically choose the best price from the information gathered so far. The performance of the pricing policies is measured in terms of regret: the expected revenue loss caused by not using the optimal price. As is commonplace in insurance, we model demand and claims by GLMs. In our adaptive GLM design, we use the maximum quasi-likelihood estimation (MQLE) to estimate the unknown parameters. We show that, if prices are chosen with suitably decreasing variability, the MQLE parameters eventually exist and converge to the correct values, which in turn implies that the sequence of chosen prices will also converge to the optimal price. In the adaptive GP regression model, we sample demand and claims from Gaussian Processes and then choose selling prices by the upper confidence bound rule. We also analyze these GLM and GP pricing algorithms with delayed claims. Although similar results exist in other domains, this is among the first works to consider dynamic pricing problems in the field of insurance. We also believe this is the first work to consider Gaussian Process regression in the context of insurance pricing. These initial findings suggest that online machine learning algorithms could be a fruitful area of future investigation and application in insurance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper applies established dynamic pricing methods to insurance with a GLM convergence result under decreasing variability and a GP UCB variant, with the main value in the domain application and delayed-claims extension.

This paper applies dynamic pricing techniques to insurance by developing adaptive GLM and GP regression models that set prices to maximize long-run revenue. The central result is that if the pricing policy has suitably decreasing variability, the maximum quasi-likelihood estimates of the GLM parameters eventually exist and converge to the true values, which implies that the prices converge to the optimal one. The authors also analyze the case of delayed claims and use the upper confidence bound rule to choose prices in the GP model.

What the paper does well is incorporate the standard insurance modeling of demand and claims with GLMs into an online learning framework and provide theoretical backing for consistency in this adaptive setting. The delayed-claims analysis is a useful addition that addresses a real-world aspect of insurance: claims may not be observed immediately. The positioning as one of the first works in insurance dynamic pricing is reasonable given the literature.

The soft spots are minor. The GP approach follows the usual UCB rule without much customization beyond the insurance context, so the contribution there is mostly the application rather than new theory. The abstract describes no detailed empirical section showing performance on actual or simulated insurance data, which would strengthen the practical case and let readers see the magnitude of regret reduction. The regret analysis is mentioned, but the specific bounds are not given in the abstract; the convergence claim appears to hold under the stated conditions without internal contradictions. The citation pattern looks standard, drawing on GLM estimation and regret in revenue management without over-relying on self-citation.

This paper is for readers in actuarial science or operations research who want to see dynamic pricing applied to insurance. It would be of value to someone looking for initial theoretical results at the intersection of these fields.
I recommend sending it for peer review, as the core argument is coherent and the application fills a gap in the literature even if the methods themselves are established elsewhere.

Referee Report

2 major / 2 minor

Summary. The manuscript develops two adaptive pricing policies for insurance products where demand and claims follow GLMs. The first uses maximum quasi-likelihood estimation (MQLE) under a pricing policy with suitably decreasing variability, proving eventual existence and consistency of the MQLE estimates and consequent convergence of prices to the revenue-maximizing price. The second employs Gaussian process regression with an upper-confidence-bound rule for price selection. Both are analyzed for regret, and the GLM and GP approaches are extended to the case of delayed claims. The work positions these as among the first applications of online learning methods to insurance pricing.

Significance. If the convergence and regret results are rigorously established, the paper makes a useful contribution by importing standard adaptive-design techniques for GLMs and introducing GP regression into insurance pricing, a domain where such dynamic, data-driven policies have received limited attention. The explicit link between controlled exploration (decreasing variability) and consistency of MQLE is a standard but practically relevant technical step; the GP-UCB analysis and delayed-claims extension broaden the scope.

major comments (2)

[theoretical results on MQLE convergence] The central claim (abstract and theoretical results section) that a policy with suitably decreasing variability guarantees eventual existence and consistency of the MQLE, which in turn implies price convergence, rests on standard GLM adaptive-design conditions. The manuscript should state the precise rate condition on price variability (e.g., the summability or decay rate required for the design matrix to accumulate full rank) and supply the key steps or error bounds that close the implication from parameter consistency to price convergence.
[Gaussian process regression model and regret analysis] For the GP regression approach, the regret analysis and UCB rule require explicit assumptions on the kernel, the sub-Gaussian noise, and the resulting regret bound (e.g., O(sqrt(T log T)) or similar). Without these, it is difficult to compare the GP policy's performance guarantees with the GLM policy or with existing UCB results in the literature.

minor comments (2)

Notation for the decreasing-variability condition and for the delayed-claims observation process should be introduced once and used consistently; a short table summarizing the two policies (assumptions, estimation method, exploration mechanism, regret order) would aid readability.
[Introduction] The abstract states that 'similar results exist in other domains'; adding two or three key references from the adaptive-design or online-learning literature would strengthen the positioning without lengthening the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The suggestions help clarify the theoretical conditions and assumptions. We address each major comment below and have revised the manuscript accordingly.

Referee: [theoretical results on MQLE convergence] The central claim (abstract and theoretical results section) that a policy with suitably decreasing variability guarantees eventual existence and consistency of the MQLE, which in turn implies price convergence, rests on standard GLM adaptive-design conditions. The manuscript should state the precise rate condition on price variability (e.g., the summability or decay rate required for the design matrix to accumulate full rank) and supply the key steps or error bounds that close the implication from parameter consistency to price convergence.

Authors: We agree that the precise rate condition and key steps should be stated explicitly for rigor. In the revised manuscript, we will add that the price variability sequence must satisfy sum_t var(p_t) = infinity (ensuring the design matrix accumulates full rank asymptotically, per standard adaptive GLM results such as those in Lai and Wei (1982)). We will include the key steps: (i) the MQLE exists eventually under this condition by showing the information matrix diverges; (ii) consistency follows from the martingale convergence theorem applied to the quasi-score; (iii) price convergence to the optimum follows from continuous mapping and a uniform bound on the price deviation |p_t - p*| <= C ||theta_hat_t - theta*||, with the error bound O(1/sqrt(t)) in probability. These additions close the implication without altering the main results.

Revision: yes

Referee: [Gaussian process regression model and regret analysis] For the GP regression approach, the regret analysis and UCB rule require explicit assumptions on the kernel, the sub-Gaussian noise, and the resulting regret bound (e.g., O(sqrt(T log T)) or similar). Without these, it is difficult to compare the GP policy's performance guarantees with the GLM policy or with existing UCB results in the literature.

Authors: We agree that explicit assumptions and the regret bound should be stated to facilitate comparison. In the revised manuscript, we will specify: the kernel is continuous and positive definite (e.g., squared exponential with length-scale l); observations are sub-Gaussian with variance proxy sigma^2; and under these, the GP-UCB policy achieves regret O(sqrt(T log T)) (following standard bounds from Srinivas et al. (2010) adapted to the insurance setting with delayed claims). This allows direct comparison to the GLM policy's O(T^{2/3}) regret and to the broader UCB literature. The analysis for delayed claims remains unchanged.

Revision: yes
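
To make the GP-UCB ingredients named in this exchange concrete (squared exponential kernel, noisy observations, upper confidence bound selection), here is a minimal NumPy sketch. It simplifies the paper's setting: a single zero-mean GP models noisy per-customer revenue directly rather than separate processes for demand and claims, kernel hyperparameters are fixed by hand, and delayed claims are not modeled. The function names, the simulated `true_revenue` curve, and all parameter values are hypothetical; the confidence width uses the finite-decision-set form from Srinivas et al.

```python
import numpy as np

LENGTH_SCALE = 2.0   # assumed kernel length-scale
SIGNAL_VAR = 4.0     # assumed prior variance of the revenue function

def se_kernel(a, b):
    """Squared exponential kernel between two 1-D arrays of prices."""
    diff = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (diff / LENGTH_SCALE) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise_var):
    """Posterior mean and variance of a zero-mean GP at the points x_new."""
    K = se_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_star = se_kernel(x_obs, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_star)
    mean = K_star.T @ alpha
    var = SIGNAL_VAR - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def true_revenue(p):
    # Hypothetical ground truth used only to simulate feedback.
    return (p - 2.0) / (1.0 + np.exp(-(4.0 - 0.5 * p)))

def run_gp_ucb(T=80, seed=0, delta=0.1, noise_sd=0.2):
    rng = np.random.default_rng(seed)
    grid = np.linspace(2.0, 12.0, 101)
    xs, ys = [], []
    for t in range(1, T + 1):
        if not xs:
            p = rng.choice(grid)  # no data yet: pick any price
        else:
            mean, var = gp_posterior(np.array(xs), np.array(ys), grid, noise_sd ** 2)
            # Confidence multiplier for a finite price set.
            beta_t = 2.0 * np.log(len(grid) * t ** 2 * np.pi ** 2 / (6.0 * delta))
            p = grid[np.argmax(mean + np.sqrt(beta_t * var))]
        xs.append(float(p))
        ys.append(float(true_revenue(p) + noise_sd * rng.normal()))
    mean, _ = gp_posterior(np.array(xs), np.array(ys), grid, noise_sd ** 2)
    return np.array(xs), grid[np.argmax(mean)]
```

`run_gp_ucb()` returns the sequence of posted prices and the grid price with the highest posterior mean after the final round. Early rounds spread across the price range because unexplored prices carry large posterior variance; later rounds concentrate where the upper confidence bound remains highest.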

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central derivation is a standard consistency result for MQLE under adaptive GLM designs with controlled exploration (decreasing price variability). The abstract states the implication directly as a theorem to be shown, notes that similar results exist in other domains, and relies on external GLM modeling conventions plus regret definitions rather than any self-referential fit, ansatz, or self-citation chain. No load-bearing step reduces by construction to the paper's own inputs or prior self-work; the argument is self-contained against external benchmarks in adaptive design literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that demand and claims follow GLMs and on the modeling choice of decreasing price variability to guarantee MQLE convergence.

axioms (1)

Domain assumption: Demand and claims are modeled by generalized linear models.
Described as commonplace in insurance and used as the basis for both pricing models.

pith-pipeline@v0.9.0 · 5837 in / 1037 out tokens · 26630 ms · 2026-05-25T11:00:47.132932+00:00 · methodology

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 3 internal anchors

[1] R. L. Phillips, Pricing and revenue optimization, Stanford Business Books, Stanford, Calif., 2005.

[2] K. Talluri, G. van Ryzin, The theory and practice of revenue management, Springer, Boston, MA, 2005. URL https://doi.org/10.1007/b139000

[3] J. A. Nelder, R. W. M. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society 135 (3) (1972) 370–384. doi:10.2307/2344614. URL http://www.jstor.org/stable/2344614

[4] P. McCullagh, J. A. Nelder, Generalized Linear Models, 2nd Edition, Chapman and Hall, London, 1989. URL http://www.utstat.toronto.edu/~brunner/oldclass/2201s11/readings/glmbook.pdf

[5] T. L. Lai, H. Robbins, Adaptive design and stochastic approximation, The Annals of Statistics 7 (6) (1979) 1196–1221. doi:10.1214/aos/1176344840

[6] T. Lai, H. Robbins, Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Z. Wahrscheinlichkeitstheorie verw. Gebiete 56 (3) (1981) 329–360. URL https://doi.org/10.1007/BF00536178

[7] T. Lai, H. Robbins, Iterated least squares in multiperiod control, Advances in Applied Mathematics 3 (1) (1982) 50–73. URL https://doi.org/10.1016/S0196-8858(82)80005-5

[8] T. L. Lai, C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics 10 (1) (1982) 154–166. doi:10.1214/aos/1176345697

[9] T. Lai, Stochastic approximation: invited paper, The Annals of Statistics 31 (2) (2003) 391–406. doi:10.1214/aos/1051027873

[10] A. Den Boer, B. Zwart, Simultaneously learning and optimizing using controlled variance pricing, Management Science 60 (3) (2013) 770–783. URL https://doi.org/10.1287/mnsc.2013.1788

[11] J. Močkus, Bayesian Approach to Global Optimization, Mathematics and its Applications, Springer, Netherlands, 1989. doi:10.1007/978-94-009-0909-0

[12] J. Močkus, Bayesian approach to global optimization and application to multiobjective and constrained problems, Journal of Global Optimization 4 (4) (1994) 347–365. doi:10.1007/BF01099263

[14] N. Srinivas, A. Krause, S. M. Kakade, M. Seeger, Information-theoretic regret bounds for gaussian process optimization in the bandit setting, IEEE Transactions on Information Theory 58 (5) (2012) 389–434. doi:10.1109/TIT.2011.2182033. URL https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6138914

[15] C. E. Rasmussen, C. K. I. Williams, Gaussian processes for machine learning, Adaptive computation and machine learning, MIT Press, Cambridge, Mass, 2006.

[16] E. Brochu, V. M. Cora, N. de Freitas, A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, CoRR abs/1012.2599. URL http://arxiv.org/abs/1012.2599

[17] SwissRe, Life insurance in the digital age: fundamental transformation ahead, Swiss Re Sigma Report. URL http://www.biztositasiszemle.hu/files/201512/sigma6_2015_en.pdf

[18] H. Bühlmann, Mathematical Methods in Risk Theory, Grundlehren der mathematischen Wissenschaften, Springer-Verlag Berlin Heidelberg, 2005. doi:10.1007/978-3-540-30711-2

[19] C. McClenahan, Ratemaking, 4th Edition, Casualty Actuarial Society, 1984, Ch. 3, pp. 75–148.

[20] P. de Jong, G. Z. Heller, Generalized Linear Models for Insurance Data, Cambridge University Press, Cambridge, 2008. URL https://feb.kuleuven.be/public/u0017833/boek.pdf

[21] L. A. Baxter, S. M. Coutts, G. A. F. Ross, Applications of linear models in motor insurance, in: 21st International Congress of Actuaries, Vol. 2, Elsevier, 1980, pp. 11–29.

[22] S. M. Coutts, Motor insurance rating, an actuarial approach, Journal of the Institute of Actuaries 111 (1) (1984) 87–148. URL https://www.jstor.org/stable/41140673

[23] R. A. Bailey, L. J. Simon, Two studies in automobile insurance ratemaking, ASTIN Bulletin: The Journal of the International Actuarial Association 1 (4) (1960) 192–217. doi:10.1017/S0515036100009569

[24] M. David, Auto insurance premium calculation using generalized linear models, Procedia Economics and Finance 20 (2015) 147–156. doi:10.1016/S2212-5671(15)00059-3

[25] E. Ohlsson, J. B., Non-Life Insurance Pricing with Generalized Linear Models, Springer, Berlin, Heidelberg, 2010. URL https://link.springer.com/book/10.1007/978-3-642-10791-7

[26] S. Haberman, A. E. Renshaw, Generalized linear models and actuarial science, Journal of the Royal Statistical Society 45 (4) (1996) 407–436. doi:10.2307/2988543

[27] R. Kaas, M. Goovaerts, J. Dhaene, M. Denuit, Modern actuarial risk theory: using R, 2nd Edition, Springer, Berlin, 2009. doi:10.1007/978-3-540-70998-5

[28] E. W. Frees, Regression modeling with actuarial and financial applications, International series on actuarial science, Cambridge University Press, Cambridge, 2010.

[29] M. V. Wüthrich, C. Buser, Data analytics for non-life insurance pricing, Swiss Finance Institute Research Paper No. 16-68. URL https://ssrn.com/abstract=2870308

[30] G. C. Evans, The dynamics of monopoly, The American Mathematical Monthly 31 (2) (1924) 77–83. doi:10.2307/2300113. URL http://www.jstor.org/stable/2300113

[31] G. C. Evans, Mathematical introduction to economics, McGraw-Hill, New York, 1930. URL http://hdl.handle.net/2027/uc1.b3427705

[32] E. A. Greenleaf, The impact of reference price effects on the profitability of price promotions, Marketing Science 14 (1) (1995) 82–104. doi:10.1287/mksc.14.1.82. URL https://pubsonline.informs.org/doi/pdf/10.1287/mksc.14.1.82

[33] P. Kopalle, A. Rao, J. Assuncao, Asymmetric reference price effects and dynamic pricing policies, Marketing Science 15 (1) (1996) 60–85. URL http://www.jstor.org/stable/184184

[34] G. Fibich, A. Gavious, Explicit solutions of optimization models and differential games with nonsmooth (asymmetric) reference-price effects, Operations Research 51 (5) (2003) 721–734. URL http://www.jstor.org/stable/4132433

[35] Y. Aviv, G. Vulcano, Dynamic list pricing, in: The Oxford handbook of pricing management, Oxford University Press, UK, 2012, Ch. 23, pp. 522–58. doi:10.1093/oxfordhb/9780199543175.013.0023

[36] A. Den Boer, Dynamic pricing and learning: Historical origins, current research, and new directions, Surveys in operations research and management science 20 (1) (2015) 1–18. doi:10.1016/j.sorms.2015.03.001

[37] M. V. Wüthrich, Non-life insurance: Mathematics & statistics (2017). URL http://dx.doi.org/10.2139/ssrn.2319328

[38] R. W. M. Wedderburn, Quasi-likelihood functions, generalized linear models, and the gauss-newton method, Biometrika 61 (3) (1974) 439–447. doi:10.2307/2334725

[39] T. W. Anderson, J. B. Taylor, Strong consistency of least squares estimates in normal linear regression, The Annals of Statistics 4 (4) (1976) 788–790. doi:10.1214/aos/1176343552

[40] T. L. Lai, H. Robbins, C. Z. Wei, Strong consistency of least squares estimates in multiple regression, Proceedings of the National Academy of Sciences of the United States of America 75 (7) (1978) 343–

[41] doi:10.1016/0047-259X(79)90093-9

[42] K. Chen, I. Hu, Z. Ying, Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs, The Annals of Statistics 27 (4) (1999) 1155–1163. doi:10.1214/aos/1017938919

[43] S. Bubeck, N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning 5 (1) (2012) 1–122. doi:10.1561/2200000024. URL http://sbubeck.com/SurveyBCB12.pdf

[44] P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2-3) (2002) 235–256. doi:10.1023/A:1013689704352

[45] T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22. URL http://dx.doi.org/10.1016/0196-8858(85)90002-8

[46] V. Dani, T. P. Hayes, S. M. Kakade, Stochastic linear optimization under bandit feedback, in: 21st Annual Conference on Learning Theory (COLT), 2008, pp. 355–366. URL http://colt2008.cs.helsinki.fi/papers/80-Dani.pdf

[47] P. Rusmevichientong, J. N. Tsitsiklis, Linearly parameterized bandits, Mathematics of Operations Research 35 (2) (2010) 395–411. URL https://pubsonline.informs.org/doi/pdf/10.1287/moor.1100.0446

[48] S. Filippi, O. Cappe, A. Garivier, C. Szepesvári, Parametric bandits: The generalized linear case, in: Advances in Neural Information Processing Systems 23 (NIPS 2010), 2010, pp. 586–594. URL https://sites.ualberta.ca/~szepesva/papers/GenLinBandits-NIPS2010.pdf

[49] S. Bubeck, R. Munos, G. Stoltz, N. Cesa-Bianchi, X-armed bandits, Journal of Machine Learning Research 12 (2011) 1655–1695. URL http://www.jmlr.org/papers/volume12/bubeck11a/bubeck11a.pdf

[50] R. Kleinberg, A. Slivkins, E. Upfal, Multi-armed bandits in metric spaces, in: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, ACM, 2008, pp. 681–690. doi:10.1145/1374376.1374475. URL http://doi.acm.org/10.1145/1374376.1374475

[51] S. Agrawal, N. Goyal, Analysis of thompson sampling for the multi-armed bandit problem, in: 25th Annual Conference on Learning Theory, Vol. 23, 2012, pp. 39.1–39.26. URL http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

[52] S. Agrawal, N. Goyal, Further optimal regret bounds for thompson sampling, in: 16th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 31, 2013, pp. 90–107. URL http://proceedings.mlr.press/v31/agrawal13a.pdf

[53] D. Russo, B. Van Roy, Learning to optimize via posterior sampling, Mathematics of Operations Research 39 (4) (2013) 1221–1243. URL https://pubsonline.informs.org/doi/pdf/10.1287/moor.2014.0650

[54] J. Močkus, V. Tiesis, A. Zilinskas, On bayesian methods for seeking the extremum, in: Towards Global Optimization, 2nd Edition, Vol. 2, Elsevier Science Ltd, North Holland, Amsterdam, 1978, pp. 117–129.

[55] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimization of machine learning algorithms, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 2951–2959. URL http://papers.nips.cc/paper/4522-practical-bayesian-optimization

[56] G. Gallego, G. Van Ryzin, Optimal dynamic pricing of inventories with stochastic demand over finite horizons, Management Science 40 (8) (1994) 999–1020. URL http://www.jstor.org.manchester.idm.oclc.org/stable/2633090

[57] Y. Aviv, A. Pazgal, A partially observed markov decision process for dynamic pricing, Management Science 51 (9) (2005) 1400–1416. URL http://www.jstor.org.manchester.idm.oclc.org/stable/20110429

[58] J. Harrison, N. Keskin, A. Zeevi, Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution, Management Science 58 (3) (2012) 570–586. URL https://doi.org/10.1287/mnsc.1110.1426

[59] J. Broder, P. Rusmevichientong, Dynamic pricing under a general parametric choice model, Operations Research 60 (4) (2012) 965–980. URL http://search.proquest.com/docview/1041256005/

[60] R. Kleinberg, T. Leighton, The value of knowing a demand curve: bounds on regret for online posted-price auctions, in: Proceedings of the 44th IEEE Symposium on Foundations of Computer Science, IEEE, USA, 2003, pp. 594–605. doi:10.1109/SFCS.2003.1238232

[61] E. Cope, Bayesian strategies for dynamic pricing in e-commerce, Naval Research Logistics (NRL) 54 (3) (2007) 265–281. doi:10.1002/nav.20204

[62]

Rusmevichientong, B

P. Rusmevichientong, B. Van Roy, P. W. Glynn, A nonparametric approach to multiproduct pricing, Operations Research 54 (1) (2006) 82–98. doi:10.1287/opre.1050.0252

work page doi:10.1287/opre.1050.0252 2006
[63]

Besbes, A

O. Besbes, A. Zeevi, Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms, Operations Research 57 (6) (2009) 1407–1420. URL http://www.jstor.org/stable/25614853

work page arXiv 2009
[64]

Besbes, A

O. Besbes, A. Zeevi, Blind network revenue management, Operations Research 60 (6) (2012) 1537– 1550. URL https://pubsonline.informs.org/doi/pdf/10.1287/opre.1120.1103

work page doi:10.1287/opre.1120.1103 2012
[65]

Dud´ ık, D

M. Dud´ ık, D. J. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, T. Zhangn, Eﬃcient optimal learning for contextual bandits, in: Proceedings of the 27th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), 2011, pp. 1–20. URL http://www.cs.columbia.edu/~djhsu/papers/amo.pdf

work page 2011
[66]

O. Chapelle, L. Li, An empirical evaluation of Thompson sampling, in: J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., 2012, pp. 2249–2257. URL https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/thompson.pdf

[67]

N. Cesa-Bianchi, C. Gentile, Y. Mansour, A. Minora, Delay and cooperation in nonstochastic bandits, Journal of Machine Learning Research 20 (17) (2016) 1–38. URL http://www.jmlr.org/papers/volume20/17-631/17-631.pdf

[68]

C. Pike-Burke, S. Agrawal, C. Szepesvari, S. Grunewalder, Bandits with delayed, aggregated anonymous feedback, in: Proceedings of Machine Learning (ICML), Vol. 80, 2018, pp. 4105–4113. URL http://proceedings.mlr.press/v80/pike-burke18a/pike-burke18a.pdf

[69]

A. Agarwal, J. C. Duchi, Distributed delayed stochastic optimization, in: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), NIPS’11, Curran Associates Inc., USA, 2011, pp. 2312–2320. URL https://papers.nips.cc/paper/4247-distributed-delayed-stochastic-optimization

[70]

T. Desautels, A. Krause, J. W. Burdick, Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization, Journal of Machine Learning Research 15 (2014) 4053–4103. URL http://jmlr.org/papers/volume15/desautels14a/desautels14a.pdf

[71]

C. Vernade, O. Cappé, V. Perchet, Stochastic bandit models for delayed conversions, arXiv preprint abs/1706.09186. arXiv:1706.09186. URL http://arxiv.org/abs/1706.09186

[72]

P. Joulani, A. György, C. Szepesvári, Online learning under delayed feedback, in: Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, Georgia, USA, 2013, pp. 1453–1461. URL http://proceedings.mlr.press/v28/joulani13.pdf

[73]

T. W. Anderson, J. B. Taylor, Strong consistency of least squares estimates in dynamic models, The Annals of Statistics 7 (3) (1979) 484–489. doi:10.1214/aos/1176344670

[74]

N. B. Keskin, A. Zeevi, Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies, Operations Research 62 (5) (2014) 1142–1167. URL https://doi.org/10.1287/opre.2014.1294

[75]

P. Joulani, A. György, C. Szepesvári, Delay-tolerant online convex optimization: Unified analysis and adaptive-gradient algorithms, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, USA, 2016, pp. 1744–1750. URL https://sites.ualberta.ca/~pooria/publications/AAAI16-Extended.pdf

[76]

M. S. Bartlett, An inverse matrix adjustment arising in discriminant analysis, Annals of Mathematical Statistics 22 (1) (1951) 107–111. doi:10.1214/aoms/1177729698

[77]

J. J. Duistermaat, J. A. C. Kolk, Multidimensional Real Analysis I: Differentiation, Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, Mass, 2004. doi:10.1017/CBO978051161671

[78]

J. Dugundji, Topology, Series in advanced mathematics, Allyn & Bacon, Boston, 1966

[79]

Y. S. Chow, Local convergence of martingales and the law of large numbers, The Annals of Mathematical Statistics 36 (2) (1965) 552–558. doi:10.1214/aoms/1177700166

[80]

D. Freedman, Another note on the Borel-Cantelli lemma and the strong law, with the Poisson approximation as a by-product, Annals of Probability 1 (6) (1973) 910–925. doi:10.1214/aop/1176996800
