Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Joel Q. L. Chang

arxiv: 2606.09191 · v1 · pith:Q7CJZPVNnew · submitted 2026-06-08 · 💻 cs.LG · stat.ML

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Joel Q. L. Chang This is my paper

Pith reviewed 2026-06-27 17:16 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords risk-averse banditsThompson Samplingasymptotic optimalitysub-Gaussian rewardscontinuous risk functionalsnonparametric algorithmsregret boundsCVaR

0 comments

The pith

A nonparametric Thompson Sampling algorithm matches the instance-dependent regret lower bound for risk-averse bandits under any continuous risk functional on sub-Gaussian distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that its anchor-free nonparametric Thompson Sampling procedure attains regret matching the instance-dependent lower bound to leading order in log n. This asymptotic optimality result applies to any continuous risk functional on reward distributions with bounded density and sub-Gaussian tails, including common choices such as CVaR, mean-variance, Sharpe ratio, and distortion risk measures. The proof requires only continuity of the risk functional, a strictly weaker condition than the dominance or Lipschitz assumptions used in prior work, and it covers both bounded-support and sub-Gaussian settings. The technical advance consists of discretisation lemmas that project the Dirichlet posterior onto a fixed grid while keeping all polynomial prefactors of fixed degree independent of sample size. Readers would care because the result removes previous barriers to achieving instance-optimal performance in risk-sensitive sequential decision problems without imposing parametric reward forms.

Core claim

We prove that ρ-NPTS_SG achieves regret matching the instance-dependent lower bound to leading order in log n, establishing it as asymptotically optimal for any continuous risk functional ρ on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of ρ: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The key technical con

What carries the argument

The truncated discretisation lemma, which projects the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property while keeping all polynomial prefactors of fixed degree independent of sample size.

If this is right

The algorithm attains asymptotic optimality for CVaR, mean-variance, Sharpe ratio, and distortion risk measures on sub-Gaussian arms.
Instance-optimal guarantees now hold for non-Lipschitz risk functionals without requiring parametric reward distributions.
The same proof structure covers both the bounded-support case and the sub-Gaussian case.
Only continuity of ρ is needed, removing the need for dominance or Lipschitz conditions used in earlier analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The continuity-only condition may allow the same optimality result to apply to additional risk measures not previously analyzed.
The discretisation technique could be adapted to other nonparametric bandit settings where the support grows with the number of observations.
Practical risk-averse systems could adopt the algorithm directly without first verifying Lipschitz properties of their chosen risk measure.

Load-bearing premise

The risk functional must be continuous.

What would settle it

An explicit construction of a continuous risk functional and a family of sub-Gaussian distributions where the regret of ρ-NPTS_SG grows faster than the instance-dependent lower bound by more than lower-order terms in log n would falsify the asymptotic optimality claim.

Figures

Figures reproduced from arXiv: 2606.09191 by Joel Q. L. Chang.

**Figure 1.** Figure 1: Cumulative regret (mean ± 1-σ band, 200 trials) of ρ-NPTS (solid green) and Risk-LCB [Tan et al., 2022] (dashed orange, Experiment 1 only) against the asymptotic lower bound (dotted red, Theorem 1). All K = 2, n = 10,000. Experiments 1 and 3 use Beta arms Experiment 2 uses TruncNorm arms. 0 2000 4000 6000 8000 10000 Rounds n 0 10 20 30 40 50 60 70 Cumulative regret Sharpe, Gaussian: N (1.5, 1) vs N (1.0, … view at source ↗

**Figure 2.** Figure 2: Experiment 4: Sharpe ratio on Gaussian arms [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $\rho$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $\rho$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper proves asymptotic optimality for nonparametric Thompson Sampling on risk-averse bandits under only continuity of the risk functional, via new discretization lemmas that control polynomial factors.

read the letter

This paper proves that ρ-NPTS_SG achieves regret matching the instance-dependent lower bound to leading order in log n for any continuous risk functional on distributions with bounded density and sub-Gaussian tails. It covers CVaR, mean-variance, Sharpe ratio and others without parametric assumptions.

The main advance is the discretization lemma for bounded support and the truncated version for sub-Gaussian tails. Both use the Dirichlet aggregation property to project the growing-alphabet posterior onto a fixed grid while keeping all polynomial prefactors at fixed degree independent of n. This structure directly addresses the super-exponential growth that blocked earlier proofs and lets the result hold under weaker conditions than dominance or Lipschitz.

The bounded-support case is handled first as a stepping stone with the same proof outline. The approach is coherent and the weakening of assumptions is genuine.

Soft spots are limited to the lemma details. The projection arguments and alphabet control need close checking to confirm they deliver the claimed bounds without extra regularity on ρ. The bounded-density condition is stated but its necessity could be spelled out more explicitly. These are standard technical points rather than load-bearing gaps.

The work is for researchers in theoretical bandits and risk-sensitive online learning. A reader interested in regret bounds and Thompson Sampling would get concrete value from the lemmas. It shows clear engagement with the technical barriers in the literature.

Send it to peer review.

Referee Report

0 major / 2 minor

Summary. The paper proves that ρ-NPTS_SG, an anchor-free nonparametric Thompson Sampling algorithm, achieves regret matching the instance-dependent lower bound to leading order in log n for any continuous risk functional ρ (including CVaR, mean-variance, Sharpe ratio, and distortion risk measures) on the class of distributions with bounded density and sub-Gaussian tails. The bounded-support case is developed first, followed by the sub-Gaussian case; both rely only on continuity of ρ. The key technical tools are a discretization lemma (bounded support) and a truncated discretization lemma (sub-Gaussian tails) that project the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property while keeping all polynomial prefactors of bounded degree independent of n.

Significance. If the result holds, the work establishes the first instance-optimal guarantees for non-Lipschitz risk functionals such as the Sharpe ratio without parametric reward assumptions or dominance/Lipschitz conditions on ρ. The discretization lemmas that control prefactors independently of sample size and break the super-exponential barrier constitute a clear technical advance over prior parametric Thompson Sampling analyses.

minor comments (2)

The abstract states that the bounded-support case is a 'stepping stone' sharing the same proof structure; a brief forward reference to the section containing the truncated discretization lemma would help readers track the shared steps.
Notation for the algorithm (ρ-NPTS_SG) and the risk functional ρ is introduced in the abstract without an immediate pointer to the formal definition; adding a parenthetical reference to the relevant section would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, for highlighting the significance of the discretization lemmas and the instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio, and for recommending acceptance.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's derivation establishes asymptotic optimality via new discretization and truncated discretization lemmas that project the Dirichlet posterior onto a fixed grid using the Dirichlet aggregation property, with all polynomial prefactors of bounded degree independent of n. These lemmas are presented as original technical contributions that break the super-exponential barrier, directly supporting the regret bound matching the instance-dependent lower bound to leading order in log n under the sole assumption of continuity of ρ. No step reduces by construction to a fitted parameter, self-citation chain, ansatz smuggled via prior work, or renamed empirical pattern; the central claim retains independent content from the stated assumptions and lower bound.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on continuity of the risk functional and the sub-Gaussian plus bounded-density assumptions on rewards; no free parameters or invented entities are introduced.

axioms (2)

domain assumption The risk functional ρ is continuous.
Stated as the minimal condition sufficient for the asymptotic optimality result.
domain assumption Reward distributions have bounded density and sub-Gaussian tails.
Defines the class of distributions for which the regret bound holds, including Gaussians.

pith-pipeline@v0.9.1-grok · 5747 in / 1311 out tokens · 23235 ms · 2026-06-27T17:16:12.144730+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 5 canonical work pages

[1]

Ang, Ming Liang and Lim, Eloise Y. Y. and Chang, Joel Q. L. , year =. Thompson Sampling for. 2105.06960 , archivePrefix=

arXiv
[2]

Order Optimal Regret Bounds for

Shah, Mohammad Taha and Khurshid, Sabrina and Ghatak, Gourab , year =. Order Optimal Regret Bounds for. 2508.13749 , archivePrefix=

arXiv
[3]

Advances in Applied Mathematics , volume =

Asymptotically efficient adaptive allocation rules , author =. Advances in Applied Mathematics , volume =
[4]

Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-ECAI) , year =

A Survey of Risk-Aware Multi-Armed Bandits , author =. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-ECAI) , year =
[5]

Chang, Joel Q. L. and Tan, Vincent Y. F. , booktitle =. A Unifying Theory of. 2022 , volume =

2022
[6]

A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems , year=

Granmo, Ole-Christoffer , booktitle=. A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems , year=
[7]

2012 , eprint=

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis , author=. 2012 , eprint=

2012
[8]

2021 , eprint=

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits , author=. 2021 , eprint=

2021
[9]

and Wiley, J

Huber, P.J. and Wiley, J. and InterScience, W. , biburl =
[10]

Halmos , title =

Paul R. Halmos , title =. Bulletin of the American Mathematical Society , number =. 1959 , doi =

1959
[11]

Theory of Probability and Its Applications , year=

Convergence of Random Processes and Limit Theorems in Probability Theory , author=. Theory of Probability and Its Applications , year=
[12]

, journal=

Posner, E. , journal=. Random coding strategies for minimum entropy , year=
[13]

Proceedings of the 31st Conference On Learning Theory , pages =

A General Approach to Multi-Armed Bandits Under Risk Criteria , author =. Proceedings of the 31st Conference On Learning Theory , pages =. 2018 , editor =

2018
[14]

1985 , issn =

Asymptotically efficient adaptive allocation rules , journal =. 1985 , issn =. doi:https://doi.org/10.1016/0196-8858(85)90002-8 , url =

work page doi:10.1016/0196-8858(85)90002-8 1985
[15]

Proceedings of the 25th Annual Conference on Learning Theory , pages =

Analysis of Thompson Sampling for the Multi-armed Bandit Problem , author =. Proceedings of the 25th Annual Conference on Learning Theory , pages =. 2012 , editor =

2012
[16]

Learning Bounds for Risk-sensitive Learning , url =

Lee, Jaeho and Park, Sejun and Shin, Jinwoo , booktitle =. Learning Bounds for Risk-sensitive Learning , url =
[17]

Continuous Mean-Covariance Bandits , year =

Yihan Du and Siwei Wang and Zhixuan Fang and Longbo Huang , booktitle =. Continuous Mean-Covariance Bandits , year =
[18]

Mean-variance and value at risk in multi-armed bandit problems , year=

Vakili, Sattar and Zhao, Qing , booktitle=. Mean-variance and value at risk in multi-armed bandit problems , year=
[19]

Proceedings of the 5th Asian Conference on Machine Learning , pages =

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , author =. Proceedings of the 5th Asian Conference on Machine Learning , pages =. 2013 , editor =

2013
[20]

2009 , publisher=

Large Deviations Techniques and Applications , author=. 2009 , publisher=

2009
[21]

Bernoulli , pages=

A large-deviation principle for Dirichlet posteriors , author=. Bernoulli , pages=. 2000 , publisher=

2000
[22]

Koolen and Sandeep Juneja , title =

Shubhada Agrawal and Wouter M. Koolen and Sandeep Juneja , title =. CoRR , volume =. 2020 , url =

2020
[23]

2021 , eprint=

Off-Policy Risk Assessment in Contextual Bandits , author=. 2021 , eprint=

2021
[24]

International Conference on Machine Learning , pages=

Thompson Sampling Algorithms for Mean-Variance Bandits , author=. International Conference on Machine Learning , pages=
[25]

Advances in Neural Information Processing Systems , pages=

Risk-aversion in multi-armed bandits , author=. Advances in Neural Information Processing Systems , pages=
[26]

Distributionally-Aware Exploration for

Tamkin, Alex and Keramati, Ramtin and Dann, Christoph and Brunskill, Emma , booktitle=. Distributionally-Aware Exploration for
[27]

A revised approach for risk-averse multi-armed bandits under CVaR criterion , journal =

Najakorn Khajonchotpanya and Yilin Xue and Napat Rujeerapaiboon , keywords =. A revised approach for risk-averse multi-armed bandits under CVaR criterion , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.orl.2021.05.005 , url =

work page doi:10.1016/j.orl.2021.05.005 2021
[28]

Proceedings of the 38th International Conference on Machine Learning , pages =

Optimal Thompson Sampling strategies for support-aware CVaR bandits , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

2021
[29]

2021 , eprint=

Risk-Constrained Thompson Sampling for CVaR Bandits , author=. 2021 , eprint=

2021
[30]

Mathematical Finance , volume=

Coherent measures of risk , author=. Mathematical Finance , volume=. 1999 , publisher=

1999
[31]

Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =

Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions , author =. Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =. 2020 , editor =

2020
[32]

Premium Calculation by Transforming the Layer Premium Density , volume=

Wang, Shaun , year=. Premium Calculation by Transforming the Layer Premium Density , volume=. ASTIN Bulletin , publisher=. doi:10.2143/AST.26.1.563234 , number=

work page doi:10.2143/ast.26.1.563234
[33]

Mathematics of Operations Research , volume =

Garivier, Aur\'elien and M\'enard, Pierre and Stoltz, Gilles , title =. Mathematics of Operations Research , volume =. 2019 , doi =. https://doi.org/10.1287/moor.2017.0928 , abstract =

work page doi:10.1287/moor.2017.0928 2019
[34]

Biometrika , volume=

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , author=. Biometrika , volume=. 1933 , publisher=

1933
[35]

On the lookback distortion risk measure: theory and applications , volume =

Hürlimann, Werner , year =. On the lookback distortion risk measure: theory and applications , volume =
[36]

Massart , title =

P. Massart , title =. The Annals of Probability , number =. 1990 , doi =

1990
[37]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)
[38]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence
[39]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

1980
[40]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science
[41]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984
[42]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies
[43]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving
[44]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue
[45]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models
[46]

2008 , eprint=

Crime and punishment in scientific research , author=. 2008 , eprint=

2008
[47]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet

[1] [1]

Ang, Ming Liang and Lim, Eloise Y. Y. and Chang, Joel Q. L. , year =. Thompson Sampling for. 2105.06960 , archivePrefix=

arXiv

[2] [2]

Order Optimal Regret Bounds for

Shah, Mohammad Taha and Khurshid, Sabrina and Ghatak, Gourab , year =. Order Optimal Regret Bounds for. 2508.13749 , archivePrefix=

arXiv

[3] [3]

Advances in Applied Mathematics , volume =

Asymptotically efficient adaptive allocation rules , author =. Advances in Applied Mathematics , volume =

[4] [4]

Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-ECAI) , year =

A Survey of Risk-Aware Multi-Armed Bandits , author =. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-ECAI) , year =

[5] [5]

Chang, Joel Q. L. and Tan, Vincent Y. F. , booktitle =. A Unifying Theory of. 2022 , volume =

2022

[6] [6]

A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems , year=

Granmo, Ole-Christoffer , booktitle=. A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems , year=

[7] [7]

2012 , eprint=

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis , author=. 2012 , eprint=

2012

[8] [8]

2021 , eprint=

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits , author=. 2021 , eprint=

2021

[9] [9]

and Wiley, J

Huber, P.J. and Wiley, J. and InterScience, W. , biburl =

[10] [10]

Halmos , title =

Paul R. Halmos , title =. Bulletin of the American Mathematical Society , number =. 1959 , doi =

1959

[11] [11]

Theory of Probability and Its Applications , year=

Convergence of Random Processes and Limit Theorems in Probability Theory , author=. Theory of Probability and Its Applications , year=

[12] [12]

, journal=

Posner, E. , journal=. Random coding strategies for minimum entropy , year=

[13] [13]

Proceedings of the 31st Conference On Learning Theory , pages =

A General Approach to Multi-Armed Bandits Under Risk Criteria , author =. Proceedings of the 31st Conference On Learning Theory , pages =. 2018 , editor =

2018

[14] [14]

1985 , issn =

Asymptotically efficient adaptive allocation rules , journal =. 1985 , issn =. doi:https://doi.org/10.1016/0196-8858(85)90002-8 , url =

work page doi:10.1016/0196-8858(85)90002-8 1985

[15] [15]

Proceedings of the 25th Annual Conference on Learning Theory , pages =

Analysis of Thompson Sampling for the Multi-armed Bandit Problem , author =. Proceedings of the 25th Annual Conference on Learning Theory , pages =. 2012 , editor =

2012

[16] [16]

Learning Bounds for Risk-sensitive Learning , url =

Lee, Jaeho and Park, Sejun and Shin, Jinwoo , booktitle =. Learning Bounds for Risk-sensitive Learning , url =

[17] [17]

Continuous Mean-Covariance Bandits , year =

Yihan Du and Siwei Wang and Zhixuan Fang and Longbo Huang , booktitle =. Continuous Mean-Covariance Bandits , year =

[18] [18]

Mean-variance and value at risk in multi-armed bandit problems , year=

Vakili, Sattar and Zhao, Qing , booktitle=. Mean-variance and value at risk in multi-armed bandit problems , year=

[19] [19]

Proceedings of the 5th Asian Conference on Machine Learning , pages =

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , author =. Proceedings of the 5th Asian Conference on Machine Learning , pages =. 2013 , editor =

2013

[20] [20]

2009 , publisher=

Large Deviations Techniques and Applications , author=. 2009 , publisher=

2009

[21] [21]

Bernoulli , pages=

A large-deviation principle for Dirichlet posteriors , author=. Bernoulli , pages=. 2000 , publisher=

2000

[22] [22]

Koolen and Sandeep Juneja , title =

Shubhada Agrawal and Wouter M. Koolen and Sandeep Juneja , title =. CoRR , volume =. 2020 , url =

2020

[23] [23]

2021 , eprint=

Off-Policy Risk Assessment in Contextual Bandits , author=. 2021 , eprint=

2021

[24] [24]

International Conference on Machine Learning , pages=

Thompson Sampling Algorithms for Mean-Variance Bandits , author=. International Conference on Machine Learning , pages=

[25] [25]

Advances in Neural Information Processing Systems , pages=

Risk-aversion in multi-armed bandits , author=. Advances in Neural Information Processing Systems , pages=

[26] [26]

Distributionally-Aware Exploration for

Tamkin, Alex and Keramati, Ramtin and Dann, Christoph and Brunskill, Emma , booktitle=. Distributionally-Aware Exploration for

[27] [27]

A revised approach for risk-averse multi-armed bandits under CVaR criterion , journal =

Najakorn Khajonchotpanya and Yilin Xue and Napat Rujeerapaiboon , keywords =. A revised approach for risk-averse multi-armed bandits under CVaR criterion , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.orl.2021.05.005 , url =

work page doi:10.1016/j.orl.2021.05.005 2021

[28] [28]

Proceedings of the 38th International Conference on Machine Learning , pages =

Optimal Thompson Sampling strategies for support-aware CVaR bandits , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

2021

[29] [29]

2021 , eprint=

Risk-Constrained Thompson Sampling for CVaR Bandits , author=. 2021 , eprint=

2021

[30] [30]

Mathematical Finance , volume=

Coherent measures of risk , author=. Mathematical Finance , volume=. 1999 , publisher=

1999

[31] [31]

Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =

Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions , author =. Proceedings of the 31st International Conference on Algorithmic Learning Theory , pages =. 2020 , editor =

2020

[32] [32]

Premium Calculation by Transforming the Layer Premium Density , volume=

Wang, Shaun , year=. Premium Calculation by Transforming the Layer Premium Density , volume=. ASTIN Bulletin , publisher=. doi:10.2143/AST.26.1.563234 , number=

work page doi:10.2143/ast.26.1.563234

[33] [33]

Mathematics of Operations Research , volume =

Garivier, Aur\'elien and M\'enard, Pierre and Stoltz, Gilles , title =. Mathematics of Operations Research , volume =. 2019 , doi =. https://doi.org/10.1287/moor.2017.0928 , abstract =

work page doi:10.1287/moor.2017.0928 2019

[34] [34]

Biometrika , volume=

On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , author=. Biometrika , volume=. 1933 , publisher=

1933

[35] [35]

On the lookback distortion risk measure: theory and applications , volume =

Hürlimann, Werner , year =. On the lookback distortion risk measure: theory and applications , volume =

[36] [36]

Massart , title =

P. Massart , title =. The Annals of Probability , number =. 1990 , doi =

1990

[37] [37]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

[38] [38]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

[39] [39]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

1980

[40] [40]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

[41] [41]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984

[42] [42]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

[43] [43]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving

[44] [44]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

[45] [45]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models

[46] [46]

2008 , eprint=

Crime and punishment in scientific research , author=. 2008 , eprint=

2008

[47] [47]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet