Mean-based algorithms: A lower bound and regret

Amelie Kleber; Julius Durmann

arxiv: 2606.04931 · v1 · pith:YURA3SOFnew · submitted 2026-06-03 · 💻 cs.LG · cs.GT

Mean-based algorithms: A lower bound and regret

Julius Durmann , Amelie Kleber This is my paper

Pith reviewed 2026-06-28 07:15 UTC · model grok-4.3

classification 💻 cs.LG cs.GT

keywords mean-based algorithmslower boundbandit feedbackonline learningregretno-regret algorithmsepsilon-greedyExp3

0 comments

The pith

Mean-based algorithms cannot learn faster than a lower bound on their defining sequence γ_t allows under bandit feedback with unknown horizon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first lower bound on the sequence γ_t that defines mean-based algorithms, proving a limit on how quickly these algorithms can shift probability mass away from low-mean actions when only bandit feedback is available and the time horizon is unknown. This bound arises directly from the requirement that the algorithm remain mean-based while operating without knowledge of the total number of rounds. The authors construct two algorithms satisfying the definition—one that generalizes ε-greedy and one that adapts the mean-based version of Exp3 to unknown horizons—and show through experiments that they remain competitive with other bandit algorithms despite the imposed limit. They further prove that the class of mean-based algorithms intersects non-trivially with no-regret algorithms, so that choices of γ_t exist for which an algorithm satisfies both properties simultaneously.

Core claim

In the bandit-feedback model with unknown finite horizon, any mean-based algorithm defined by a sequence γ_t must obey a lower bound on the rate at which γ_t can decrease; this bound is the first formal result that limits the speed at which such algorithms can learn, and it is accompanied by explicit constructions of mean-based algorithms that achieve competitive performance and by a demonstration that the mean-based and no-regret classes have a non-empty intersection.

What carries the argument

The non-increasing sequence γ_t that forces a mean-based algorithm to place at least γ_t probability mass on every action whose empirical mean is below the maximum observed mean.

If this is right

Mean-based algorithms have a provable speed limit on convergence to serially undominated actions in unknown-horizon bandit problems.
Algorithms can be constructed that are simultaneously mean-based and no-regret for suitable choices of γ_t.
The two new constructions extend ε-greedy and mean-based Exp3 to the unknown-horizon setting while preserving the mean-based property.
The intersection between mean-based and no-regret classes is non-trivial rather than empty.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The lower bound may constrain how rapidly mean-based dynamics can approximate Nash equilibria in repeated games when players receive only payoff feedback.
Designers of hybrid algorithms could use the bound to calibrate γ_t so that mean-based behavior coexists with sublinear regret.
Empirical studies could test whether real-world implementations of mean-based rules already operate near the derived limit or could safely accelerate γ_t decay.

Load-bearing premise

The modeling choice that mean-based algorithms are exactly those whose behavior is governed by a single sequence γ_t under the standard bandit-feedback and unknown-horizon assumptions.

What would settle it

An explicit mean-based algorithm whose γ_t sequence decreases faster than the derived lower bound while still satisfying the mean-based definition on all bandit-feedback instances with unknown horizon.

Figures

Figures reproduced from arXiv: 2606.04931 by Amelie Kleber, Julius Durmann.

**Figure 1.** Figure 1: The utility differences σ ′ (t) and σ ′′(t) (blue) and the bound ±γt ·t (orange). If the difference crosses either −γt · t (at t1) or γt · t (at tc), a γt-mean-based algorithm must set the probability of the corresponding dominated action to at most γt. P(atc+1 = 1 | X ′′ tc ) ≥ P(at1+1 ̸= 2, . . . , atc ̸= 2, atc+1 = 1 | X ′′ tc ) = P(at1+1 ̸= 2, . . . , atc ̸= 2, atc+1 = 1 | X ′ tc ) = 1 − P(at1+1 = 2 ∨ … view at source ↗

**Figure 2.** Figure 2: Visualization of the derived bound for γ ∗ t = 1 2 √ t+1 . The orange curve displays the exact value of Ptc,max t=1 γ(t) + k · γ(tc) with the tighter tc,max = l 1+2γt1 1−2γt1 · t1 m , while the gray line represents the integral approximation as described. Note that the numerical evaluation of the sum jumps because of the ceil operator in tc,max. We now show that its limit is 1. lim t1→∞ Z tc,max t1 γ(t − 1… view at source ↗

**Figure 3.** Figure 3: fγ ′ t for different variants of γ ′ t . We observe that, for γ0 < 1 2 or α > 1 2 , fγ ′ t crosses the threshold of 1. For γ0 = 1 2 , τ = 1, and α = 1 2 , we recover the rate analyzed above. We now vary these parameters to show that only slight changes can lead to an invalid mean-based rate [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Mean-based algorithms in the environment defined by the rewards in Eq. ( [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 5.** Figure 5: Results in the MAB setting. Displayed are averages and standard deviations over ten runs. [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence behavior of our algorithms in a Bertrand oligopoly. Displayed are average and stan [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

read the original abstract

Mean-based algorithms are a class of online learning algorithms that assign low probability to actions with low average rewards. Recent work indicates these algorithms converge favorably to serially undominated actions, which approximate Nash equilibria in economic games. However, empirical studies also show slower convergence compared to established algorithms in bandit-feedback scenarios. We study mean-based algorithms when the time horizon is unknown and only bandit feedback is available. In this setting, we provide the first lower bound on the algorithm-defining sequence $\gamma_t$ that formally establishes a limit on how fast these algorithms can learn. Additionally, we propose two mean-based algorithms: one generalizes $\epsilon$-greedy, and the other extends the mean-based Exp3 to unknown horizons. Our experiments show that mean-based algorithms, although slightly slower, can perform competitively with other bandit-feedback algorithms. We further analyze the relationship to no-regret algorithms. Depending on the choice of $\gamma_t$, the intersection with no-regret algorithms is non-trivial, and we show that algorithms exist that are both mean-based and no-regret. This adds context to the "exploitability" of this class of algorithms that previous contributions suggest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first lower bound on γ_t for mean-based algorithms in the unknown-horizon bandit setting along with two new algorithm variants.

read the letter

The paper's key contribution is the first lower bound on γ_t for mean-based algorithms under bandit feedback with unknown finite horizon, along with two new algorithms that handle that setting. It does a solid job laying out the problem and showing that mean-based algorithms can still be competitive in experiments, even if a bit slower than some baselines. The analysis of how they intersect with no-regret algorithms is useful too, especially the point that some choices of γ_t let you have both properties. That adds some nuance to earlier suggestions about their exploitability.

The lower bound itself is presented as novel, and the two extensions—one generalizing ε-greedy and one for mean-based Exp3—look like reasonable ways to adapt to unknown horizons. The experiments suggest they perform okay, which is good to see.

On the soft side, the abstract doesn't give the actual proof or much on the experimental setup, like what baselines were used or if there are error bars. Without those, it's tough to gauge how strong the bound is or how fair the comparisons are. The modeling choices seem standard, but if the practical regimes differ, the limit might not bite as hard.

This is the kind of paper for folks in online learning and algorithmic game theory who care about convergence rates in bandits. It has enough formal content and new pieces that it deserves a serious referee, even if revisions will be needed on the details.

Referee Report

2 major / 2 minor

Summary. The paper studies mean-based algorithms (defined via the sequence γ_t) in the bandit-feedback setting with unknown finite horizon. It claims to provide the first formal lower bound on γ_t that limits the speed at which these algorithms can learn, proposes two new mean-based algorithms (a generalization of ε-greedy and an extension of mean-based Exp3 to unknown horizons), reports that experiments show competitive performance relative to other bandit algorithms, and analyzes the intersection with no-regret algorithms, showing that the intersection is non-empty and that algorithms can be both mean-based and no-regret.

Significance. If the lower bound holds, it supplies a concrete, formal constraint on the convergence rate of mean-based algorithms under standard bandit assumptions, which is useful for understanding their behavior in game-theoretic settings where they converge to serially undominated actions. The construction of two new algorithms that respect the mean-based property while handling unknown horizons, together with the demonstration that mean-based and no-regret properties can coexist, adds technical value. The paper includes experimental comparisons, which is a positive feature even if details require clarification.

major comments (2)

[§4] §4 (Experiments): The statement that mean-based algorithms 'perform competitively' is central to the practical contribution, yet the text provides no description of the baselines used, the number of independent runs, error bars, or statistical tests. This makes it impossible to evaluate whether the observed slight slowdown is statistically meaningful or whether the competitiveness claim is supported.
[Theorem 1] Theorem 1 (lower bound on γ_t): The proof sketch relies on the unknown-horizon bandit model; it is unclear whether the same bound applies when the horizon is known or whether the unknown-horizon assumption is necessary for the result to be non-vacuous. Clarifying this would strengthen the claim that the bound 'formally establishes a limit on how fast these algorithms can learn.'

minor comments (2)

[Introduction] The definition of mean-based algorithms via γ_t is introduced in the abstract but should be restated with the precise mathematical condition in the first section for readers unfamiliar with the prior literature.
[§4] Figure captions and axis labels in the experimental plots should explicitly state the performance metric (e.g., cumulative regret) and the horizon values tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below.

read point-by-point responses

Referee: [§4] §4 (Experiments): The statement that mean-based algorithms 'perform competitively' is central to the practical contribution, yet the text provides no description of the baselines used, the number of independent runs, error bars, or statistical tests. This makes it impossible to evaluate whether the observed slight slowdown is statistically meaningful or whether the competitiveness claim is supported.

Authors: We agree that additional details are required to support the competitiveness claim. In the revised manuscript we will expand §4 to specify the baselines (standard Exp3, UCB, and Thompson Sampling), the number of independent runs (50 per environment), error bars (one standard deviation), and any statistical tests performed. This will allow readers to assess whether the observed differences are meaningful. revision: yes
Referee: [Theorem 1] Theorem 1 (lower bound on γ_t): The proof sketch relies on the unknown-horizon bandit model; it is unclear whether the same bound applies when the horizon is known or whether the unknown-horizon assumption is necessary for the result to be non-vacuous. Clarifying this would strengthen the claim that the bound 'formally establishes a limit on how fast these algorithms can learn.'

Authors: The lower bound of Theorem 1 is derived specifically for the unknown-horizon setting; the proof relies on the algorithm having to choose γ_t without knowledge of T. When the horizon is known in advance, a different γ_t sequence can be selected and the same lower bound need not hold. We will add a clarifying remark stating that the result is tied to the unknown-horizon model and explaining why that assumption is necessary for the bound to be non-vacuous. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives a lower bound on the algorithm-defining sequence γ_t from the standard definition of mean-based algorithms together with the bandit-feedback model and unknown finite horizon. This is presented as an independent analysis establishing a limit on learning speed, with additional constructions of two new algorithms and a non-trivial intersection with no-regret algorithms shown via choice of γ_t. No load-bearing step reduces by the paper's own equations or self-citation to its inputs; the central claim does not rely on fitted parameters renamed as predictions, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation. The derivation is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on abstract only; no free parameters, invented entities, or additional axioms are described beyond the standard definition of mean-based algorithms.

axioms (1)

domain assumption Mean-based algorithms assign low probability to actions with low average rewards via a sequence γ_t
This is the class definition used throughout the abstract.

pith-pipeline@v0.9.1-grok · 5729 in / 1151 out tokens · 23185 ms · 2026-06-28T07:15:01.965376+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 29 canonical work pages · 1 internal anchor

[1]

Lattimore, Tor and Szepesvári, Csaba , year =. Bandit. doi:10.1017/9781108571401 , publisher =

work page doi:10.1017/9781108571401
[2]

and Cesa-Bianchi, N

Auer, P. and Cesa-Bianchi, N. and Freund, Y. and Schapire, R.E. , year =. Gambling in a rigged casino:. doi:10.1109/SFCS.1995.492488 , booktitle =

work page doi:10.1109/sfcs.1995.492488 1995
[3]

Machine Learning , author =

Finite-time. Machine Learning , author =. 2002 , pages =. doi:10.1023/A:1013689704352 , number =

work page doi:10.1023/a:1013689704352 2002
[4]

, year =

Auer, Peter and Cesa-Bianchi, Nicolò and Freund, Yoav and Schapire, Robert E. , year =. The. SIAM Journal on Computing , publisher =. doi:10.1137/S0097539701398375 , number =

work page doi:10.1137/s0097539701398375
[5]

, year =

Thompson, William R. , year =. On the. Biometrika , publisher =. doi:10.2307/2332286 , number =

work page doi:10.2307/2332286
[6]

Deng, Xiaotie and Hu, Xinyan and Lin, Tao and Zheng, Weiqiang , year =. Nash. doi:10.1145/3485447.3512059 , booktitle =

work page doi:10.1145/3485447.3512059
[7]

Auctions between

Kolumbus, Yoav and Nisan, Noam , year =. Auctions between. doi:10.1145/3485447.3512055 , booktitle =

work page doi:10.1145/3485447.3512055
[8]

Selling to a

Braverman, Mark and Mao, Jieming and Schneider, Jon and Weinberg, Matt , year =. Selling to a. doi:10.1145/3219166.3219233 , booktitle =

work page doi:10.1145/3219166.3219233
[9]

Foundations and Trends® in Machine Learning , author =

Online. Foundations and Trends® in Machine Learning , author =. 2011 , pages =. doi:10.1561/2200000018 , number =

work page doi:10.1561/2200000018 2011
[10]

Rationalizability,

Milgrom, Paul and Roberts, John , year =. Rationalizability,. Econometrica , publisher =. doi:10.2307/2938316 , number =

work page doi:10.2307/2938316
[11]

, year =

Pearce, David G. , year =. Rationalizable. Econometrica , publisher =. doi:10.2307/1911197 , number =

work page doi:10.2307/1911197
[12]

Douglas , year =

Bernheim, B. Douglas , year =. Rationalizable. Econometrica , publisher =. doi:10.2307/1911196 , number =

work page doi:10.2307/1911196
[13]

Rationalizability and

Brandenburger, Adam and Dekel, Eddie , year =. Rationalizability and. Econometrica , publisher =. doi:10.2307/1913562 , number =

work page doi:10.2307/1913562
[14]

Journal of Mathematical Economics , author =

Subjectivity and correlation in randomized strategies , volume =. Journal of Mathematical Economics , author =. 1974 , pages =. doi:10.1016/0304-4068(74)90037-8 , number =

work page doi:10.1016/0304-4068(74)90037-8 1974
[15]

Solution

Levin, Jonathan , year =. Solution
[16]

, year =

Aumann, Robert J. , year =. Correlated. Econometrica , publisher =. doi:10.2307/1911154 , number =

work page doi:10.2307/1911154
[17]

Advances in Neural Information Processing Systems , author =

Fast. Advances in Neural Information Processing Systems , author =. 2024 , pages =

2024
[18]

Algorithmic

Arunachaleswaran, Eshwar Ram and Collina, Natalie and Kannan, Sampath and Roth, Aaron and Ziani, Juba , year =. Algorithmic. doi:10.48550/arXiv.2409.03956 , publisher =

work page doi:10.48550/arxiv.2409.03956
[19]

and Wang, Chang and Zhang, Chenhao , year =

Hartline, Jason D. and Wang, Chang and Zhang, Chenhao , year =. Regulation of. doi:10.1145/3709025.3712217 , booktitle =

work page doi:10.1145/3709025.3712217
[20]

Strategizing against

Deng, Yuan and Schneider, Jon and Sivan, Balasubramanian , editor =. Strategizing against. Advances in. 2019 , url_OPT =

2019
[21]

Gerchinovitz, Sébastien and Lattimore, Tor , year =. Refined. doi:10.48550/arXiv.1605.07416 , publisher =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1605.07416
[22]

Efficient

Daskalakis, Constantinos and Farina, Gabriele and Fishelson, Maxwell and Pipis, Charilaos and Schneider, Jon , year =. Efficient. doi:10.1145/3717823.3718307 , booktitle =

work page doi:10.1145/3717823.3718307
[23]

Komiyama, Junpei and Honda, Junya and Kashima, Hisashi and Nakagawa, Hiroshi , year =. Regret. Proceedings of
[24]

Generalized

Lin, Tao and Chen, Yiling , year =. Generalized. doi:10.48550/arXiv.2402.09721 , publisher =

work page doi:10.48550/arxiv.2402.09721
[25]

Minimax policies for adversarial and stochastic bandits , booktitle =

Audibert, Jean-Yves and Bubeck, Sébastien , year =. Minimax policies for adversarial and stochastic bandits , booktitle =
[26]

Bichler, Martin and Durmann, Julius and Oberlechner, Matthias , year =. Online. doi:10.48550/arXiv.2412.15707 , publisher =

work page doi:10.48550/arxiv.2412.15707
[27]

and Weinberg, S

Kolumbus, Yoav and Schneider, Jon and Talgam-Cohen, Inbal and Vlatakis-Gkaragkounis, Emmanouil-Vasileios and Wang, Joshua R. and Weinberg, S. M. , year =. Contracting with a. doi:10.52202/079017-2460 , journal =

work page doi:10.52202/079017-2460
[28]

Arora, Sanjeev and Hazan, Elad and Kale, Satyen , year =. The. doi:10.4086/toc.2012.v008a006 , journal =

work page doi:10.4086/toc.2012.v008a006 2012
[29]

Learning Theory 2003 , author =

Efficient algorithms for online decision problems , volume =. Learning Theory 2003 , author =. 2005 , pages =. doi:10.1016/j.jcss.2004.10.016 , number =

work page doi:10.1016/j.jcss.2004.10.016 2003
[30]

Théorie mathématique de la richesse sociale , volume =

Bertrand, Joseph , year =. Théorie mathématique de la richesse sociale , volume =. Journal des savants , publisher =
[31]

Tsallis-

Zimmert, Julian and Seldin, Yevgeny , year =. Tsallis-. doi:10.48550/arXiv.1807.07623 , publisher =

work page doi:10.48550/arxiv.1807.07623
[32]

Vakili, Sattar and Khezeli, Kia and Picheny, Victor , editor =. On. Proceedings of. 2021 , pages =

2021
[33]

Journal of Computer and System Sciences , author =

A. Journal of Computer and System Sciences , author =. 1997 , pages =. doi:10.1006/jcss.1997.1504 , number =

work page doi:10.1006/jcss.1997.1504 1997
[34]

Information and Computation , author =

The. Information and Computation , author =. 1994 , pages =. doi:10.1006/inco.1994.1009 , number =

work page doi:10.1006/inco.1994.1009 1994
[35]

Wei, Chen-Yu and Lee, Chung-Wei and Zhang, Mengxiao and Luo, Haipeng , year =. Linear
[36]

and Weinberg, S

Guruganesh, Guru and Kolumbus, Yoav and Schneider, Jon and Talgam-Cohen, Inbal and Vlatakis-Gkaragkounis, Emmanouil-Vasileios and Wang, Joshua R. and Weinberg, S. Matthew , year =. Contracting with a learning agent , volume =. Proceedings of the 38th
[37]

Mathematics of Operations Research , author =

Optimal. Mathematics of Operations Research , author =. 1981 , pages =. doi:10.1287/moor.6.1.58 , number =

work page doi:10.1287/moor.6.1.58 1981
[38]

American Economic Review , author =

The. American Economic Review , author =. 1973 , pages =

1973
[39]

Games and Economic Behavior , author =

Regret-based continuous-time dynamics , volume =. Games and Economic Behavior , author =. 2003 , pages =. doi:10.1016/S0899-8256(03)00178-7 , number =

work page doi:10.1016/s0899-8256(03)00178-7 2003

[1] [1]

Lattimore, Tor and Szepesvári, Csaba , year =. Bandit. doi:10.1017/9781108571401 , publisher =

work page doi:10.1017/9781108571401

[2] [2]

and Cesa-Bianchi, N

Auer, P. and Cesa-Bianchi, N. and Freund, Y. and Schapire, R.E. , year =. Gambling in a rigged casino:. doi:10.1109/SFCS.1995.492488 , booktitle =

work page doi:10.1109/sfcs.1995.492488 1995

[3] [3]

Machine Learning , author =

Finite-time. Machine Learning , author =. 2002 , pages =. doi:10.1023/A:1013689704352 , number =

work page doi:10.1023/a:1013689704352 2002

[4] [4]

, year =

Auer, Peter and Cesa-Bianchi, Nicolò and Freund, Yoav and Schapire, Robert E. , year =. The. SIAM Journal on Computing , publisher =. doi:10.1137/S0097539701398375 , number =

work page doi:10.1137/s0097539701398375

[5] [5]

, year =

Thompson, William R. , year =. On the. Biometrika , publisher =. doi:10.2307/2332286 , number =

work page doi:10.2307/2332286

[6] [6]

Deng, Xiaotie and Hu, Xinyan and Lin, Tao and Zheng, Weiqiang , year =. Nash. doi:10.1145/3485447.3512059 , booktitle =

work page doi:10.1145/3485447.3512059

[7] [7]

Auctions between

Kolumbus, Yoav and Nisan, Noam , year =. Auctions between. doi:10.1145/3485447.3512055 , booktitle =

work page doi:10.1145/3485447.3512055

[8] [8]

Selling to a

Braverman, Mark and Mao, Jieming and Schneider, Jon and Weinberg, Matt , year =. Selling to a. doi:10.1145/3219166.3219233 , booktitle =

work page doi:10.1145/3219166.3219233

[9] [9]

Foundations and Trends® in Machine Learning , author =

Online. Foundations and Trends® in Machine Learning , author =. 2011 , pages =. doi:10.1561/2200000018 , number =

work page doi:10.1561/2200000018 2011

[10] [10]

Rationalizability,

Milgrom, Paul and Roberts, John , year =. Rationalizability,. Econometrica , publisher =. doi:10.2307/2938316 , number =

work page doi:10.2307/2938316

[11] [11]

, year =

Pearce, David G. , year =. Rationalizable. Econometrica , publisher =. doi:10.2307/1911197 , number =

work page doi:10.2307/1911197

[12] [12]

Douglas , year =

Bernheim, B. Douglas , year =. Rationalizable. Econometrica , publisher =. doi:10.2307/1911196 , number =

work page doi:10.2307/1911196

[13] [13]

Rationalizability and

Brandenburger, Adam and Dekel, Eddie , year =. Rationalizability and. Econometrica , publisher =. doi:10.2307/1913562 , number =

work page doi:10.2307/1913562

[14] [14]

Journal of Mathematical Economics , author =

Subjectivity and correlation in randomized strategies , volume =. Journal of Mathematical Economics , author =. 1974 , pages =. doi:10.1016/0304-4068(74)90037-8 , number =

work page doi:10.1016/0304-4068(74)90037-8 1974

[15] [15]

Solution

Levin, Jonathan , year =. Solution

[16] [16]

, year =

Aumann, Robert J. , year =. Correlated. Econometrica , publisher =. doi:10.2307/1911154 , number =

work page doi:10.2307/1911154

[17] [17]

Advances in Neural Information Processing Systems , author =

Fast. Advances in Neural Information Processing Systems , author =. 2024 , pages =

2024

[18] [18]

Algorithmic

Arunachaleswaran, Eshwar Ram and Collina, Natalie and Kannan, Sampath and Roth, Aaron and Ziani, Juba , year =. Algorithmic. doi:10.48550/arXiv.2409.03956 , publisher =

work page doi:10.48550/arxiv.2409.03956

[19] [19]

and Wang, Chang and Zhang, Chenhao , year =

Hartline, Jason D. and Wang, Chang and Zhang, Chenhao , year =. Regulation of. doi:10.1145/3709025.3712217 , booktitle =

work page doi:10.1145/3709025.3712217

[20] [20]

Strategizing against

Deng, Yuan and Schneider, Jon and Sivan, Balasubramanian , editor =. Strategizing against. Advances in. 2019 , url_OPT =

2019

[21] [21]

Gerchinovitz, Sébastien and Lattimore, Tor , year =. Refined. doi:10.48550/arXiv.1605.07416 , publisher =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1605.07416

[22] [22]

Efficient

Daskalakis, Constantinos and Farina, Gabriele and Fishelson, Maxwell and Pipis, Charilaos and Schneider, Jon , year =. Efficient. doi:10.1145/3717823.3718307 , booktitle =

work page doi:10.1145/3717823.3718307

[23] [23]

Komiyama, Junpei and Honda, Junya and Kashima, Hisashi and Nakagawa, Hiroshi , year =. Regret. Proceedings of

[24] [24]

Generalized

Lin, Tao and Chen, Yiling , year =. Generalized. doi:10.48550/arXiv.2402.09721 , publisher =

work page doi:10.48550/arxiv.2402.09721

[25] [25]

Minimax policies for adversarial and stochastic bandits , booktitle =

Audibert, Jean-Yves and Bubeck, Sébastien , year =. Minimax policies for adversarial and stochastic bandits , booktitle =

[26] [26]

Bichler, Martin and Durmann, Julius and Oberlechner, Matthias , year =. Online. doi:10.48550/arXiv.2412.15707 , publisher =

work page doi:10.48550/arxiv.2412.15707

[27] [27]

and Weinberg, S

Kolumbus, Yoav and Schneider, Jon and Talgam-Cohen, Inbal and Vlatakis-Gkaragkounis, Emmanouil-Vasileios and Wang, Joshua R. and Weinberg, S. M. , year =. Contracting with a. doi:10.52202/079017-2460 , journal =

work page doi:10.52202/079017-2460

[28] [28]

Arora, Sanjeev and Hazan, Elad and Kale, Satyen , year =. The. doi:10.4086/toc.2012.v008a006 , journal =

work page doi:10.4086/toc.2012.v008a006 2012

[29] [29]

Learning Theory 2003 , author =

Efficient algorithms for online decision problems , volume =. Learning Theory 2003 , author =. 2005 , pages =. doi:10.1016/j.jcss.2004.10.016 , number =

work page doi:10.1016/j.jcss.2004.10.016 2003

[30] [30]

Théorie mathématique de la richesse sociale , volume =

Bertrand, Joseph , year =. Théorie mathématique de la richesse sociale , volume =. Journal des savants , publisher =

[31] [31]

Tsallis-

Zimmert, Julian and Seldin, Yevgeny , year =. Tsallis-. doi:10.48550/arXiv.1807.07623 , publisher =

work page doi:10.48550/arxiv.1807.07623

[32] [32]

Vakili, Sattar and Khezeli, Kia and Picheny, Victor , editor =. On. Proceedings of. 2021 , pages =

2021

[33] [33]

Journal of Computer and System Sciences , author =

A. Journal of Computer and System Sciences , author =. 1997 , pages =. doi:10.1006/jcss.1997.1504 , number =

work page doi:10.1006/jcss.1997.1504 1997

[34] [34]

Information and Computation , author =

The. Information and Computation , author =. 1994 , pages =. doi:10.1006/inco.1994.1009 , number =

work page doi:10.1006/inco.1994.1009 1994

[35] [35]

Wei, Chen-Yu and Lee, Chung-Wei and Zhang, Mengxiao and Luo, Haipeng , year =. Linear

[36] [36]

and Weinberg, S

Guruganesh, Guru and Kolumbus, Yoav and Schneider, Jon and Talgam-Cohen, Inbal and Vlatakis-Gkaragkounis, Emmanouil-Vasileios and Wang, Joshua R. and Weinberg, S. Matthew , year =. Contracting with a learning agent , volume =. Proceedings of the 38th

[37] [37]

Mathematics of Operations Research , author =

Optimal. Mathematics of Operations Research , author =. 1981 , pages =. doi:10.1287/moor.6.1.58 , number =

work page doi:10.1287/moor.6.1.58 1981

[38] [38]

American Economic Review , author =

The. American Economic Review , author =. 1973 , pages =

1973

[39] [39]

Games and Economic Behavior , author =

Regret-based continuous-time dynamics , volume =. Games and Economic Behavior , author =. 2003 , pages =. doi:10.1016/S0899-8256(03)00178-7 , number =

work page doi:10.1016/s0899-8256(03)00178-7 2003