On the Power of Adaptivity for $\varepsilon$-Best Arm Identification in Linear Bandits

Arnab Maiti; Kevin Jamieson; Yunbei Xu

arxiv: 2605.15663 · v1 · pith:CRU7G3E5new · submitted 2026-05-15 · 💻 cs.LG

On the Power of Adaptivity for varepsilon-Best Arm Identification in Linear Bandits

Arnab Maiti , Yunbei Xu , Kevin Jamieson This is my paper

Pith reviewed 2026-05-20 20:49 UTC · model grok-4.3

classification 💻 cs.LG

keywords linear banditsbest arm identificationadaptivitysample complexityepsilon-best armaction setnorm estimationGaussian width

0 comments

The pith

There exists an action set in linear bandits where adaptivity gives a polynomial improvement over non-adaptive methods for ε-best arm identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the minimax sample complexity of identifying an ε-best arm in a linear bandit with unknown reward vector θ and compact action set X spanning R^d. It first derives a non-adaptive fixed-design algorithm with sample complexity O(d log(1/δ)/ε² + w(X)²/ε²) and proves a matching lower bound that holds for every non-adaptive method. For several common action sets including the hypercube, ℓ2 ball, m-sets, and multi-task bandits, the authors show that adaptivity improves the rate by only logarithmic factors. The main result constructs a specific action set X for which adaptivity yields a polynomial-factor improvement, using an efficient adaptive subroutine that estimates the Euclidean norm of θ to additive accuracy ε with O(d log(1/δ)/ε²) samples.

Core claim

The central claim is that there exists a structured action set X for which any non-adaptive algorithm requires polynomially more samples than an adaptive algorithm to output an arm whose expected reward is within ε of the optimal arm with probability 1-δ. The separation is obtained by reducing an ℓ2-norm estimation problem to best-arm identification on this X; an adaptive procedure solves the norm estimation task using O(d log(1/δ)/ε²) samples from the unit ball and thereby achieves the improved rate on the constructed instance.

What carries the argument

An adaptive ℓ2-norm estimation algorithm that draws O(d log(1/δ)/ε²) samples from the unit ℓ2 ball and returns an estimate r̂ satisfying |r̂ - ||θ||₂| ≤ ε with high probability.

If this is right

Non-adaptive fixed-design methods are lower-bounded by Ω(d log(1/δ)/ε² + w(X)²/ε²) for every action set X.
For the hypercube, ℓ2 ball, m-sets, and multi-task bandits, adaptivity improves the non-adaptive rate by at most logarithmic factors.
On the specially constructed action set, the adaptive minimax sample complexity is polynomially smaller than the non-adaptive minimax sample complexity.
The adaptive norm estimator itself succeeds with O(d log(1/δ)/ε²) samples drawn from the unit ball.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation result indicates that geometric structure in the action set can make adaptivity essential for near-optimal sample efficiency in linear identification problems.
Similar polynomial gaps may exist in related settings such as linear regression with structured design matrices or contextual bandit identification.
Future constructions could aim to replace the explicit geometric X with more natural or randomly generated action sets that still exhibit the polynomial separation.
Validating the norm-estimation subroutine empirically in moderate dimensions would provide a concrete test of the O(d log(1/δ)/ε²) bound.

Load-bearing premise

The claim depends on the existence of a specific geometric action set X that permits embedding the norm-estimation problem into the best-arm identification task.

What would settle it

A direct calculation or simulation on the constructed action set showing that every adaptive algorithm still requires Ω(w(X)²/ε²) samples or that the proposed norm estimator needs more than O(d log(1/δ)/ε²) draws to achieve additive ε accuracy.

read the original abstract

We study the minimax sample complexity of $\varepsilon$-best arm identification in linear bandits. Given a compact action set $\mathcal{X}$ that spans $\mathbb{R}^d$ and an unknown reward vector $\theta\in\mathbb{R}^d$, the goal is to output an arm $\widehat{x}\in\mathcal{X}$ such that $\langle \widehat{x},\theta\rangle \ge \max_{x\in\mathcal{X}} \langle x,\theta\rangle - \varepsilon$ with probability at least $1-\delta$, using as few samples as possible. First, we present a non-adaptive fixed-design method with sample complexity $\mathcal{O}\!\left(\frac{d\log(1/\delta)}{\varepsilon^2}+\frac{w(\mathcal{X})^2}{\varepsilon^2}\right)$, where $w(\mathcal{X})$ is a Gaussian width term dependent on $\mathcal{X}$, and we prove a matching lower bound $\Omega\!\left(\frac{d\log(1/\delta)}{\varepsilon^2}+\frac{w(\mathcal{X})^2}{\varepsilon^2}\right)$ for all non-adaptive fixed-design methods. We then turn to adaptive sampling. We raise an important structural question: beyond the canonical basis, are there structured action sets for which adaptivity yields only logarithmic-factor improvements over the optimal non-adaptive rate? We answer in the affirmative for several natural action sets, namely the hypercube, the $\ell_2$ ball, $m$-sets, and multi-task multi-armed bandits. Finally, we provide the first construction of an action set $\mathcal{X}$ for which adaptivity yields a polynomial-factor improvement over every non-adaptive algorithm. A key ingredient behind this separation is an $\ell_2$-norm estimation subroutine: we design an adaptive algorithm that uses $\mathcal{O}\!\left(\frac{d\log(1/\delta)}{\varepsilon^2}\right)$ samples from the unit $\ell_2$ ball in $\mathbb{R}^d$ and outputs an estimate $\widehat r$ satisfying $|\widehat r-\|\theta\|_2|\le \varepsilon$ with probability at least $1-\delta$, where $\theta$ is the unknown reward vector.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper constructs a specific action set in linear bandits where adaptivity gives a polynomial sample-complexity improvement over non-adaptive methods for epsilon-best arm ID, via a new adaptive l2-norm estimator.

read the letter

The main takeaway is that this paper gives the first example of an action set in linear bandits where adaptive sampling for epsilon-best arm identification beats every non-adaptive strategy by a polynomial factor in the sample complexity. The key tool is a new adaptive procedure that estimates the l2 norm of theta to accuracy epsilon with O(d log(1/delta)/eps squared) samples. On the positive side, the paper does a good job laying out the non-adaptive case. They give an upper bound using fixed design sampling that scales with d log(1/delta)/eps^2 plus the square of the Gaussian width of X over eps^2, and they match it with a lower bound for fixed-design methods. They also check several natural action sets, including the hypercube and the l2 ball, and show that for those, adaptivity only improves the rate by logarithmic factors. That helps frame when the big gains happen. The new result is the construction of a special X where the gap is polynomial, relying on reducing the problem to estimating the norm adaptively. The norm estimator itself looks like a useful primitive that could apply elsewhere. A soft spot worth checking is the scope of the lower bound. The matching lower bound is stated for non-adaptive fixed-design methods, but the separation claim is against every non-adaptive algorithm. If a non-adaptive randomized design, where you fix a distribution over arms without depending on observations, can achieve better rates than the fixed-design lower bound, then the polynomial improvement might not hold for the full class of non-adaptive methods. The paper needs to close that gap explicitly for the claim to go through. Overall, this is aimed at researchers in bandit theory who care about the value of adaptivity in structured settings. It organizes the landscape nicely with concrete examples. The thinking is clear and the results are specific, so it deserves a serious referee. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The manuscript studies the minimax sample complexity of ε-best arm identification in linear bandits over a compact action set X spanning R^d. It gives a non-adaptive fixed-design algorithm achieving O(d log(1/δ)/ε² + w(X)²/ε²) samples together with a matching Ω lower bound for all non-adaptive fixed-design methods. For several natural action sets (hypercube, ℓ₂ ball, m-sets, multi-task MAB) it shows that adaptivity improves the rate by only logarithmic factors. It then constructs a specific action set X for which adaptivity yields a polynomial-factor improvement over every non-adaptive algorithm, using as a key subroutine an adaptive procedure that estimates ||θ||₂ to additive ε accuracy with O(d log(1/δ)/ε²) samples from the unit ℓ₂ ball.

Significance. If the separation result holds, the paper supplies the first explicit geometric construction separating adaptive and non-adaptive rates by a polynomial factor in linear bandits, together with a reusable adaptive ℓ₂-norm estimator. The appearance of the Gaussian width w(X) as the precise geometric complexity term for the non-adaptive case is a clean and falsifiable contribution. These elements would strengthen the literature on when adaptivity helps in structured linear problems.

major comments (2)

[Abstract] Abstract: the matching lower bound Ω(d log(1/δ)/ε² + w(X)²/ε²) is stated only for non-adaptive fixed-design methods, yet the central separation claim asserts a polynomial improvement “over every non-adaptive algorithm.” Randomized non-adaptive designs (fixed distributions over arms chosen independently of observations) are not covered by the fixed-design lower bound; if such designs can achieve rates closer to the adaptive O(d log(1/δ)/ε²) norm-estimation subroutine, the polynomial separation does not hold for the full class of non-adaptive algorithms.
[Abstract] Abstract and the section describing the separating construction: the load-bearing modeling choice is the specific geometric action set X together with the reduction that embeds the ℓ₂-norm estimator into the best-arm identification problem. The manuscript must verify that this reduction preserves the polynomial gap for the reward distributions under consideration and that no simpler non-adaptive (possibly randomized) strategy on the same X can match the adaptive rate.

minor comments (2)

Notation: the precise definition of the Gaussian width w(X) and its relation to the action set geometry should be recalled explicitly when first used, even if standard.
The manuscript should clarify whether the O(d log(1/δ)/ε²) norm-estimation bound assumes sub-Gaussian noise or holds more generally.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our work. The comments regarding the precise scope of the non-adaptive lower bound and the need for verification in the separating construction are well-taken. We provide point-by-point responses below and will make the necessary revisions to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the matching lower bound Ω(d log(1/δ)/ε² + w(X)²/ε²) is stated only for non-adaptive fixed-design methods, yet the central separation claim asserts a polynomial improvement “over every non-adaptive algorithm.” Randomized non-adaptive designs (fixed distributions over arms chosen independently of observations) are not covered by the fixed-design lower bound; if such designs can achieve rates closer to the adaptive O(d log(1/δ)/ε²) norm-estimation subroutine, the polynomial separation does not hold for the full class of non-adaptive algorithms.

Authors: We acknowledge the validity of this comment. Our current lower bound applies exclusively to non-adaptive fixed-design methods. To strengthen the separation claim, we will revise the manuscript to include an analysis showing that randomized non-adaptive algorithms cannot achieve a substantially better rate than fixed-design methods for the action sets considered, particularly in the separating construction. Alternatively, we will adjust the claim to refer specifically to non-adaptive fixed-design algorithms if a general extension proves technically challenging. This revision will ensure the polynomial improvement is accurately positioned relative to the class of non-adaptive methods. revision: yes
Referee: [Abstract] Abstract and the section describing the separating construction: the load-bearing modeling choice is the specific geometric action set X together with the reduction that embeds the ℓ₂-norm estimator into the best-arm identification problem. The manuscript must verify that this reduction preserves the polynomial gap for the reward distributions under consideration and that no simpler non-adaptive (possibly randomized) strategy on the same X can match the adaptive rate.

Authors: We agree that explicit verification of the reduction and its implications is necessary. In the revised version, we will add a dedicated subsection that details the embedding of the adaptive ℓ₂-norm estimator into the ε-best arm identification task on the constructed action set X. This will demonstrate that the reward distributions and the polynomial sample-complexity gap are preserved. We will further establish that, for this specific X, no non-adaptive strategy—including those based on randomized fixed distributions—can match the adaptive rate, thereby confirming that the separation is robust to the modeling choices. revision: yes

Circularity Check

0 steps flagged

No circularity: rates from concentration and explicit geometric construction

full rationale

The paper derives the non-adaptive fixed-design upper and lower bounds directly from Gaussian concentration and minimax arguments on the action set geometry (w(X) term defined externally). The adaptive separation result rests on an explicit construction of a separating action set X together with a first-principles adaptive norm-estimation subroutine whose sample complexity is obtained from standard vector concentration, not from fitting to the target best-arm quantity. No equation reduces to a prior result by definition, no parameter is fitted to the claimed separation, and no load-bearing step is justified solely by self-citation. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on standard sub-Gaussian concentration and minimax arguments for linear bandits; the separating action set is introduced as a new geometric object whose properties are proved rather than postulated without evidence.

axioms (1)

domain assumption Rewards are sub-Gaussian with known variance proxy
Standard assumption invoked for all concentration bounds in the sample-complexity derivations.

invented entities (1)

Separating action set X independent evidence
purpose: To exhibit a geometry where adaptive sampling achieves polynomial improvement over any non-adaptive procedure
The set is constructed explicitly in the paper; independent evidence consists of the matching lower bound for non-adaptive methods on that set.

pith-pipeline@v0.9.0 · 5945 in / 1541 out tokens · 69479 ms · 2026-05-20T20:49:03.186372+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

non-adaptive fixed-design method with sample complexity O(d log(1/δ)/ε² + w(X)²/ε²) … matching lower bound Ω(d log(1/δ)/ε² + w(X)²/ε²)
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ℓ₂-norm estimation subroutine … O(d log(1/δ)/ε²) samples … |r̂ − ∥θ∥₂| ≤ ε

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

International Conference on Computational Learning Theory , pages=

PAC bounds for multi-armed bandit and Markov decision processes , author=. International Conference on Computational Learning Theory , pages=. 2002 , organization=

work page 2002
[2]

Journal of Computer and System Sciences , volume=

Combinatorial bandits , author=. Journal of Computer and System Sciences , volume=. 2012 , publisher=

work page 2012
[3]

Conference on Learning Theory , pages=

Tight bounds for bandit combinatorial optimization , author=. Conference on Learning Theory , pages=. 2017 , organization=

work page 2017
[4]

arXiv preprint arXiv:2504.00461 , year=

Efficient near-optimal algorithm for online shortest paths in directed acyclic graphs with bandit feedback against adaptive adversaries , author=. arXiv preprint arXiv:2504.00461 , year=

work page arXiv
[5]

arXiv preprint arXiv:2510.17099 , year=

On the Universal Near Optimality of Hedge in Combinatorial Settings , author=. arXiv preprint arXiv:2510.17099 , year=

work page arXiv
[6]

Advances in neural information processing systems , volume=

Best-arm identification in linear bandits , author=. Advances in neural information processing systems , volume=

work page
[7]

Advances in Neural Information Processing Systems , volume=

Optimal best-arm identification in linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page
[8]

Advances in neural information processing systems , volume=

Sequential experimental design for transductive linear bandits , author=. Advances in neural information processing systems , volume=

work page
[9]

Advances in Neural Information Processing Systems , volume=

An empirical process approach to the union bound: Practical algorithms for combinatorial and linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page
[10]

International Conference on Machine Learning , pages=

Best arm identification in linear bandits with linear dimension dependency , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018
[11]

International Conference on Machine Learning , pages=

Gamification of pure exploration for linear bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[12]

International Conference on Artificial Intelligence and Statistics , pages=

A fully adaptive algorithm for pure exploration in linear bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2018 , organization=

work page 2018
[13]

Advances in Neural Information Processing Systems , volume=

Verification based solution for structured mab problems , author=. Advances in Neural Information Processing Systems , volume=

work page
[14]

International Conference on Artificial Intelligence and Statistics , pages=

Experimental design for regret minimization in linear bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021
[15]

2020 , publisher=

Bandit algorithms , author=. 2020 , publisher=

work page 2020
[16]

International conference on artificial intelligence and statistics , pages=

Sublinear optimal policy value estimation in contextual bandits , author=. International conference on artificial intelligence and statistics , pages=. 2020 , organization=

work page 2020
[17]

The Annals of Statistics , volume=

Testing Composite Hypotheses, Hermite Polynomials and Optimal Estimation of a Nonsmooth Functional , author=. The Annals of Statistics , volume=

work page
[18]

Bernoulli , year=

ON ESTIMATION OF NONSMOOTH FUNCTIONALS OF SPARSE NORMAL MEANS , author=. Bernoulli , year=

work page
[19]

Probability Theory and Related Fields , volume=

On estimation of L r-norms in Gaussian white noise models , author=. Probability Theory and Related Fields , volume=. 2020 , publisher=

work page 2020
[20]

Upper bounds via the oracle approach , author=

Adaptive estimation of the L 2-norm of a probability density and related topics II. Upper bounds via the oracle approach , author=. The Annals of Statistics , volume=. 2025 , publisher=

work page 2025
[21]

Advances in Neural Information Processing Systems , volume=

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability , author=. Advances in Neural Information Processing Systems , volume=

work page
[22]

Conference on Learning Theory , pages=

On the complexity of bandit linear optimization , author=. Conference on Learning Theory , pages=. 2015 , organization=

work page 2015
[23]

2006 , publisher=

Optimal design of experiments , author=. 2006 , publisher=

work page 2006
[24]

IEEE Transactions on Information Theory , volume=

On the fundamental limits of adaptive sensing , author=. IEEE Transactions on Information Theory , volume=. 2012 , publisher=

work page 2012
[25]

1935 , publisher=

The Design of Experiments , author=. 1935 , publisher=

work page 1935
[26]

2018 , publisher=

High-dimensional probability: An introduction with applications in data science , author=. 2018 , publisher=

work page 2018
[27]

The gamma function and stirling’s formula , author=

work page
[28]

Flavors of geometry , volume=

An elementary introduction to modern convex geometry , author=. Flavors of geometry , volume=

work page
[29]

Mathematical Programming , volume=

Near-optimal discrete optimization for experimental design: A regret minimization approach , author=. Mathematical Programming , volume=. 2021 , publisher=

work page 2021
[30]

Machine learning , volume=

Linear least-squares algorithms for temporal difference learning , author=. Machine learning , volume=. 1996 , publisher=

work page 1996
[31]

Conference on learning theory , pages=

Provably efficient reinforcement learning with linear function approximation , author=. Conference on learning theory , pages=. 2020 , organization=

work page 2020

[1] [1]

International Conference on Computational Learning Theory , pages=

PAC bounds for multi-armed bandit and Markov decision processes , author=. International Conference on Computational Learning Theory , pages=. 2002 , organization=

work page 2002

[2] [2]

Journal of Computer and System Sciences , volume=

Combinatorial bandits , author=. Journal of Computer and System Sciences , volume=. 2012 , publisher=

work page 2012

[3] [3]

Conference on Learning Theory , pages=

Tight bounds for bandit combinatorial optimization , author=. Conference on Learning Theory , pages=. 2017 , organization=

work page 2017

[4] [4]

arXiv preprint arXiv:2504.00461 , year=

Efficient near-optimal algorithm for online shortest paths in directed acyclic graphs with bandit feedback against adaptive adversaries , author=. arXiv preprint arXiv:2504.00461 , year=

work page arXiv

[5] [5]

arXiv preprint arXiv:2510.17099 , year=

On the Universal Near Optimality of Hedge in Combinatorial Settings , author=. arXiv preprint arXiv:2510.17099 , year=

work page arXiv

[6] [6]

Advances in neural information processing systems , volume=

Best-arm identification in linear bandits , author=. Advances in neural information processing systems , volume=

work page

[7] [7]

Advances in Neural Information Processing Systems , volume=

Optimal best-arm identification in linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page

[8] [8]

Advances in neural information processing systems , volume=

Sequential experimental design for transductive linear bandits , author=. Advances in neural information processing systems , volume=

work page

[9] [9]

Advances in Neural Information Processing Systems , volume=

An empirical process approach to the union bound: Practical algorithms for combinatorial and linear bandits , author=. Advances in Neural Information Processing Systems , volume=

work page

[10] [10]

International Conference on Machine Learning , pages=

Best arm identification in linear bandits with linear dimension dependency , author=. International Conference on Machine Learning , pages=. 2018 , organization=

work page 2018

[11] [11]

International Conference on Machine Learning , pages=

Gamification of pure exploration for linear bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020

[12] [12]

International Conference on Artificial Intelligence and Statistics , pages=

A fully adaptive algorithm for pure exploration in linear bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2018 , organization=

work page 2018

[13] [13]

Advances in Neural Information Processing Systems , volume=

Verification based solution for structured mab problems , author=. Advances in Neural Information Processing Systems , volume=

work page

[14] [14]

International Conference on Artificial Intelligence and Statistics , pages=

Experimental design for regret minimization in linear bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021

[15] [15]

2020 , publisher=

Bandit algorithms , author=. 2020 , publisher=

work page 2020

[16] [16]

International conference on artificial intelligence and statistics , pages=

Sublinear optimal policy value estimation in contextual bandits , author=. International conference on artificial intelligence and statistics , pages=. 2020 , organization=

work page 2020

[17] [17]

The Annals of Statistics , volume=

Testing Composite Hypotheses, Hermite Polynomials and Optimal Estimation of a Nonsmooth Functional , author=. The Annals of Statistics , volume=

work page

[18] [18]

Bernoulli , year=

ON ESTIMATION OF NONSMOOTH FUNCTIONALS OF SPARSE NORMAL MEANS , author=. Bernoulli , year=

work page

[19] [19]

Probability Theory and Related Fields , volume=

On estimation of L r-norms in Gaussian white noise models , author=. Probability Theory and Related Fields , volume=. 2020 , publisher=

work page 2020

[20] [20]

Upper bounds via the oracle approach , author=

Adaptive estimation of the L 2-norm of a probability density and related topics II. Upper bounds via the oracle approach , author=. The Annals of Statistics , volume=. 2025 , publisher=

work page 2025

[21] [21]

Advances in Neural Information Processing Systems , volume=

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability , author=. Advances in Neural Information Processing Systems , volume=

work page

[22] [22]

Conference on Learning Theory , pages=

On the complexity of bandit linear optimization , author=. Conference on Learning Theory , pages=. 2015 , organization=

work page 2015

[23] [23]

2006 , publisher=

Optimal design of experiments , author=. 2006 , publisher=

work page 2006

[24] [24]

IEEE Transactions on Information Theory , volume=

On the fundamental limits of adaptive sensing , author=. IEEE Transactions on Information Theory , volume=. 2012 , publisher=

work page 2012

[25] [25]

1935 , publisher=

The Design of Experiments , author=. 1935 , publisher=

work page 1935

[26] [26]

2018 , publisher=

High-dimensional probability: An introduction with applications in data science , author=. 2018 , publisher=

work page 2018

[27] [27]

The gamma function and stirling’s formula , author=

work page

[28] [28]

Flavors of geometry , volume=

An elementary introduction to modern convex geometry , author=. Flavors of geometry , volume=

work page

[29] [29]

Mathematical Programming , volume=

Near-optimal discrete optimization for experimental design: A regret minimization approach , author=. Mathematical Programming , volume=. 2021 , publisher=

work page 2021

[30] [30]

Machine learning , volume=

Linear least-squares algorithms for temporal difference learning , author=. Machine learning , volume=. 1996 , publisher=

work page 1996

[31] [31]

Conference on learning theory , pages=

Provably efficient reinforcement learning with linear function approximation , author=. Conference on learning theory , pages=. 2020 , organization=

work page 2020