pith. sign in

arxiv: 1906.08399 · v1 · pith:MO4H5QOOnew · submitted 2019-06-20 · 📊 stat.ML · cs.LG

Sequential Experimental Design for Transductive Linear Bandits

Pith reviewed 2026-05-25 19:44 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords transductive linear banditslinear banditscombinatorial banditssequential experimental designinstance-dependent boundsinformation-theoretic lower boundsbandit algorithms
0
0 comments X

The pith

An algorithm for the transductive linear bandit problem nearly matches the information-theoretic lower bound on queries needed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a transductive linear bandit setting in which the learner must identify the maximizing vector from a target set Z by making sequential noisy inner-product queries from a possibly different measurement set X. It derives instance-dependent lower bounds on the number of queries required to succeed with fixed probability and gives an algorithm whose query count matches those bounds up to logarithmic factors. The construction specializes to ordinary linear bandits when X equals Z and supplies the first non-asymptotic procedure for that case that comes close to the lower bound. A reader would care because the setting captures real constraints in which allowable measurements differ from the items whose quality must be ranked.

Core claim

We introduce the transductive linear bandit problem and supply instance-dependent lower bounds together with an algorithm that matches the bounds up to logarithmic factors; when X coincides with Z the same procedure is the first non-asymptotic linear-bandit algorithm whose sample complexity nearly attains the information-theoretic lower bound.

What carries the argument

A sequential algorithm that selects queries from the known finite set X, using precomputed instance-dependent quantities derived from the geometry of X and Z, to certify the argmax element of Z with probability 1-delta.

If this is right

  • When X equals Z the procedure gives a nearly optimal solution for ordinary linear bandits.
  • The same bounds and algorithm cover combinatorial bandits when X consists of standard basis vectors and Z consists of binary vectors.
  • The query count adapts to the particular geometry of X and Z, yielding fewer measurements on easier instances.
  • The approach directly supports experimental design tasks in which the affordable measurements differ from the items whose ranking is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The finite-set assumption suggests that discretizations of continuous measurement and target spaces could inherit similar guarantees.
  • In applications such as drug screening the separation of X and Z could translate into concrete reductions in laboratory tests while still ranking in-vivo candidates.
  • The instance-dependent analysis points toward possible extensions that update bounds online when X or Z are revealed incrementally.

Load-bearing premise

The sets X and Z are finite and known in advance, allowing the algorithm to plan queries and compute instance-dependent bounds from them.

What would settle it

On a concrete finite instance with known theta-star, run the algorithm and check whether the number of queries needed to reach success probability 1-delta exceeds the paper's lower bound by more than a logarithmic factor.

Figures

Figures reproduced from arXiv: 1906.08399 by Kevin Jamieson, Lalit Jain, Lillian Ratliff, Tanner Fiez.

Figure 1
Figure 1. Figure 1: Simulation Results simulation are depicted in Fig. 1d. The RAGE algorithm significantly outperforms the static allocation. 6 Conclusion In this paper we have proposed the problem of best-arm identification for transductive linear bandits, provided an algorithm, and matching upper and lower bounds. As a remark it is straightforward to exit our algorithm early with an ε-good arm. It still remains to develop … view at source ↗
read the original abstract

In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\text{argmax}_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information theoretic lower bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the transductive linear bandit problem: given finite known sets X and Z of vectors in R^d, a fixed delta, and unknown theta*, identify argmax_{z in Z} z^T theta* with probability 1-delta using as few sequential noisy queries x^T theta* (x in X) as possible. It derives instance-dependent lower bounds on the number of queries, presents an algorithm that matches these bounds up to logarithmic factors, and specializes the result to linear bandits when X=Z. The central claim is that this yields the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.

Significance. If the lower bounds and matching analysis hold, the result strengthens the theory of optimal sequential design for linear models by handling the transductive case where measurement and target sets differ. The non-asymptotic, instance-dependent guarantee and explicit matching (up to logs) are strengths, as is the explicit scope to finite known X and Z that enables the planning step. Applications to settings like drug discovery are noted but not central to the contribution.

major comments (2)
  1. The abstract and introduction state that X and Z are finite and known in advance; this precondition is load-bearing for the algorithm's ability to precompute optimal allocations and stopping thresholds that achieve the near-matching to the lower bound. No extension removing this assumption is claimed, so the result is correctly scoped but the 'first non-asymptotic nearly matching' claim for linear bandits holds only inside this finite-known regime.
  2. The lower-bound derivation and algorithm analysis must be checked for whether the log-factor gap is unavoidable or an artifact of the proof technique; if the paper's main theorem (likely in the analysis section) shows an explicit constant or improved log term under additional assumptions, that should be highlighted as it affects how close 'nearly achieves' is.
minor comments (2)
  1. Notation for the transductive setting (X vs Z) should be introduced with a clear diagram or example early in the paper to aid readers coming from standard linear bandits.
  2. The evaluation section would benefit from explicit comparison tables showing query complexity versus existing linear-bandit algorithms on the same instances.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comments. We respond to each major comment below.

read point-by-point responses
  1. Referee: The abstract and introduction state that X and Z are finite and known in advance; this precondition is load-bearing for the algorithm's ability to precompute optimal allocations and stopping thresholds that achieve the near-matching to the lower bound. No extension removing this assumption is claimed, so the result is correctly scoped but the 'first non-asymptotic nearly matching' claim for linear bandits holds only inside this finite-known regime.

    Authors: We agree that the finiteness and a priori knowledge of both X and Z are essential to the algorithm, as they enable the offline computation of the optimal allocation and the stopping thresholds used to achieve the near-matching guarantee. The manuscript already scopes all results—including the specialization to linear bandits when X=Z—to this finite-known setting, with no claims made about extensions that remove the assumption. The 'first non-asymptotic nearly matching' statement is therefore already correctly limited to the regime in which the paper operates. revision: no

  2. Referee: The lower-bound derivation and algorithm analysis must be checked for whether the log-factor gap is unavoidable or an artifact of the proof technique; if the paper's main theorem (likely in the analysis section) shows an explicit constant or improved log term under additional assumptions, that should be highlighted as it affects how close 'nearly achieves' is.

    Authors: The logarithmic factors originate from the non-asymptotic concentration inequalities and union bounds employed in the stopping-time analysis over the finite sets. The main theorem states the bound with these explicit log terms and does not provide improved dependence under any additional assumptions. While a tighter characterization of whether the gap is information-theoretically necessary would be valuable, the current analysis already yields the first non-asymptotic guarantee that matches the lower bound up to logarithmic factors. revision: no

standing simulated objections not resolved
  • Determining whether the logarithmic gap is information-theoretically unavoidable or an artifact of the current proof technique would require a refined lower-bound analysis beyond the scope of this work.

Circularity Check

0 steps flagged

No significant circularity; lower bounds and matching algorithm are independently derived

full rationale

The paper states finite known X and Z as modeling assumptions in the problem definition, derives instance-dependent information-theoretic lower bounds, and constructs an algorithm whose sample complexity matches those bounds up to log factors. No quoted step reduces a claimed prediction or optimality result to a fitted parameter, self-definition, or load-bearing self-citation chain. The non-asymptotic guarantee for linear bandits is presented as a construction that achieves the external lower bound rather than re-deriving it from its own outputs. This is the normal case of a self-contained derivation against information-theoretic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies insufficient technical detail to enumerate free parameters, axioms, or invented entities used in the bounds or algorithm.

pith-pipeline@v0.9.0 · 5845 in / 1128 out tokens · 42094 ms · 2026-05-25T19:44:13.154784+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Logging Policy Design for Off-Policy Evaluation

    stat.ML 2026-05 unverdicted novelty 7.0

    Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.

  2. Logging Policy Design for Off-Policy Evaluation

    stat.ML 2026-05 unverdicted novelty 5.0

    Derives optimal logging policies for minimizing off-policy evaluation error under known, unknown, and partially known target policies and reward distributions.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    Improved algorithms for linear stochastic bandits

    Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. InAdvances in Neural Information Processing Systems, pages 2312–2320, 2011

  2. [2]

    Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach

    Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, and Yining Wang. Near-optimal discrete optimization for experimental design: A regret minimization approach.arXiv preprint arXiv:1711.05174, 2017

  3. [3]

    Best arm identification in multi-armed bandits

    Jean-Yves Audibert and Sébastien Bubeck. Best arm identification in multi-armed bandits. In Conference on Learning Theory, pages 41–53, 2010

  4. [4]

    Using confidence bounds for exploitation-exploration trade-offs.Journal of Machine Learning Research, 3:397–422, 2002

    Peter Auer. Using confidence bounds for exploitation-exploration trade-offs.Journal of Machine Learning Research, 3:397–422, 2002

  5. [5]

    Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003

    Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003

  6. [6]

    Pure exploration in multi-armed bandits problems

    Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. Pure exploration in multi-armed bandits problems. InInternational Conference on Algorithmic Learning Theory, pages 23–37, 2009

  7. [7]

    Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm

    Tongyi Cao and Akshay Krishnamurthy. Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization. arXiv preprint arXiv:1711.08018, 2017

  8. [8]

    Pure exploration of multi-armed bandit under matroid constraints

    Lijie Chen, Anupam Gupta, and Jian Li. Pure exploration of multi-armed bandit under matroid constraints. InConference on Learning Theory, pages 647–669, 2016

  9. [9]

    Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

    Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, and Ruosong Wang. Nearly optimal samplingalgorithmsforcombinatorialpureexploration. arXiv preprint arXiv:1706.01081, 2017

  10. [10]

    On the Optimal Sample Complexity for Best Arm Identification

    Lijie Chen and Jian Li. On the optimal sample complexity for best arm identification. arXiv preprint arXiv:1511.03774, 2015

  11. [11]

    Combinatorial pure exploration of multi-armed bandits

    Shouyuan Chen, Tian Lin, Irwin King, Michael R Lyu, and Wei Chen. Combinatorial pure exploration of multi-armed bandits. InAdvances in Neural Information Processing Systems, pages 379–387, 2014

  12. [12]

    Stochastic linear optimization under bandit feedback

    Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. 2008

  13. [13]

    Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems.Journal of machine learning research, 7(Jun):1079–1105, 2006

    Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems.Journal of machine learning research, 7(Jun):1079–1105, 2006

  14. [14]

    Best arm identifi- cation: A unified approach to fixed budget and fixed confidence

    Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. Best arm identifi- cation: A unified approach to fixed budget and fixed confidence. InAdvances in Neural Information Processing Systems, pages 3212–3220, 2012. 11

  15. [15]

    On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning

    Matthew Hoffman, Bobak Shahriari, and Nando Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics, pages 365–374, 2014

  16. [16]

    Revisiting frank-wolfe: Projection-free sparse convex optimization

    Martin Jaggi. Revisiting frank-wolfe: Projection-free sparse convex optimization. In ICML (1), pages 427–435, 2013

  17. [17]

    lil’ucb: An optimal exploration algorithm for multi-armed bandits

    Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil’ucb: An optimal exploration algorithm for multi-armed bandits. InConference on Learning Theory, pages 423–439, 2014

  18. [18]

    Almost optimal exploration in multi- armed bandits

    Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi- armed bandits. InInternational Conference on Machine Learning, pages 1238–1246, 2013

  19. [19]

    Verification based solution for structured mab problems

    Zohar S Karnin. Verification based solution for structured mab problems. InAdvances in Neural Information Processing Systems, pages 145–153, 2016

  20. [20]

    On the complexity of best-arm identification in multi-armed bandit models.Journal of Machine Learning Research, 17:1–42, 2016

    Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the complexity of best-arm identification in multi-armed bandit models.Journal of Machine Learning Research, 17:1–42, 2016

  21. [21]

    Best Arm Identification in Generalized Linear Bandits

    Abbas Kazerouni and Lawrence M Wein. Best arm identification in generalized linear bandits. arXiv preprint arXiv:1905.08224, 2019

  22. [22]

    The equivalence of two extremum problems.Canadian Journal of Mathematics, 12:363–366, 1960

    Jack Kiefer and Jacob Wolfowitz. The equivalence of two extremum problems.Canadian Journal of Mathematics, 12:363–366, 1960

  23. [23]

    The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

    Tor Lattimore and Csaba Szepesvari. The end of optimism? an asymptotic analysis of finite-armed linear bandits.arXiv preprint arXiv:1610.04491, 2016

  24. [24]

    A contextual-bandit approach to personalized news article recommendation

    Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. InProceedings of the 19th international conference on World wide web, pages 661–670. ACM, 2010

  25. [25]

    SIAM, 2006

    Friedrich Pukelsheim.Optimal design of experiments. SIAM, 2006

  26. [26]

    Princeton university press, 2015

    Ralph Tyrell Rockafellar.Convex analysis. Princeton university press, 2015

  27. [27]

    Sequential resource allocation in linear stochastic bandits

    Marta Soare. Sequential resource allocation in linear stochastic bandits. PhD thesis, Université Lille 1-Sciences et Technologies, 2015

  28. [28]

    Best-arm identification in linear bandits

    Marta Soare, Alessandro Lazaric, and Rémi Munos. Best-arm identification in linear bandits. In Advances in Neural Information Processing Systems, pages 828–836, 2014

  29. [29]

    Best arm identification in linear bandits with linear dimension dependency

    Chao Tao, Saúl Blanco, and Yuan Zhou. Best arm identification in linear bandits with linear dimension dependency. InInternational Conference on Machine Learning, pages 4884–4893, 2018

  30. [30]

    A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning.Annals of Statistics, pages 135–166, 2004. 12

  31. [31]

    A fully adaptive algorithm for pure exploration in linear bandits

    Liyuan Xu, Junya Honda, and Masashi Sugiyama. A fully adaptive algorithm for pure exploration in linear bandits. InInternational Conference on Artificial Intelligence and Statistics, pages 843–851, 2018

  32. [32]

    Active learning via transductive experimental design

    Kai Yu, Jinbo Bi, and Volker Tresp. Active learning via transductive experimental design. In Proceedings of the 23rd international conference on Machine learning, pages 1081–1088. ACM, 2006. 13 A Proof of Theorem 2 Proof. Let the good event for thetth round of Algorithm 1 be Et := { Nt≤ max{⌈8(2t+1)2ρ(Y(St))(1+ϵ) log(|Z| 2 δt )⌉, r(ϵ) }} ∩{ z∗∈ ˆZt+1}∩{ ˆ...

  33. [33]

    , Np are positive integers constrained such that∑ i≤p Ni≥ N

    Given the number of samplesN to obtain and the cardinality of the support ofλ, samples to allocate to arms in the support ofλ are computed usingNi =⌈(N− 1 2 p)λi⌉, where N1, N2, . . . , Np are positive integers constrained such that∑ i≤p Ni≥ N. 16

  34. [34]

    Following the previous phase of the rounding procedure, loop until the discrepancy (∑ i≤p Ni)− N = 0, from either decreasing a sample countNj which obtains (Nj− 1)/λj = mini≤p(Ni− 1)/λi to Nj + 1, or increasing a sample countNj which obtains Nj/λj = maxi≤p Ni/λi to Nj− 1. The efficient design apportionment theorem in Section 12.5 of [25] provides the founda...

  35. [35]

    Because (1− cos(α))2≈ α4/4 and these inequalities must hold for allj = 1,

    this hypothesis test requiresNj + Nj+d/2≥ c log(1/δ) (1−cos(α))2 for some universal constant c > 0. Because (1− cos(α))2≈ α4/4 and these inequalities must hold for allj = 1, . . . , d/2 simultaneously for the single static allocation, we obtain the result. 18 E Proof of Lemma 1 Proof. ρ(Y) = min λ∈ |X| max y∈Y ∥y∥2 (∑ x∈Xλxxx⊤)−1 = 1 γ2 Y min λ∈ |X| max y...