Sequential Experimental Design for Transductive Linear Bandits
Pith reviewed 2026-05-25 19:44 UTC · model grok-4.3
The pith
An algorithm for the transductive linear bandit problem nearly matches the information-theoretic lower bound on queries needed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the transductive linear bandit problem and supply instance-dependent lower bounds together with an algorithm that matches the bounds up to logarithmic factors; when X coincides with Z the same procedure is the first non-asymptotic linear-bandit algorithm whose sample complexity nearly attains the information-theoretic lower bound.
What carries the argument
A sequential algorithm that selects queries from the known finite set X, using precomputed instance-dependent quantities derived from the geometry of X and Z, to certify the argmax element of Z with probability 1-delta.
If this is right
- When X equals Z the procedure gives a nearly optimal solution for ordinary linear bandits.
- The same bounds and algorithm cover combinatorial bandits when X consists of standard basis vectors and Z consists of binary vectors.
- The query count adapts to the particular geometry of X and Z, yielding fewer measurements on easier instances.
- The approach directly supports experimental design tasks in which the affordable measurements differ from the items whose ranking is required.
Where Pith is reading between the lines
- The finite-set assumption suggests that discretizations of continuous measurement and target spaces could inherit similar guarantees.
- In applications such as drug screening the separation of X and Z could translate into concrete reductions in laboratory tests while still ranking in-vivo candidates.
- The instance-dependent analysis points toward possible extensions that update bounds online when X or Z are revealed incrementally.
Load-bearing premise
The sets X and Z are finite and known in advance, allowing the algorithm to plan queries and compute instance-dependent bounds from them.
What would settle it
On a concrete finite instance with known theta-star, run the algorithm and check whether the number of queries needed to reach success probability 1-delta exceeds the paper's lower bound by more than a logarithmic factor.
Figures
read the original abstract
In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\text{argmax}_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information theoretic lower bound.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the transductive linear bandit problem: given finite known sets X and Z of vectors in R^d, a fixed delta, and unknown theta*, identify argmax_{z in Z} z^T theta* with probability 1-delta using as few sequential noisy queries x^T theta* (x in X) as possible. It derives instance-dependent lower bounds on the number of queries, presents an algorithm that matches these bounds up to logarithmic factors, and specializes the result to linear bandits when X=Z. The central claim is that this yields the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
Significance. If the lower bounds and matching analysis hold, the result strengthens the theory of optimal sequential design for linear models by handling the transductive case where measurement and target sets differ. The non-asymptotic, instance-dependent guarantee and explicit matching (up to logs) are strengths, as is the explicit scope to finite known X and Z that enables the planning step. Applications to settings like drug discovery are noted but not central to the contribution.
major comments (2)
- The abstract and introduction state that X and Z are finite and known in advance; this precondition is load-bearing for the algorithm's ability to precompute optimal allocations and stopping thresholds that achieve the near-matching to the lower bound. No extension removing this assumption is claimed, so the result is correctly scoped but the 'first non-asymptotic nearly matching' claim for linear bandits holds only inside this finite-known regime.
- The lower-bound derivation and algorithm analysis must be checked for whether the log-factor gap is unavoidable or an artifact of the proof technique; if the paper's main theorem (likely in the analysis section) shows an explicit constant or improved log term under additional assumptions, that should be highlighted as it affects how close 'nearly achieves' is.
minor comments (2)
- Notation for the transductive setting (X vs Z) should be introduced with a clear diagram or example early in the paper to aid readers coming from standard linear bandits.
- The evaluation section would benefit from explicit comparison tables showing query complexity versus existing linear-bandit algorithms on the same instances.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the constructive comments. We respond to each major comment below.
read point-by-point responses
-
Referee: The abstract and introduction state that X and Z are finite and known in advance; this precondition is load-bearing for the algorithm's ability to precompute optimal allocations and stopping thresholds that achieve the near-matching to the lower bound. No extension removing this assumption is claimed, so the result is correctly scoped but the 'first non-asymptotic nearly matching' claim for linear bandits holds only inside this finite-known regime.
Authors: We agree that the finiteness and a priori knowledge of both X and Z are essential to the algorithm, as they enable the offline computation of the optimal allocation and the stopping thresholds used to achieve the near-matching guarantee. The manuscript already scopes all results—including the specialization to linear bandits when X=Z—to this finite-known setting, with no claims made about extensions that remove the assumption. The 'first non-asymptotic nearly matching' statement is therefore already correctly limited to the regime in which the paper operates. revision: no
-
Referee: The lower-bound derivation and algorithm analysis must be checked for whether the log-factor gap is unavoidable or an artifact of the proof technique; if the paper's main theorem (likely in the analysis section) shows an explicit constant or improved log term under additional assumptions, that should be highlighted as it affects how close 'nearly achieves' is.
Authors: The logarithmic factors originate from the non-asymptotic concentration inequalities and union bounds employed in the stopping-time analysis over the finite sets. The main theorem states the bound with these explicit log terms and does not provide improved dependence under any additional assumptions. While a tighter characterization of whether the gap is information-theoretically necessary would be valuable, the current analysis already yields the first non-asymptotic guarantee that matches the lower bound up to logarithmic factors. revision: no
- Determining whether the logarithmic gap is information-theoretically unavoidable or an artifact of the current proof technique would require a refined lower-bound analysis beyond the scope of this work.
Circularity Check
No significant circularity; lower bounds and matching algorithm are independently derived
full rationale
The paper states finite known X and Z as modeling assumptions in the problem definition, derives instance-dependent information-theoretic lower bounds, and constructs an algorithm whose sample complexity matches those bounds up to log factors. No quoted step reduces a claimed prediction or optimality result to a fitted parameter, self-definition, or load-bearing self-citation chain. The non-asymptotic guarantee for linear bandits is presented as a construction that achieves the external lower bound rather than re-deriving it from its own outputs. This is the normal case of a self-contained derivation against information-theoretic benchmarks.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 2 Pith papers
-
Logging Policy Design for Off-Policy Evaluation
Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.
-
Logging Policy Design for Off-Policy Evaluation
Derives optimal logging policies for minimizing off-policy evaluation error under known, unknown, and partially known target policies and reward distributions.
Reference graph
Works this paper leans on
-
[1]
Improved algorithms for linear stochastic bandits
Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. InAdvances in Neural Information Processing Systems, pages 2312–2320, 2011
work page 2011
-
[2]
Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach
Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, and Yining Wang. Near-optimal discrete optimization for experimental design: A regret minimization approach.arXiv preprint arXiv:1711.05174, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[3]
Best arm identification in multi-armed bandits
Jean-Yves Audibert and Sébastien Bubeck. Best arm identification in multi-armed bandits. In Conference on Learning Theory, pages 41–53, 2010
work page 2010
-
[4]
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs.Journal of Machine Learning Research, 3:397–422, 2002
work page 2002
-
[5]
Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003
work page 2003
-
[6]
Pure exploration in multi-armed bandits problems
Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. Pure exploration in multi-armed bandits problems. InInternational Conference on Algorithmic Learning Theory, pages 23–37, 2009
work page 2009
-
[7]
Tongyi Cao and Akshay Krishnamurthy. Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization. arXiv preprint arXiv:1711.08018, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[8]
Pure exploration of multi-armed bandit under matroid constraints
Lijie Chen, Anupam Gupta, and Jian Li. Pure exploration of multi-armed bandit under matroid constraints. InConference on Learning Theory, pages 647–669, 2016
work page 2016
-
[9]
Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, and Ruosong Wang. Nearly optimal samplingalgorithmsforcombinatorialpureexploration. arXiv preprint arXiv:1706.01081, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[10]
On the Optimal Sample Complexity for Best Arm Identification
Lijie Chen and Jian Li. On the optimal sample complexity for best arm identification. arXiv preprint arXiv:1511.03774, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[11]
Combinatorial pure exploration of multi-armed bandits
Shouyuan Chen, Tian Lin, Irwin King, Michael R Lyu, and Wei Chen. Combinatorial pure exploration of multi-armed bandits. InAdvances in Neural Information Processing Systems, pages 379–387, 2014
work page 2014
-
[12]
Stochastic linear optimization under bandit feedback
Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. 2008
work page 2008
-
[13]
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems.Journal of machine learning research, 7(Jun):1079–1105, 2006
work page 2006
-
[14]
Best arm identifi- cation: A unified approach to fixed budget and fixed confidence
Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. Best arm identifi- cation: A unified approach to fixed budget and fixed confidence. InAdvances in Neural Information Processing Systems, pages 3212–3220, 2012. 11
work page 2012
-
[15]
Matthew Hoffman, Bobak Shahriari, and Nando Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics, pages 365–374, 2014
work page 2014
-
[16]
Revisiting frank-wolfe: Projection-free sparse convex optimization
Martin Jaggi. Revisiting frank-wolfe: Projection-free sparse convex optimization. In ICML (1), pages 427–435, 2013
work page 2013
-
[17]
lil’ucb: An optimal exploration algorithm for multi-armed bandits
Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil’ucb: An optimal exploration algorithm for multi-armed bandits. InConference on Learning Theory, pages 423–439, 2014
work page 2014
-
[18]
Almost optimal exploration in multi- armed bandits
Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi- armed bandits. InInternational Conference on Machine Learning, pages 1238–1246, 2013
work page 2013
-
[19]
Verification based solution for structured mab problems
Zohar S Karnin. Verification based solution for structured mab problems. InAdvances in Neural Information Processing Systems, pages 145–153, 2016
work page 2016
-
[20]
Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the complexity of best-arm identification in multi-armed bandit models.Journal of Machine Learning Research, 17:1–42, 2016
work page 2016
-
[21]
Best Arm Identification in Generalized Linear Bandits
Abbas Kazerouni and Lawrence M Wein. Best arm identification in generalized linear bandits. arXiv preprint arXiv:1905.08224, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1905
-
[22]
The equivalence of two extremum problems.Canadian Journal of Mathematics, 12:363–366, 1960
Jack Kiefer and Jacob Wolfowitz. The equivalence of two extremum problems.Canadian Journal of Mathematics, 12:363–366, 1960
work page 1960
-
[23]
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
Tor Lattimore and Csaba Szepesvari. The end of optimism? an asymptotic analysis of finite-armed linear bandits.arXiv preprint arXiv:1610.04491, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[24]
A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. InProceedings of the 19th international conference on World wide web, pages 661–670. ACM, 2010
work page 2010
- [25]
-
[26]
Princeton university press, 2015
Ralph Tyrell Rockafellar.Convex analysis. Princeton university press, 2015
work page 2015
-
[27]
Sequential resource allocation in linear stochastic bandits
Marta Soare. Sequential resource allocation in linear stochastic bandits. PhD thesis, Université Lille 1-Sciences et Technologies, 2015
work page 2015
-
[28]
Best-arm identification in linear bandits
Marta Soare, Alessandro Lazaric, and Rémi Munos. Best-arm identification in linear bandits. In Advances in Neural Information Processing Systems, pages 828–836, 2014
work page 2014
-
[29]
Best arm identification in linear bandits with linear dimension dependency
Chao Tao, Saúl Blanco, and Yuan Zhou. Best arm identification in linear bandits with linear dimension dependency. InInternational Conference on Machine Learning, pages 4884–4893, 2018
work page 2018
-
[30]
A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning.Annals of Statistics, pages 135–166, 2004. 12
work page 2004
-
[31]
A fully adaptive algorithm for pure exploration in linear bandits
Liyuan Xu, Junya Honda, and Masashi Sugiyama. A fully adaptive algorithm for pure exploration in linear bandits. InInternational Conference on Artificial Intelligence and Statistics, pages 843–851, 2018
work page 2018
-
[32]
Active learning via transductive experimental design
Kai Yu, Jinbo Bi, and Volker Tresp. Active learning via transductive experimental design. In Proceedings of the 23rd international conference on Machine learning, pages 1081–1088. ACM, 2006. 13 A Proof of Theorem 2 Proof. Let the good event for thetth round of Algorithm 1 be Et := { Nt≤ max{⌈8(2t+1)2ρ(Y(St))(1+ϵ) log(|Z| 2 δt )⌉, r(ϵ) }} ∩{ z∗∈ ˆZt+1}∩{ ˆ...
work page 2006
-
[33]
, Np are positive integers constrained such that∑ i≤p Ni≥ N
Given the number of samplesN to obtain and the cardinality of the support ofλ, samples to allocate to arms in the support ofλ are computed usingNi =⌈(N− 1 2 p)λi⌉, where N1, N2, . . . , Np are positive integers constrained such that∑ i≤p Ni≥ N. 16
-
[34]
Following the previous phase of the rounding procedure, loop until the discrepancy (∑ i≤p Ni)− N = 0, from either decreasing a sample countNj which obtains (Nj− 1)/λj = mini≤p(Ni− 1)/λi to Nj + 1, or increasing a sample countNj which obtains Nj/λj = maxi≤p Ni/λi to Nj− 1. The efficient design apportionment theorem in Section 12.5 of [25] provides the founda...
-
[35]
Because (1− cos(α))2≈ α4/4 and these inequalities must hold for allj = 1,
this hypothesis test requiresNj + Nj+d/2≥ c log(1/δ) (1−cos(α))2 for some universal constant c > 0. Because (1− cos(α))2≈ α4/4 and these inequalities must hold for allj = 1, . . . , d/2 simultaneously for the single static allocation, we obtain the result. 18 E Proof of Lemma 1 Proof. ρ(Y) = min λ∈ |X| max y∈Y ∥y∥2 (∑ x∈Xλxxx⊤)−1 = 1 γ2 Y min λ∈ |X| max y...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.