hub Mixed citations

Finite-time analysis of the multiarmed bandit problem

Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer · 2002 · Machine Learning · DOI 10.1023/a:1013689704352

Mixed citation behavior. Most common role is background (40%).

16 Pith papers citing it

3,730 external citations · Crossref

Background 40% of classified citations

citation-role summary

background 2 · method 2 · dataset 1

citation-polarity summary

background 2 · use method 2 · use dataset 1

representative citing papers

Discovering Data Encoding Strategies for Quantum-Classical Neural Networks Using Monte Carlo Tree Search

quant-ph · 2026-05-18 · conditional · novelty 7.0

MCTS discovers superior data encoding circuits for QCCNNs that outperform standard encodings on medical datasets, with effective rank of feature maps serving as a performance predictor.

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms

cs.AI · 2026-05-06 · unverdicted · novelty 7.0

Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on terminal-state gaps rather than all policies.

Offline Local Search for Online Stochastic Bandits

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.

Minimizing Upper Confidence Bounds: A Data-Driven Framework for Stochastic Programming

math.OC · 2024-03-13 · unverdicted · novelty 7.0

Proposes APUB optimization framework for stochastic programming, proves asymptotic correctness and consistency of the new bound, and develops bootstrap and L-shaped solvers for two-stage linear problems with empirical tests on a product mix example.

Playing the network backward: A Game Theoretic Attribution Framework

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Backward attribution is reframed as integrals over trajectories in a two-player game on the network, unifying gradients and alpha-beta-LRP while enabling new adaptations that outperform prior methods on ViT-B/16 localization metrics.

Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.

Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization

cs.AI · 2026-04-28 · accept · novelty 6.0

An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

ProEval is a proactive framework using pre-trained GPs, Bayesian quadrature, and superlevel set sampling to estimate performance and find failures in generative AI with 8-65x fewer samples than baselines.

Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation

q-bio.QM · 2026-04-16 · unverdicted · novelty 6.0

A neuron-astrocyte network with dual-timescale memory reduces median path lengths up to sixfold in partially observable grid-world navigation tasks.

Discrete Diffusion for Codebook-Based Beam Candidate Generation

eess.SP · 2026-04-09 · unverdicted · novelty 6.0

A discrete denoising diffusion model learns from probing histories to generate promising beam candidates, yielding better SNR, lower beam-miss probability, and reduced probe regret than baselines under tight probing budgets.

Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study

cs.SE · 2025-11-29 · unverdicted · novelty 6.0

APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.

Best Agent Identification for General Game Playing

cs.LG · 2025-07-01 · unverdicted · novelty 6.0

An optimistic confidence-interval ranking procedure for best-arm identification across multiple independent bandits yields lower average simple regret and error probability than prior methods when selecting high-performing agents for each game in GVGAI and Ludii.

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

APEX maintains an explicit strategy space via a DAG with fork discovery and policy selection to sustain exploration in self-evolving LLM agents and reports outperformance on Jericho games and WebArena.

RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

stat.ML · 2026-03-11 · unverdicted · novelty 5.0

RIE-Greedy uses stochasticity from cross-validation regularization to induce Thompson Sampling-like exploration, claimed equivalent in the two-armed case and empirically competitive in large-scale settings.

Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches

econ.EM · 2019-07-02 · unverdicted · novelty 5.0

Adaptive GLM with MQLE and GP regression with UCB for dynamic insurance pricing, showing parameter convergence and regret analysis under delayed claims.

Calibrated Model-Based Deep Reinforcement Learning

cs.LG · 2019-06-19 · unverdicted · novelty 5.0

Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.

citing papers explorer

Showing 16 of 16 citing papers.

Discovering Data Encoding Strategies for Quantum-Classical Neural Networks Using Monte Carlo Tree Search quant-ph · 2026-05-18 · conditional · none · ref 45
MCTS discovers superior data encoding circuits for QCCNNs that outperform standard encodings on medical datasets, with effective rank of feature maps serving as a performance predictor.
On-line Learning in Tree MDPs by Treating Policies as Bandit Arms cs.AI · 2026-05-06 · unverdicted · none · ref 2
Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on terminal-state gaps rather than all policies.
Offline Local Search for Online Stochastic Bandits cs.LG · 2026-04-10 · unverdicted · none · ref 5
A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.
Minimizing Upper Confidence Bounds: A Data-Driven Framework for Stochastic Programming math.OC · 2024-03-13 · unverdicted · none · ref 4
Proposes APUB optimization framework for stochastic programming, proves asymptotic correctness and consistency of the new bound, and develops bootstrap and L-shaped solvers for two-stage linear problems with empirical tests on a product mix example.
Playing the network backward: A Game Theoretic Attribution Framework cs.LG · 2026-05-07 · unverdicted · none · ref 2
Backward attribution is reframed as integrals over trajectories in a two-player game on the network, unifying gradients and alpha-beta-LRP while enabling new adaptations that outperform prior methods on ViT-B/16 localization metrics.
Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference cs.LG · 2026-05-07 · unverdicted · none · ref 5
Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization cs.AI · 2026-04-28 · accept · none · ref 6
An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.
ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation cs.LG · 2026-04-25 · unverdicted · none · ref 8
ProEval is a proactive framework using pre-trained GPs, Bayesian quadrature, and superlevel set sampling to estimate performance and find failures in generative AI with 8-65x fewer samples than baselines.
Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation q-bio.QM · 2026-04-16 · unverdicted · none · ref 12
A neuron-astrocyte network with dual-timescale memory reduces median path lengths up to sixfold in partially observable grid-world navigation tasks.
Discrete Diffusion for Codebook-Based Beam Candidate Generation eess.SP · 2026-04-09 · unverdicted · none · ref 53
A discrete denoising diffusion model learns from probing histories to generate promising beam candidates, yielding better SNR, lower beam-miss probability, and reduced probe regret than baselines under tight probing budgets.
Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study cs.SE · 2025-11-29 · unverdicted · none · ref 3
APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
Best Agent Identification for General Game Playing cs.LG · 2025-07-01 · unverdicted · none · ref 7
An optimistic confidence-interval ranking procedure for best-arm identification across multiple independent bandits yields lower average simple regret and error probability than prior methods when selecting high-performing agents for each game in GVGAI and Ludii.
APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents cs.LG · 2026-05-20 · unverdicted · none · ref 19
APEX maintains an explicit strategy space via a DAG with fork discovery and policy selection to sustain exploration in self-evolving LLM agents and reports outperformance on Jericho games and WebArena.
RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits stat.ML · 2026-03-11 · unverdicted · none · ref 8
RIE-Greedy uses stochasticity from cross-validation regularization to induce Thompson Sampling-like exploration, claimed equivalent in the two-armed case and empirically competitive in large-scale settings.
Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches econ.EM · 2019-07-02 · unverdicted · none · ref 44
Adaptive GLM with MQLE and GP regression with UCB for dynamic insurance pricing, showing parameter convergence and regret analysis under delayed claims.
Calibrated Model-Based Deep Reinforcement Learning cs.LG · 2019-06-19 · unverdicted · none · ref 2
Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.
