hub Mixed citations
Finite-time analysis of the multiarmed bandit problem. Mach
Mixed citation behavior. Most common role is background (40%).
citing papers explorer
- Discovering Data Encoding Strategies for Quantum-Classical Neural Networks Using Monte Carlo Tree Search
  MCTS discovers superior data encoding circuits for QCCNNs that outperform standard encodings on medical datasets, with effective rank of feature maps serving as a performance predictor.
- On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
  Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on terminal-state gaps rather than all policies.
- Offline Local Search for Online Stochastic Bandits
  A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.
- Minimizing Upper Confidence Bounds: A Data-Driven Framework for Stochastic Programming
  Proposes APUB optimization framework for stochastic programming, proves asymptotic correctness and consistency of the new bound, and develops bootstrap and L-shaped solvers for two-stage linear problems with empirical tests on a product mix example.
- Playing the network backward: A Game Theoretic Attribution Framework
  Backward attribution is reframed as integrals over trajectories in a two-player game on the network, unifying gradients and alpha-beta-LRP while enabling new adaptations that outperform prior methods on ViT-B/16 localization metrics.
- Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference
  Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.
- Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization
  An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.
- ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation
  ProEval is a proactive framework using pre-trained GPs, Bayesian quadrature, and superlevel set sampling to estimate performance and find failures in generative AI with 8-65x fewer samples than baselines.
- Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation
  A neuron-astrocyte network with dual-timescale memory reduces median path lengths up to sixfold in partially observable grid-world navigation tasks.
- Discrete Diffusion for Codebook-Based Beam Candidate Generation
  A discrete denoising diffusion model learns from probing histories to generate promising beam candidates, yielding better SNR, lower beam-miss probability, and reduced probe regret than baselines under tight probing budgets.
- Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study
  APIKG4Syn synthesizes API-oriented training data via knowledge graphs and Monte Carlo search to fine-tune a 7B model that reaches 25% pass@1 on HarmonyOS code generation, beating untuned GPT-4o at 17.59%.
- Best Agent Identification for General Game Playing
  An optimistic confidence-interval ranking procedure for best-arm identification across multiple independent bandits yields lower average simple regret and error probability than prior methods when selecting high-performing agents for each game in GVGAI and Ludii.
- APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
  APEX maintains an explicit strategy space via a DAG with fork discovery and policy selection to sustain exploration in self-evolving LLM agents and reports outperformance on Jericho games and WebArena.
- RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits
  RIE-Greedy uses stochasticity from cross-validation regularization to induce Thompson Sampling-like exploration, claimed equivalent in the two-armed case and empirically competitive in large-scale settings.
- Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches
  Adaptive GLM with MQLE and GP regression with UCB for dynamic insurance pricing, showing parameter convergence and regret analysis under delayed claims.
- Calibrated Model-Based Deep Reinforcement Learning
  Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.