A corrected heavy-ball Q-learning algorithm with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, and is extended to linear function approximation.
18 Pith papers cite this work, alongside 7,436 external citations.
representative citing papers
A process algebra with guarded choice and recursion is compiled to global and then projected local Mealy machines that filter safe joint actions for each agent in Dec-POMDPs using belief-style state subsets.
Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
CACFM applies RL to adaptively select critical regions in probability flow ODE trajectories for consistency distillation, yielding SOTA few-step results on FLUX and SDXL.
Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.
Dmsh is a new multi-agent RL framework that formulates mesh generation as an MDP and uses three coordinated agents plus curriculum learning to produce globally conforming all-quad meshes without post-processing.
Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
Hybrid agent with variational quantum circuits for feature extraction in hierarchical RL outperforms classical baselines with 66% parameter savings, but quantum value estimation degrades results.
An ASP-based implementation of CARCASS abstractions is created and evaluated for RL on two domains.
A bound on OOD test performance in POMDPs decomposes loss into approximation and estimation errors, indicating that smaller abstract state spaces improve generalization in RL agents.
Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.
In a repeated market-maker/taker game with endogenous price impact, projected stochastic gradient ascent by adaptive agents reaches a region of persistent overpricing in finite time.
Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.
An improved Q-learning algorithm with a modified action-value function and reward-penalty scheme generates time-optimal robot trajectories that respect velocity-dependent piecewise-linear torque constraints.
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
citing papers explorer
- Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation