18 Pith papers cite this work, alongside 7,436 external citations.
citing papers explorer
- Heavy-Ball Q-Learning with Residual Weighting Correction
Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.
- Generating Local Shields for Decentralised Partially Observable Markov Decision Processes
A process algebra with guarded choice and recursion is compiled to global and then projected local Mealy machines that filter safe joint actions for each agent in Dec-POMDPs using belief-style state subsets.
- Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem
Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.
- Variational Sequential Optimal Experimental Design using Reinforcement Learning
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
- Curvature-Adaptive Consistency Flow Matching: Autonomous Trajectory Optimization via Reinforcement Learning
CACFM applies RL to adaptively select critical regions in probability flow ODE trajectories for consistency distillation, yielding SOTA few-step results on FLUX and SDXL.
- Geometrically Averaged Hard Target Updates for Linear Q-Learning
Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.
- Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
Dmsh is a new multi-agent RL framework that formulates mesh generation as an MDP and uses three coordinated agents plus curriculum learning to produce globally conforming all-quad meshes without post-processing.
- Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference
Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.
- Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
- Quantum Hierarchical Reinforcement Learning via Variational Quantum Circuits
Hybrid agent with variational quantum circuits for feature extraction in hierarchical RL outperforms classical baselines with 66% parameter savings, but quantum value estimation degrades results.
- Answer-Set-Programming-based Abstractions for Reinforcement Learning
An ASP-based implementation of CARCASS abstractions is created and evaluated for RL on two domains.
- Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
A bound on OOD test performance in POMDPs decomposes loss into approximation and estimation errors, indicating that smaller abstract state spaces improve generalization in RL agents.
- Artifacts as Memory Beyond the Agent Boundary
Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.
- The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents
In a repeated market-maker/taker game with endogenous price impact, projected stochastic gradient ascent by adaptive agents reaches a region of persistent overpricing in finite time.
- Optimal Sequential Decision-Making for Error Propagation Mitigation in Digital Twins
Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.
- Reinforcement Learning for Robotic Time-Optimal Path Tracking Using Prior Knowledge
An improved Q-learning algorithm with a modified action-value function and reward-penalty scheme generates time-optimal robot trajectories that respect velocity-dependent piecewise-linear torque constraints.
- Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
- Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision