Title resolution pending

Christopher J. C. H. Watkins, Peter Dayan · 1992 · Machine Learning · DOI 10.1007/bf00992698

18 Pith papers cite this work, alongside 7,436 external citations (Crossref). Polarity classification is still in progress.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3 · baseline 1

citation-polarity summary

background 2 · baseline 1 · support 1

representative citing papers

Heavy-Ball Q-Learning with Residual Weighting Correction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.

Generating Local Shields for Decentralised Partially Observable Markov Decision Processes

cs.MA · 2026-04-08 · unverdicted · novelty 7.0

A process algebra with guarded choice and recursion is compiled to global and then projected local Mealy machines that filter safe joint actions for each agent in Dec-POMDPs using belief-style state subsets.

Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem

cs.LG · 2025-07-18 · unverdicted · novelty 7.0

Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.

Variational Sequential Optimal Experimental Design using Reinforcement Learning

stat.ML · 2023-06-17 · unverdicted · novelty 7.0

vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.

Curvature-Adaptive Consistency Flow Matching: Autonomous Trajectory Optimization via Reinforcement Learning

cs.CV · 2026-06-21 · unverdicted · novelty 6.0

CACFM applies RL to adaptively select critical regions in probability flow ODE trajectories for consistency distillation, yielding SOTA few-step results on FLUX and SDXL.

Geometrically Averaged Hard Target Updates for Linear Q-Learning

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

math.NA · 2026-06-09 · unverdicted · novelty 6.0

Dmsh is a new multi-agent RL framework that formulates mesh generation as an MDP and uses three coordinated agents plus curriculum learning to produce globally conforming all-quad meshes without post-processing.

Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

Quantum Hierarchical Reinforcement Learning via Variational Quantum Circuits

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Hybrid agent with variational quantum circuits for feature extraction in hierarchical RL outperforms classical baselines with 66% parameter savings, but quantum value estimation degrades results.

Answer-Set-Programming-based Abstractions for Reinforcement Learning

cs.AI · 2026-05-29 · unverdicted · novelty 5.0

An ASP-based implementation of CARCASS abstractions is created and evaluated for RL on two domains.

Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

A bound on OOD test performance in POMDPs decomposes loss into approximation and estimation errors, indicating that smaller abstract state spaces improve generalization in RL agents.

Artifacts as Memory Beyond the Agent Boundary

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.

The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents

q-fin.TR · 2025-10-14 · unverdicted · novelty 5.0

In a repeated market-maker/taker game with endogenous price impact, projected stochastic gradient ascent by adaptive agents reaches a region of persistent overpricing in finite time.

Optimal sequential decision-making for error propagation mitigation in digital twins

cs.LG · 2026-04-24 · unverdicted · novelty 4.0

Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.

Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge

cs.RO · 2019-06-30 · unverdicted · novelty 3.0

An improved Q-learning algorithm with a modified action-value function and reward-penalty scheme generates time-optimal robot trajectories that respect velocity-dependent piecewise-linear torque constraints.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision

cs.LG · 2026-04-27 · 3 refs

citing papers explorer

Showing 18 of 18 citing papers.

Heavy-Ball Q-Learning with Residual Weighting Correction · cs.LG · 2026-06-25 · unverdicted · none · ref 1
Generating Local Shields for Decentralised Partially Observable Markov Decision Processes · cs.MA · 2026-04-08 · unverdicted · none · ref 12
Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem · cs.LG · 2025-07-18 · unverdicted · none · ref 25
Variational Sequential Optimal Experimental Design using Reinforcement Learning · stat.ML · 2023-06-17 · unverdicted · none · ref 60
Curvature-Adaptive Consistency Flow Matching: Autonomous Trajectory Optimization via Reinforcement Learning · cs.CV · 2026-06-21 · unverdicted · none · ref 60
Geometrically Averaged Hard Target Updates for Linear Q-Learning · cs.LG · 2026-06-09 · unverdicted · none · ref 36
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation · math.NA · 2026-06-09 · unverdicted · none · ref 24
Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference · cs.LG · 2026-05-07 · unverdicted · none · ref 37
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities · cs.AI · 2026-05-07 · unverdicted · none · ref 48 · 2 links
Quantum Hierarchical Reinforcement Learning via Variational Quantum Circuits · cs.LG · 2026-05-05 · unverdicted · none · ref 17
Answer-Set-Programming-based Abstractions for Reinforcement Learning · cs.AI · 2026-05-29 · unverdicted · none · ref 15
Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning · cs.LG · 2026-05-19 · unverdicted · none · ref 15
Artifacts as Memory Beyond the Agent Boundary · cs.AI · 2026-04-09 · unverdicted · none · ref 69
The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents · q-fin.TR · 2025-10-14 · unverdicted · none · ref 55
Optimal sequential decision-making for error propagation mitigation in digital twins · cs.LG · 2026-04-24 · unverdicted · none · ref 23
Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge · cs.RO · 2019-06-30 · unverdicted · none · ref 42
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers · math.OC · 2026-04-13 · unverdicted · none · ref 125
Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision · cs.LG · 2026-04-27 · unreviewed · ref 73 · 3 links
