Title resolution pending

2013 · DOI 10.1177/0278364913495721

3 Pith papers cite this work. Polarity classification is still indexing.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms

cs.AI · 2026-05-06 · unverdicted · novelty 7.0

Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on terminal-state gaps rather than all policies.

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

SafeAdapt certifies a Rashomon set of safe policies from demonstration data and projects updates from arbitrary RL algorithms onto it to guarantee preservation of safety on source tasks.

Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge

cs.RO · 2019-06-30 · unverdicted · novelty 3.0

An improved Q-learning algorithm with a modified action-value function and reward-penalty scheme generates time-optimal robot trajectories that respect velocity-dependent piecewise-linear torque constraints.

citing papers explorer

Showing 3 of 3 citing papers.

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
cs.AI · 2026-05-06 · unverdicted · none · ref 29
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
cs.LG · 2026-04-10 · unverdicted · none · ref 24
Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge
cs.RO · 2019-06-30 · unverdicted · none · ref 39
