Title resolution pending

1988 · DOI 10.1007/bf00115009

6 Pith papers cite this work. Polarity classification is still being indexed.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

cs.RO · 2026-05-12 · conditional · novelty 7.0

A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.

A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01 s, achieving 97.8% of the DP-optimal utility, and handles larger problems where DP fails.

DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Revisiting Adam for Streaming Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

C51 matches StreamQ in streaming RL on 55 Atari games, while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.

Twice Sequential Monte Carlo for Tree Search

cs.LG · 2025-11-18 · unverdicted · novelty 5.0

TSMCTS applies Sequential Monte Carlo in two stages for tree search, claiming better performance, lower variance, and reduced path degeneracy than SMC and modern MCTS baselines, as well as favorable scaling with search depth, across discrete and continuous environments.

citing papers explorer

Showing 6 of 6 citing papers.

Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation cs.RO · 2026-05-12 · conditional · none · ref 27
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management cs.LG · 2026-05-04 · unverdicted · none · ref 74
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions cs.LG · 2025-09-23 · unverdicted · none · ref 25
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents cs.CL · 2026-05-08 · unverdicted · none · ref 27
Revisiting Adam for Streaming Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 32
Twice Sequential Monte Carlo for Tree Search cs.LG · 2025-11-18 · unverdicted · none · ref 12
