A Markovian decision process. Journal of Mathematics and Mechanics, 6(5):679–684

Richard Bellman · 1957 · DOI 10.1512/iumj.1957.6.56038

3 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

The hidden risks of temporal resampling in clinical reinforcement learning

cs.LG · 2026-02-06 · conditional · novelty 6.0

Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.

Neural Mean-Field Games: Extending Mean-Field Game Theory with Neural Stochastic Differential Equations

cs.LG · 2025-04-17 · unverdicted · novelty 6.0

Neural mean-field games integrate mean-field game theory with neural SDEs to learn strategic interactions from data in a model-free way, demonstrated on games and viral dynamics.

Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

cs.AI · 2026-06-25 · unverdicted · novelty 5.0

A three-stage training pipeline internalizes world-model simulation and success estimation in LLM agents for improved planning on search and math tasks.

citing papers explorer


The hidden risks of temporal resampling in clinical reinforcement learning · cs.LG · 2026-02-06 · conditional · none · ref 27
Neural Mean-Field Games: Extending Mean-Field Game Theory with Neural Stochastic Differential Equations · cs.LG · 2025-04-17 · unverdicted · none · ref 19
Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning · cs.AI · 2026-06-25 · unverdicted · none · ref 19
