Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.
A Markovian decision process. Journal of Mathematics and Mechanics, 6(5):679–684
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Neural mean-field games integrate mean-field game theory with neural SDEs to learn strategic interactions from data in a model-free way, demonstrated on games and viral dynamics.
A three-stage training pipeline internalizes world-model simulation and success estimation in LLM agents for improved planning on search and math tasks.
citing papers explorer
Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning