Recurrent TD3 with separate LSTM actor-critic networks delivers substantially stronger and more stable chemotherapy control than feed-forward baselines under partial observability on the AhnChemoEnv benchmark.
Dynamic treatment regimes,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability
Recurrent TD3 with separate LSTM actor-critic networks delivers substantially stronger and more stable chemotherapy control than feed-forward baselines under partial observability on the AhnChemoEnv benchmark.