A generalization of differential TD extends it to episodic settings while preserving policy ordering, inheriting linear TD guarantees, and improving sample efficiency.
Scaling off-policy reinforcement learning with batch and weight normalization
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
LoRA applied to critics in SAC and FastTD3 reduces critic loss and yields best or competitive policy performance on most evaluated tasks.
citing papers explorer
-
Extending Differential Temporal Difference Methods for Episodic Problems
A generalization of differential TD extends it to episodic settings while preserving policy ordering, inheriting linear TD guarantees, and improving sample efficiency.
-
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
LoRA applied to critics in SAC and FastTD3 reduces critic loss and yields best or competitive policy performance on most evaluated tasks.