Efficient online reinforcement learning fine-tuning need not retain offline data. arXiv preprint arXiv:2412.07762

8 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (5), cs.RO (3)

years: 2026 (7), 2025 (1)

representative citing papers

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

Reinforcement Learning with Action Chunking

cs.LG · 2025-07-10 · unverdicted · novelty 6.0

Q-chunking improves offline-to-online RL sample efficiency on long-horizon sparse-reward manipulation tasks by applying action chunking to TD learning.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
