Because no gradients or optimizer states have to be computed or stored for the large language model, the peak memory footprint is drastically reduced.
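To make the size of this saving concrete, the sketch below does the back-of-the-envelope arithmetic for model-state memory only. It assumes a common rule of thumb that the source does not state: a frozen backbone served in fp16/bf16 needs about 2 bytes per parameter, while full fine-tuning with mixed-precision Adam needs about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). The 7B parameter count and the `model_state_gib` helper are hypothetical, and activations and KV cache are ignored.

```python
# Back-of-the-envelope model-state memory (assumed rule of thumb, not from the source).
# Inference on a frozen backbone: fp16/bf16 weights only.
INFERENCE_BYTES_PER_PARAM = 2
# Mixed-precision Adam training: fp16 weights + fp16 grads
# + fp32 master weights + fp32 first moment + fp32 second moment.
TRAINING_BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16


def model_state_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory for model states in GiB (excludes activations and KV cache)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter backbone
    frozen = model_state_gib(n, INFERENCE_BYTES_PER_PARAM)
    trained = model_state_gib(n, TRAINING_BYTES_PER_PARAM)
    print(f"frozen backbone:       {frozen:.1f} GiB")
    print(f"full Adam fine-tuning: {trained:.1f} GiB")
    print(f"ratio: {trained / frozen:.0f}x")
```

Under these assumptions the model states alone shrink by roughly 8x when the backbone is frozen; real savings depend on precision, optimizer choice, and activation memory.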

Offline Trajectory Generation (Inference-Only): Sampling the training data requires an upfront compute investment, but it is strictly a forward-pass operation on a frozen backbone.
