Without the need for gradient computation or optimizer states for the large language model, the peak memory footprint is drastically reduced.