pith. the verified trust layer for science. sign in

Supervised fine tuning on curated data is reinforcement learning (and can be improved)

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

fields

cs.LG 4 cs.SE 1

years

2026 3 2025 2

roles

background 2

polarities

background 2

representative citing papers

Rotation-Preserving Supervised Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.

Proximal Supervised Fine-Tuning

cs.LG · 2025-08-25 · unverdicted · novelty 5.0

PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.

citing papers explorer

Showing 5 of 5 citing papers.