pith. sign in

Preserving diversity in supervised fine-tuning of large language models

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

citation-role summary

background 2 baseline 1 method 1

citation-polarity summary

years

2026 6 2025 1

verdicts

UNVERDICTED 7

representative citing papers

Selective Off-Policy Reference Tuning with Plan Guidance

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

SORT turns all-wrong prompts into selective learning signals by weighting tokens more predictable under plan guidance from reference solutions, improving over GRPO on reasoning benchmarks especially for weaker models.

Annotations Mitigate Post-Training Mode Collapse

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

Proximal Supervised Fine-Tuning

cs.LG · 2025-08-25 · unverdicted · novelty 5.0

PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

citing papers explorer

Showing 7 of 7 citing papers.

  • GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification cs.AI · 2026-04-15 · unverdicted · none · ref 2

    GFT uses group advantage learning and dynamic coefficient rectification to fix reward sparsity and optimization instability in SFT for LLMs, yielding better policies than standard SFT.

  • Selective Off-Policy Reference Tuning with Plan Guidance cs.AI · 2026-05-12 · unverdicted · none · ref 31 · 2 links

    SORT turns all-wrong prompts into selective learning signals by weighting tokens more predictable under plan guidance from reference solutions, improving over GRPO on reasoning benchmarks especially for weaker models.

  • Annotations Mitigate Post-Training Mode Collapse cs.CL · 2026-05-11 · unverdicted · none · ref 8

    Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

  • Self-Consolidating Language Models: Continual Knowledge Incorporation from Context cs.CL · 2026-05-08 · unverdicted · none · ref 59 · 2 links

    SCoL trains LLMs via meta-reinforcement learning to generate layer-specific update instructions that improve knowledge acquisition and retention from context streams over standard baselines.

  • Diversity in Large Language Models under Supervised Fine-Tuning cs.LG · 2026-04-30 · unverdicted · none · ref 7 · 2 links

    TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.

  • Proximal Supervised Fine-Tuning cs.LG · 2025-08-25 · unverdicted · none · ref 15

    PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.

  • Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 231

    The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.