3. PAFT at $q = 0$ reduces to posterior-resampled SFT scaled by $P_\theta$: $\hat{\nabla}^{\mathrm{PAFT}}_{q=0} = -\bar{w}_M \cdot \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(z^{(r_k)}, y^* \mid x^*)$

GARL at $q = 1$ recovers the IWAE gradient estimator [Burda et al., 2015]

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

cs.LG · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

A single-parameter Tsallis loss continuum unifies SFT and RLVR, derives time-to-escape bounds for cold start, and yields GARL and PAFT estimators that improve performance on QA reasoning tasks.
