MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Zhang, Shiyue, Wu, Shijie, Irsoy, Ozan, Lu, Steven, Bansal, Mohit, Dredze, Mark · 2023 · DOI 10.18653/v1/2023.acl-long.502

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Perturbation is All You Need for Extrapolating Language Models

stat.ML · 2026-05-05 · unverdicted · novelty 6.0

Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.

citing papers explorer

Perturbation is All You Need for Extrapolating Language Models · stat.ML · 2026-05-05 · unverdicted · none · ref 52
