Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

2026 · cs.CL · arXiv:2606.08417

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Diffusion and continuous flow-based language models have emerged as the leading alternatives to autoregressive language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL), which is computed from the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large and is typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric is unsound. By construction, gen-PPL measures only predictability under the AR scorer, not grammaticality or semantic coherence, and the set of predictable but still low-quality sequences is combinatorially large. To make this concrete, we construct a suite of zero-parameter, deliberately naive samplers that achieve state-of-the-art gen-PPL on LM1B and OpenWebText at non-degenerate entropy, surpassing recently published diffusion and continuous-flow models while producing text that is incoherent by construction. We recommend evaluation suites that directly quantify the distributional divergence between generated and reference text, and we use such a suite to re-benchmark recent non-autoregressive models, recovering a more faithful picture of the current state of the art.
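The failure mode described in the abstract can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration and is not the paper's code or metric suite: a smoothed bigram model stands in for the frozen AR scorer (gpt2-large in the abstract), the "hacked" sample is a hand-written stand-in for a naive sampler, and bigram Jensen-Shannon divergence stands in for a distributional metric. All function names are hypothetical.

```python
import math
from collections import Counter


def fit_bigram_scorer(tokens, alpha=0.1):
    """Toy stand-in for a frozen AR scorer: add-alpha smoothed bigram model."""
    vocab_size = len(set(tokens))
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])

    def log_prob(prev, tok):
        num = pair_counts[(prev, tok)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return log_prob


def gen_ppl(sample, log_prob):
    """exp of the mean per-token negative log-likelihood under the scorer."""
    nll = [-log_prob(p, t) for p, t in zip(sample, sample[1:])]
    return math.exp(sum(nll) / len(nll))


def unigram_entropy(sample):
    """Empirical token entropy in bits (the usual collapse guardrail)."""
    n = len(sample)
    return -sum(c / n * math.log2(c / n) for c in Counter(sample).values())


def bigram_js_divergence(a, b):
    """Jensen-Shannon divergence (bits) between two bigram distributions."""
    pa, pb = Counter(zip(a, a[1:])), Counter(zip(b, b[1:]))
    na, nb = sum(pa.values()), sum(pb.values())
    js = 0.0
    for g in set(pa) | set(pb):
        x, y = pa[g] / na, pb[g] / nb
        m = (x + y) / 2
        if x:
            js += 0.5 * x * math.log2(x / m)
        if y:
            js += 0.5 * y * math.log2(y / m)
    return js


reference = "the cat sat on the mat . the dog sat on the rug .".split()
# Hypothetical naive-sampler output: chains frequent bigrams, never ends a sentence.
hacked = "sat on the cat sat on the dog sat on the cat sat on".split()

scorer = fit_bigram_scorer(reference)
for name, text in [("reference", reference), ("hacked", hacked)]:
    print(
        name,
        "gen-PPL:", round(gen_ppl(text, scorer), 3),
        "entropy:", round(unigram_entropy(text), 3),
        "JS vs reference:", round(bigram_js_divergence(text, reference), 3),
    )
```

In this toy setup the hacked sample scores a lower gen-PPL than the reference text the scorer was fitted on, and its unigram entropy stays above 2 bits, so an entropy guardrail set at that level would not reject it. Only the divergence against the reference separates the two.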

representative citing papers

Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

Low gen-PPL in continuous diffusion LMs results from repetition caused by a 1D contractive attractor in the self-conditioning feedback; ACE subtracts the attractor direction, reducing repetition to human levels while preserving quality.

Masked Language Flow Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.

