pith. sign in

arxiv: 2604.07855 · v1 · submitted 2026-04-09 · 💻 cs.AI

Hidden Biases in Conditioning Autoregressive Models

Pith reviewed 2026-05-10 17:20 UTC · model grok-4.3

classification 💻 cs.AI
keywords autoregressive modelsconstrained generationNP-hardnessMAP decodingconditioned normalizationcomputational complexityglobal constraintsneural language models
0
0 comments X

The pith

Exact MAP decoding and conditioned normalization for general autoregressive models are NP-hard and #P-hard respectively.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autoregressive models used for language and music generation allow easy local next-token sampling but face intractability when global constraints such as rhyme, meter, or fixed length must be enforced exactly. The paper proves that finding the highest-probability full sentence satisfying such constraints, called MAP decoding, is NP-hard for models whose next-token probabilities are computable in polynomial time. Exact normalization of probabilities over the constrained space is #P-hard even for simple regular constraints. These results imply that common practical procedures for constrained generation produce samples whose distribution differs from the true conditional distribution of the underlying model. General autoregressive models lack the bounded-state dynamic programs that make the same tasks tractable for finite-state Markov models.

Core claim

For succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level maximum a posteriori decoding is NP-hard, and the hardness continues to hold under unary and metrical constraints. Exact conditioned normalization is #P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks.

What carries the argument

The computational separation between local next-token sampling, which remains efficient, and exact global inference tasks that require searching over exponentially many constrained completions without bounded memory.

If this is right

  • Practical constrained generation systems must rely on approximate or heuristic methods that lack guarantees of complete coverage or correct conditional probabilities.
  • Samples produced under global form requirements are distorted relative to the model's true constrained distribution.
  • Local autoregressive sampling stays tractable while exact decoding and exact conditioning under global constraints become intractable.
  • The hidden bias is distinct from training-data bias and arises purely from the inference procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers of constrained-generation tools may need to quantify or bound the distortion introduced by their chosen approximation rather than assuming it is negligible.
  • Subclasses of autoregressive models that retain enough structure for efficient exact inference could be identified and preferred for applications requiring precise conditioning.
  • Similar hardness results may apply to other sequential generation settings where global constraints are imposed after local training.

Load-bearing premise

The models are general autoregressive structures that cannot be reduced to finite-state Markov models, which would otherwise permit efficient dynamic programming.

What would settle it

Discovery of a polynomial-time algorithm or bounded-state dynamic program that computes exact MAP decoding or exact conditioned normalization for a succinctly represented general autoregressive model under metrical constraints.

read the original abstract

Large language and music models are increasingly used for constrained generation: rhyming lines, fixed meter, inpainting or infilling, positional endings, and other global form requirements. These systems often perform strikingly well, but the induced procedures are usually not exact conditioning of the underlying autoregressive model. This creates a hidden inferential bias, distinct from the better-known notion of bias inherited from the training set: samples are distorted relative to the true constrained distribution, with no generic guarantee of complete coverage of the admissible solution space or of correct conditional probabilities over valid completions. We formalize several exact inference tasks for autoregressive models and prove corresponding hardness results. For succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level maximum a posteriori (MAP) decoding is NP-hard. This hardness persists under unary and metrical constraints. On the sampling side, exact conditioned normalization is \#P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks. These results formalize a standard claim in the neural decoding literature: local autoregressive sampling is easy, whereas exact decoding and exact conditioning under global form constraints are computationally intractable in general.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that for succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level MAP decoding is NP-hard; this hardness persists under unary and metrical constraints. Exact conditioned normalization is #P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks. The results formalize why local autoregressive sampling is easy while exact decoding and conditioning under global constraints are intractable.

Significance. If the hardness results hold, the work supplies a clean complexity-theoretic account of the computational gap between approximate sampling and exact inference under global form constraints in autoregressive models. It supplies a parameter-free contrast with finite-state Markov models (which admit bounded-state DP) and thereby explains the hidden inferential bias introduced by heuristic constrained-generation procedures. The use of standard reductions from complexity theory is a strength.

minor comments (3)
  1. The abstract and introduction contrast general autoregressive models with finite-state Markov models, but the manuscript should include a short, self-contained example (e.g., a 2-state Markov chain versus a simple history-dependent AR model) showing why the bounded-state DP works for the former but not the latter.
  2. Definitions of 'unary constraints' and 'metrical constraints' appear in the hardness statements; they should be formalized once in the preliminaries section with explicit notation before being invoked in the theorems.
  3. The paper states that the next-token probabilities are 'computable in polynomial time'; the precise model of computation (e.g., oracle access, bit complexity) should be stated explicitly in the problem formulation to make the succinct-representation assumption unambiguous.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of its contributions, and recommendation for minor revision. The report correctly identifies the key complexity-theoretic results on MAP decoding and conditioned normalization for autoregressive models.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central results are NP-hardness and #P-hardness proofs for exact MAP decoding and conditioned normalization in succinctly represented autoregressive models. These are established via standard complexity-theoretic reductions from known hard problems, relying on the polynomial-time next-token oracle and the distinction from finite-state Markov models (which admit bounded-state DP). No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citations are invoked to justify uniqueness or ansatzes, and the derivation does not rename empirical patterns or smuggle assumptions via prior author work. The proofs are independent of any fitted values and rest on external complexity theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard definitions of autoregressive models with polynomial-time next-token probabilities and global constraints, plus foundational complexity theory; no free parameters or invented entities are introduced.

axioms (1)
  • standard math Standard assumptions in computational complexity theory such as the implication of NP-hardness results (typically P ≠ NP).
    Hardness results are interpreted under these foundational assumptions in complexity theory, as invoked in the abstract's statements on NP-hardness and #P-hardness.

pith-pipeline@v0.9.0 · 5512 in / 1439 out tokens · 54378 ms · 2026-05-10T17:20:59.343939+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

  1. [1]

    Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

    Bryan Eikema and Wilker Aziz. Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation. InProceedings of COLING 2020, pages 3080–3090, 2020

  2. [2]

    Language (Technology) is Power: A Critical Survey of “Bias” in NLP

    Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. Language (Technology) is Power: A Critical Survey of “Bias” in NLP. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, 2020

  3. [3]

    If Beam Search is the Answer, What was the Question? InProceedings of EMNLP 2020, pages 2173–2185, 2020

    Clara Meister, Tim Vieira, and Ryan Cotterell. If Beam Search is the Answer, What was the Question? InProceedings of EMNLP 2020, pages 2173–2185, 2020

  4. [4]

    On NMT Search Errors and Model Errors: Cat Got Your Tongue? InProceedings of EMNLP-IJCNLP 2019, pages 3356–3362, 2019

    Felix Stahlberg and Bill Byrne. On NMT Search Errors and Model Errors: Cat Got Your Tongue? InProceedings of EMNLP-IJCNLP 2019, pages 3356–3362, 2019

  5. [5]

    Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation.Neural Computing and Applications, 32:995–1005, 2020

    Gaëtan Hadjeres and Frank Nielsen. Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation.Neural Computing and Applications, 32:995–1005, 2020

  6. [6]

    Learning to Traverse Latent Spaces for Musical Score Inpainting

    Ashis Pati, Alexander Lerch, and Gaëtan Hadjeres. Learning to Traverse Latent Spaces for Musical Score Inpainting. InProceedings of the 20th International Society for Music Information Retrieval Conference, pages 343–351, 2019

  7. [7]

    The Piano Inpainting Application

    Gaëtan Hadjeres and Léopold Crestel. The Piano Inpainting Application. arXiv:2107.05944, 2021

  8. [8]

    Enabling Language Models to Fill in the Blanks

    Chris Donahue, Mina Lee, and Percy Liang. Enabling Language Models to Fill in the Blanks. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2492–2501, 2020

  9. [9]

    DeepBach: a Steerable Model for Bach Chorales Generation

    Gaëtan Hadjeres, François Pachet, and Frank Nielsen. DeepBach: a Steerable Model for Bach Chorales Generation. InProceedings of the 34th International Conference on Machine Learning, pages 1362–1371, 2017

  10. [10]

    Markov Constraints for Controlled Sequence Generation

    François Pachet and Pierre Roy. Markov Constraints for Controlled Sequence Generation. To appear inFestum-PI 2025 Proceedings,Lecture Notes in Mathematics, Springer, N. Alikakos and Cédric Villani, editors, 2026

  11. [11]

    Exact Sampling for Regular and Markov Constraints with Belief Propagation

    Alexandre Papadopoulos, François Pachet, Pierre Roy, and Jason Sakellariou. Exact Sampling for Regular and Markov Constraints with Belief Propagation. InPrinciples and Practice of Constraint Programming, pages 341–350, 2015. 9