arxiv: 2601.07525 · v2 · pith:YZBZWDCWnew · submitted 2026-01-12 · 💻 cs.CL · cs.AI

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

classification 💻 cs.CL cs.AI
keywords reasoning, decoding, generation, approach, constrained, free-form, language, large

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first reasons without constraints and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
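The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trigger string `<answer>`, the toy scorer `toy_scores`, the format rule `yes_no_format`, and the greedy loop are all assumptions made for illustration. The loop picks tokens freely until the trigger appears, then masks every token that the output format does not allow.

```python
from typing import Callable, Dict, List, Sequence

TRIGGER = "<answer>"  # hypothetical trigger token, chosen for this sketch only
EOS = "<eos>"

Scores = Dict[str, float]


def decode_with_trigger(
    score_fn: Callable[[Sequence[str]], Scores],
    allowed_fn: Callable[[Sequence[str]], List[str]],
    max_tokens: int = 32,
) -> List[str]:
    """Greedy decoding: unconstrained until TRIGGER, constrained afterwards."""
    tokens: List[str] = []
    structured: List[str] = []  # tokens emitted after the trigger
    constrained = False
    for _ in range(max_tokens):
        scores = score_fn(tokens)
        if constrained:
            # Keep only the tokens the format allows at this position.
            allowed = allowed_fn(structured)
            scores = {t: s for t, s in scores.items() if t in allowed}
            if not scores:
                break
        token = max(scores, key=scores.get)
        tokens.append(token)
        if constrained:
            structured.append(token)
        elif token == TRIGGER:
            constrained = True  # switch modes for all later steps
        if token == EOS:
            break
    return tokens


def toy_scores(prefix: Sequence[str]) -> Scores:
    """Stand-in for a model: follows a fixed script, with 'yes' as runner-up."""
    script = ["2", "+", "2", "is", "4", TRIGGER, "maybe", EOS]
    vocab = ["2", "+", "is", "4", "maybe", "yes", "no", TRIGGER, EOS]
    want = script[len(prefix)] if len(prefix) < len(script) else EOS
    return {t: 1.0 if t == want else (0.5 if t == "yes" else 0.0) for t in vocab}


def yes_no_format(structured: Sequence[str]) -> List[str]:
    """Toy output format: exactly one of 'yes' or 'no', then end of sequence."""
    return ["yes", "no"] if not structured else [EOS]


result = decode_with_trigger(toy_scores, yes_no_format)
print(result)
```

In this toy run the scorer would prefer the off-format token `maybe` right after the trigger, but the mask forces the highest-scoring allowed token instead, while the reasoning tokens before the trigger are left untouched. A real system would apply the same switch to model logits and a grammar or schema rather than to a scripted scorer.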

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

    cs.CL 2026-04 unverdicted novelty 7.0

    Schema-key wording functions as an implicit instruction channel under constrained decoding, with experiments showing that rephrasing only the keys can substantially change accuracy on math benchmarks while prompt, mod...

  2. When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models

    cs.CL 2026-05 conditional novelty 6.0

    AloLab, an iterative meta-agent prompt optimizer, raises structured output accuracy for 7-9B models from 0% to 84-87% on GSM8K while preserving near-native inference speed.