Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Pith reviewed 2026-05-18 19:20 UTC · model grok-4.3
The pith
Top-H decoding is a greedy algorithm that solves an entropy-constrained mass maximization problem to balance creativity and coherence in LLM text generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Top-H decoding, a greedy algorithm for the NP-hard entropy-constrained mass maximization problem, which is equivalent to entropy-constrained minimum divergence, outperforms min-p sampling by up to 25.63% on creative writing benchmarks and maintains performance on datasets like GPQA, GSM8K, and MT-Bench.
What carries the argument
Top-H decoding: a greedy algorithm that iteratively selects the minimal set of tokens achieving a target entropy bound while maximizing total probability mass.
Load-bearing premise
The equivalence between the entropy-constrained minimum divergence problem and the entropy-constrained mass maximization problem holds true, and the greedy top-H algorithm provides a good enough solution to the NP-hard problem.
What would settle it
A controlled experiment on the same creative writing benchmarks where min-p sampling achieves equal or better scores than top-H, or where coherence drops at high temperatures, would disprove the performance advantage.
Figures
read the original abstract
Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-\$p\$ (nucleus) sampling, and min-\$p\$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-\$p\$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-\$p\$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces top-H decoding, a greedy algorithm for open-ended text generation in LLMs. It formulates an entropy-constrained minimum divergence problem, proves equivalence to the NP-hard entropy-constrained mass maximization (ECMM) problem, and claims the greedy top-H solves ECMM efficiently. Experiments report up to 25.63% gains over min-p sampling on creative writing benchmarks, with maintained robustness on GPQA, GSM8K, and MT-Bench, plus LLM-as-judge confirmation of coherence at higher temperatures.
Significance. If the equivalence holds and the greedy algorithm reliably approximates the ECMM optimum, the work could supply a more principled, distribution-aware truncation method that improves the creativity-coherence tradeoff beyond heuristic approaches like min-p. Code availability and the empirical scale of the reported gains would be clear strengths supporting adoption in creative applications.
major comments (2)
- [Abstract and theoretical foundation] Abstract: the asserted proof of equivalence between the entropy-constrained minimum divergence problem and the ECMM problem, together with the NP-hardness claim, is stated without any derivation steps, theorem statements, or equations. This equivalence is load-bearing for the claim that top-H solves the underlying optimization problem rather than being an ad-hoc truncation rule.
- [Method] Method section describing the top-H greedy algorithm: no approximation ratio, worst-case guarantee, or comparison against exact ECMM solutions on small instances is supplied. Without such analysis the 25.63% empirical advantage cannot be confidently attributed to better mass maximization under the entropy bound, directly engaging the concern that performance may arise from any other truncation heuristic.
minor comments (2)
- [Experiments] Experiments: report the number of runs, variance, and exact hyper-parameter controls used for the creative-writing and QA benchmarks to allow reproduction of the quantitative deltas.
- [Method] Notation: clarify whether the entropy bound is enforced exactly or approximately in the greedy selection rule, and provide the precise stopping criterion.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate the revisions we intend to make to strengthen the presentation of the theoretical results and the analysis of the algorithm.
read point-by-point responses
-
Referee: [Abstract and theoretical foundation] Abstract: the asserted proof of equivalence between the entropy-constrained minimum divergence problem and the ECMM problem, together with the NP-hardness claim, is stated without any derivation steps, theorem statements, or equations. This equivalence is load-bearing for the claim that top-H solves the underlying optimization problem rather than being an ad-hoc truncation rule.
Authors: We agree that the abstract, constrained by length, summarizes the equivalence and NP-hardness result at a high level without including derivation steps or equations. The full formulation of the entropy-constrained minimum divergence problem, the proof of equivalence to the ECMM problem, and the NP-hardness argument appear in the Method section of the manuscript. To address the concern that the claim appears unsupported in the abstract, we will revise the abstract to include a concise statement of the key equivalence result. This change will make explicit that top-H is derived from the optimization problem. revision: yes
-
Referee: [Method] Method section describing the top-H greedy algorithm: no approximation ratio, worst-case guarantee, or comparison against exact ECMM solutions on small instances is supplied. Without such analysis the 25.63% empirical advantage cannot be confidently attributed to better mass maximization under the entropy bound, directly engaging the concern that performance may arise from any other truncation heuristic.
Authors: The referee correctly identifies the absence of approximation guarantees or small-scale exact comparisons in the current manuscript. Because the ECMM problem is NP-hard, exact solutions are only tractable for small vocabularies. We will add a new subsection with experiments on synthetic small instances, comparing the greedy top-H selections against exact ECMM optima obtained via exhaustive search or dynamic programming. We will also include a brief discussion of the greedy algorithm's relation to known approximation techniques for constrained submodular maximization and acknowledge that a formal worst-case ratio is not derived. These additions will help link the observed performance gains more directly to the mass-maximization objective under the entropy constraint. revision: yes
Circularity Check
No significant circularity; derivation is a fresh optimization framing
full rationale
The paper introduces an entropy-constrained minimum divergence problem, proves its mathematical equivalence to the ECMM problem, and proposes a greedy top-H algorithm as an efficient heuristic for the NP-hard ECMM. This chain relies on new definitions and a proof rather than reducing any claimed result to fitted parameters, self-citations, or prior equations by construction. Empirical performance claims are evaluated separately on benchmarks and do not feed back into the derivation. No load-bearing steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Entropy-constrained minimum divergence problem is equivalent to entropy-constrained mass maximization problem
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We first establish the theoretical foundation ... by formulating an entropy-constrained minimum divergence problem. We then prove this minimization problem to be equivalent to an entropy-constrained mass maximization (ECMM) problem, which is NP-hard.
-
IndisputableMonolith/Foundation/BranchSelection.leanbranch_selection unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Algorithm 1 Top-H: proposed greedy token selection algorithm ... Sort tokens in descending order of probability ... until H(q) > α · H(p)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
Top-W applies Wasserstein-regularized truncation on token-embedding geometry to create a closed-form optimal crop for LLM sampling that outperforms prior methods by up to 33.7% on GSM8K, GPQA, AlpacaEval, and MT-Bench.
Reference graph
Works this paper leans on
-
[1]
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[2]
A learning algorithm for boltzmann machines
David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for boltzmann machines. Cognitive science, 9 0 (1): 0 147--169, 1985
work page 1985
-
[3]
Mirostat: A neural text decoding algorithm that directly controls perplexity
Sourya Basu, Govardana Sachitanandam Ramachandran, Nitish Shirish Keskar, and Lav R Varshney. Mirostat: A neural text decoding algorithm that directly controls perplexity. arXiv preprint arXiv:2007.14966, 2020
-
[4]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[5]
EleutherAI. Lm evaluation harness, 2023. URL https://github.com/EleutherAI/lm-evaluation-harness. Version 0.4.5
work page 2023
-
[6]
Hierarchical Neural Story Generation
Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[7]
Michael R. Garey and David S. Johnson. Computers and intractability: A guide to the theory of np-completeness. In W.H. Freeman. W. H. Freeman and Company, 1979. Problem SP13: Cardinality-Constrained Subset Sum is NP-complete
work page 1979
-
[8]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[9]
Truncation sampling as language model desmoothing
John Hewitt, Christopher D Manning, and Percy Liang. Truncation sampling as language model desmoothing. arXiv preprint arXiv:2210.15191, 2022
-
[10]
The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1904
-
[11]
Mina Lee, Percy Liang, and Qian Yang. Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI conference on human factors in computing systems, pp.\ 1--19, 2022
work page 2022
- [12]
-
[13]
N., Baker, A., Neo, C., Roush, A., Kirsch, A., and Shwartz-Ziv, R
Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, and Ravid Shwartz-Ziv. Turning up the heat: Min-p sampling for creative and coherent llm outputs. arXiv preprint arXiv:2407.01082, 2024
-
[14]
OpenAI. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024. URL https://arxiv.org/abs/2410.21276
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[15]
Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994. ISBN 978-0-201-53082-7
work page 1994
-
[16]
Gpqa: A graduate-level google-proof q&a benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R Bowman. Gpqa: A graduate-level google-proof q&a benchmark. In First Conference on Language Modeling, 2024
work page 2024
-
[17]
An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2. 5 technical report. arXiv preprint arXiv:2412.15115, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[18]
Judging llm-as-a-judge with mt-bench and chatbot arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36: 0 46595--46623, 2023
work page 2023
-
[19]
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.