Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo; Massoud Pedram; Seyedarmin Azizi; Souvik Kundu

arxiv: 2509.02510 · v2 · submitted 2025-09-02 · 💻 cs.CL · cs.AI· stat.ML

Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo , Seyedarmin Azizi , Souvik Kundu , Massoud Pedram This is my paper

Pith reviewed 2026-05-18 19:20 UTC · model grok-4.3

classification 💻 cs.CL cs.AIstat.ML

keywords top-H decodingentropy constrained samplingLLM text generationcreative writingmin-p samplingcoherencediversitydecoding strategy

0 comments

The pith

Top-H decoding is a greedy algorithm that solves an entropy-constrained mass maximization problem to balance creativity and coherence in LLM text generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates the balance between diversity and coherence in open-ended text generation as an entropy-constrained minimum divergence problem. It proves equivalence to an entropy-constrained mass maximization problem and introduces top-H decoding as a computationally efficient greedy algorithm to solve it. This method better uses the full probability distribution compared to heuristics like min-p sampling. A sympathetic reader would care because it shows measurable gains in creative writing tasks while remaining robust for question answering.

Core claim

Top-H decoding, a greedy algorithm for the NP-hard entropy-constrained mass maximization problem, which is equivalent to entropy-constrained minimum divergence, outperforms min-p sampling by up to 25.63% on creative writing benchmarks and maintains performance on datasets like GPQA, GSM8K, and MT-Bench.

What carries the argument

Top-H decoding: a greedy algorithm that iteratively selects the minimal set of tokens achieving a target entropy bound while maximizing total probability mass.

Load-bearing premise

The equivalence between the entropy-constrained minimum divergence problem and the entropy-constrained mass maximization problem holds true, and the greedy top-H algorithm provides a good enough solution to the NP-hard problem.

What would settle it

A controlled experiment on the same creative writing benchmarks where min-p sampling achieves equal or better scores than top-H, or where coherence drops at high temperatures, would disprove the performance advantage.

Figures

Figures reproduced from arXiv: 2509.02510 by Erfan Baghaei Potraghloo, Massoud Pedram, Seyedarmin Azizi, Souvik Kundu.

**Figure 2.** Figure 2: (a)-(c): Length-controlled win rates (%) comparison of different SoTA sampling with top-H [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Effect of T scaling on generation coherence in min-p, top-p vs top-H. Impact of α parameter. The only hyperparameter in topH sampling is α, which directly controls the maximum allowable entropy for the distribution q. As such, careful tuning of α is essential. To determine an appropriate value, we randomly select 50 development samples from the Alpaca-Eval dataset and use LLaMA3.1–8B–Instruct to generat… view at source ↗

**Figure 4.** Figure 4: Effect of the parameter α on creativity and coherence. Empirical optimality of the top-H decoding strategy. We now empirically evaluate the competitiveness of the greedy algorithm of the top-H relative to the optimal solution of the ECMM problem, found by exhaustive search. We randomly sample 20 prompts from the Alpaca-Eval dataset and generate responses using the top-H method. At each generation step, the… view at source ↗

**Figure 5.** Figure 5: Empirical evaluation of top-H performance relative to the optimal solution of the ECMM problem. For comparison, we also compute the greedy solution S g using Algorithm 1. At each generation step, we calculate the ratio ΓSg /ΓS∗ , and report the mean and variance of this ratio (across different generation steps) for 20 different evaluation prompts, as visualized in [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Probability distribution of two different types with associated top-H thresholds. [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

read the original abstract

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-\$p\$ (nucleus) sampling, and min-\$p\$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-\$p\$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-\$p\$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Top-H decoding frames text sampling as an entropy-constrained mass maximization problem solved by a greedy algorithm, with reported gains over min-p on creative tasks, but the proof details and approximation quality are not visible in the abstract.

read the letter

The main point is that this paper introduces top-H decoding as a way to handle the creativity-coherence trade-off through an entropy-constrained formulation. They set up an entropy-constrained minimum divergence problem, prove it equivalent to an entropy-constrained mass maximization (ECMM) problem that is NP-hard, and then give a greedy algorithm to solve it in practice. The experiments claim up to 25.63% better performance than min-p on creative writing benchmarks while staying competitive on GPQA, GSM8K, and MT-Bench, plus some LLM-as-judge checks for coherence at higher temperatures. The code is released, which helps.

Referee Report

2 major / 2 minor

Summary. The paper introduces top-H decoding, a greedy algorithm for open-ended text generation in LLMs. It formulates an entropy-constrained minimum divergence problem, proves equivalence to the NP-hard entropy-constrained mass maximization (ECMM) problem, and claims the greedy top-H solves ECMM efficiently. Experiments report up to 25.63% gains over min-p sampling on creative writing benchmarks, with maintained robustness on GPQA, GSM8K, and MT-Bench, plus LLM-as-judge confirmation of coherence at higher temperatures.

Significance. If the equivalence holds and the greedy algorithm reliably approximates the ECMM optimum, the work could supply a more principled, distribution-aware truncation method that improves the creativity-coherence tradeoff beyond heuristic approaches like min-p. Code availability and the empirical scale of the reported gains would be clear strengths supporting adoption in creative applications.

major comments (2)

[Abstract and theoretical foundation] Abstract: the asserted proof of equivalence between the entropy-constrained minimum divergence problem and the ECMM problem, together with the NP-hardness claim, is stated without any derivation steps, theorem statements, or equations. This equivalence is load-bearing for the claim that top-H solves the underlying optimization problem rather than being an ad-hoc truncation rule.
[Method] Method section describing the top-H greedy algorithm: no approximation ratio, worst-case guarantee, or comparison against exact ECMM solutions on small instances is supplied. Without such analysis the 25.63% empirical advantage cannot be confidently attributed to better mass maximization under the entropy bound, directly engaging the concern that performance may arise from any other truncation heuristic.

minor comments (2)

[Experiments] Experiments: report the number of runs, variance, and exact hyper-parameter controls used for the creative-writing and QA benchmarks to allow reproduction of the quantitative deltas.
[Method] Notation: clarify whether the entropy bound is enforced exactly or approximately in the greedy selection rule, and provide the precise stopping criterion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate the revisions we intend to make to strengthen the presentation of the theoretical results and the analysis of the algorithm.

read point-by-point responses

Referee: [Abstract and theoretical foundation] Abstract: the asserted proof of equivalence between the entropy-constrained minimum divergence problem and the ECMM problem, together with the NP-hardness claim, is stated without any derivation steps, theorem statements, or equations. This equivalence is load-bearing for the claim that top-H solves the underlying optimization problem rather than being an ad-hoc truncation rule.

Authors: We agree that the abstract, constrained by length, summarizes the equivalence and NP-hardness result at a high level without including derivation steps or equations. The full formulation of the entropy-constrained minimum divergence problem, the proof of equivalence to the ECMM problem, and the NP-hardness argument appear in the Method section of the manuscript. To address the concern that the claim appears unsupported in the abstract, we will revise the abstract to include a concise statement of the key equivalence result. This change will make explicit that top-H is derived from the optimization problem. revision: yes
Referee: [Method] Method section describing the top-H greedy algorithm: no approximation ratio, worst-case guarantee, or comparison against exact ECMM solutions on small instances is supplied. Without such analysis the 25.63% empirical advantage cannot be confidently attributed to better mass maximization under the entropy bound, directly engaging the concern that performance may arise from any other truncation heuristic.

Authors: The referee correctly identifies the absence of approximation guarantees or small-scale exact comparisons in the current manuscript. Because the ECMM problem is NP-hard, exact solutions are only tractable for small vocabularies. We will add a new subsection with experiments on synthetic small instances, comparing the greedy top-H selections against exact ECMM optima obtained via exhaustive search or dynamic programming. We will also include a brief discussion of the greedy algorithm's relation to known approximation techniques for constrained submodular maximization and acknowledge that a formal worst-case ratio is not derived. These additions will help link the observed performance gains more directly to the mass-maximization objective under the entropy constraint. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is a fresh optimization framing

full rationale

The paper introduces an entropy-constrained minimum divergence problem, proves its mathematical equivalence to the ECMM problem, and proposes a greedy top-H algorithm as an efficient heuristic for the NP-hard ECMM. This chain relies on new definitions and a proof rather than reducing any claimed result to fitted parameters, self-citations, or prior equations by construction. Empirical performance claims are evaluated separately on benchmarks and do not feed back into the derivation. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unshown proof that the minimum-divergence problem equals the mass-maximization problem and on the assumption that a greedy procedure suffices for the NP-hard task.

axioms (1)

domain assumption Entropy-constrained minimum divergence problem is equivalent to entropy-constrained mass maximization problem
Stated as proved in the paper and used as the basis for the algorithm.

pith-pipeline@v0.9.0 · 5891 in / 1220 out tokens · 39184 ms · 2026-05-18T19:20:22.073064+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first establish the theoretical foundation ... by formulating an entropy-constrained minimum divergence problem. We then prove this minimization problem to be equivalent to an entropy-constrained mass maximization (ECMM) problem, which is NP-hard.
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Algorithm 1 Top-H: proposed greedy token selection algorithm ... Sort tokens in descending order of probability ... until H(q) > α · H(p)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
cs.CL 2026-02 unverdicted novelty 7.0

Top-W applies Wasserstein-regularized truncation on token-embedding geometry to create a closed-form optimal crop for LLM sampling that outperforms prior methods by up to 33.7% on GSM8K, GPQA, AlpacaEval, and MT-Bench.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

A learning algorithm for boltzmann machines

David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for boltzmann machines. Cognitive science, 9 0 (1): 0 147--169, 1985

work page 1985
[3]

Mirostat: A neural text decoding algorithm that directly controls perplexity

Sourya Basu, Govardana Sachitanandam Ramachandran, Nitish Shirish Keskar, and Lav R Varshney. Mirostat: A neural text decoding algorithm that directly controls perplexity. arXiv preprint arXiv:2007.14966, 2020

work page arXiv 2007
[4]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[5]

Lm evaluation harness, 2023

EleutherAI. Lm evaluation harness, 2023. URL https://github.com/EleutherAI/lm-evaluation-harness. Version 0.4.5

work page 2023
[6]

Hierarchical Neural Story Generation

Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Garey and David S

Michael R. Garey and David S. Johnson. Computers and intractability: A guide to the theory of np-completeness. In W.H. Freeman. W. H. Freeman and Company, 1979. Problem SP13: Cardinality-Constrained Subset Sum is NP-complete

work page 1979
[8]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

Truncation sampling as language model desmoothing

John Hewitt, Christopher D Manning, and Percy Liang. Truncation sampling as language model desmoothing. arXiv preprint arXiv:2210.15191, 2022

work page arXiv 2022
[10]

The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904
[11]

Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities

Mina Lee, Percy Liang, and Qian Yang. Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI conference on human factors in computing systems, pp.\ 1--19, 2022

work page 2022
[12]

Hashimoto

Xuechen Li, Tianyi Zhang, Yann Dubois, Rohan Taori, Ishaan Gulrajani, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Alpacaeval: An automatic evaluator of instruction-following models. https://github.com/tatsu-lab/alpaca_eval, 5 2023

work page 2023
[13]

N., Baker, A., Neo, C., Roush, A., Kirsch, A., and Shwartz-Ziv, R

Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, and Ravid Shwartz-Ziv. Turning up the heat: Min-p sampling for creative and coherent llm outputs. arXiv preprint arXiv:2407.01082, 2024

work page arXiv 2024
[14]

GPT-4o System Card

OpenAI. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024. URL https://arxiv.org/abs/2410.21276

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

Papadimitriou

Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994. ISBN 978-0-201-53082-7

work page 1994
[16]

Gpqa: A graduate-level google-proof q&a benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R Bowman. Gpqa: A graduate-level google-proof q&a benchmark. In First Conference on Language Modeling, 2024

work page 2024
[17]

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2. 5 technical report. arXiv preprint arXiv:2412.15115, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

Judging llm-as-a-judge with mt-bench and chatbot arena

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36: 0 46595--46623, 2023

work page 2023
[19]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[1] [1]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

A learning algorithm for boltzmann machines

David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for boltzmann machines. Cognitive science, 9 0 (1): 0 147--169, 1985

work page 1985

[3] [3]

Mirostat: A neural text decoding algorithm that directly controls perplexity

Sourya Basu, Govardana Sachitanandam Ramachandran, Nitish Shirish Keskar, and Lav R Varshney. Mirostat: A neural text decoding algorithm that directly controls perplexity. arXiv preprint arXiv:2007.14966, 2020

work page arXiv 2007

[4] [4]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[5] [5]

Lm evaluation harness, 2023

EleutherAI. Lm evaluation harness, 2023. URL https://github.com/EleutherAI/lm-evaluation-harness. Version 0.4.5

work page 2023

[6] [6]

Hierarchical Neural Story Generation

Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Garey and David S

Michael R. Garey and David S. Johnson. Computers and intractability: A guide to the theory of np-completeness. In W.H. Freeman. W. H. Freeman and Company, 1979. Problem SP13: Cardinality-Constrained Subset Sum is NP-complete

work page 1979

[8] [8]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[9] [9]

Truncation sampling as language model desmoothing

John Hewitt, Christopher D Manning, and Percy Liang. Truncation sampling as language model desmoothing. arXiv preprint arXiv:2210.15191, 2022

work page arXiv 2022

[10] [10]

The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904

[11] [11]

Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities

Mina Lee, Percy Liang, and Qian Yang. Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI conference on human factors in computing systems, pp.\ 1--19, 2022

work page 2022

[12] [12]

Hashimoto

Xuechen Li, Tianyi Zhang, Yann Dubois, Rohan Taori, Ishaan Gulrajani, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Alpacaeval: An automatic evaluator of instruction-following models. https://github.com/tatsu-lab/alpaca_eval, 5 2023

work page 2023

[13] [13]

N., Baker, A., Neo, C., Roush, A., Kirsch, A., and Shwartz-Ziv, R

Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, and Ravid Shwartz-Ziv. Turning up the heat: Min-p sampling for creative and coherent llm outputs. arXiv preprint arXiv:2407.01082, 2024

work page arXiv 2024

[14] [14]

GPT-4o System Card

OpenAI. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024. URL https://arxiv.org/abs/2410.21276

work page internal anchor Pith review Pith/arXiv arXiv 2024

[15] [15]

Papadimitriou

Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994. ISBN 978-0-201-53082-7

work page 1994

[16] [16]

Gpqa: A graduate-level google-proof q&a benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R Bowman. Gpqa: A graduate-level google-proof q&a benchmark. In First Conference on Language Modeling, 2024

work page 2024

[17] [17]

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2. 5 technical report. arXiv preprint arXiv:2412.15115, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[18] [18]

Judging llm-as-a-judge with mt-bench and chatbot arena

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36: 0 46595--46623, 2023

work page 2023

[19] [19]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page