pith. machine review for the scientific record.

arxiv: 2604.27039 · v1 · submitted 2026-04-29 · 💻 cs.CL

Recognition: unknown

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 10:37 UTC · model grok-4.3

classification 💻 cs.CL
keywords length modeling · value estimation · token-level modeling · autoregressive generation · LLM efficiency · generation control · value pretraining · length prediction

The pith

LenVM models remaining generation length at each token by estimating value under a constant negative reward per token.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LenVM to address the lack of fine-grained length modeling in autoregressive models by treating length as a value to be estimated at each token. It assigns a constant negative reward to each token generated, which creates a discounted return that acts as a proxy for how many more tokens are needed. This approach provides dense, annotation-free supervision that scales well. Sympathetic readers would care because better length control can balance reasoning quality with computational cost in large models.

Core claim

By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This yields supervision that is annotation-free, dense, unbiased, and scalable.
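
To make the formulation concrete, here is a minimal sketch of the supervision target, assuming an illustrative discount gamma = 0.99 (the paper does not state its value in the abstract); the function name is hypothetical. With reward -1 per token, a position with L tokens left receives the closed-form return G = -(1 - gamma^L)/(1 - gamma), bounded below by -1/(1 - gamma) and strictly decreasing in L.

```python
# Sketch of a LenVM-style target; gamma = 0.99 is an assumption,
# not a value taken from the paper.
GAMMA = 0.99

def length_to_return(remaining: int, gamma: float = GAMMA) -> float:
    """Discounted return under a constant per-token reward of -1:
    G = sum_{k=0}^{L-1} gamma**k * (-1) = -(1 - gamma**L) / (1 - gamma)."""
    return -(1.0 - gamma ** remaining) / (1.0 - gamma)

# Bounded below by -1/(1 - gamma) and strictly decreasing in L,
# so the return is a monotone proxy for the remaining horizon.
values = [length_to_return(L) for L in (0, 1, 10, 100, 1000)]
assert all(a > b for a, b in zip(values, values[1:]))   # monotone in L
assert all(v > -1.0 / (1.0 - GAMMA) for v in values)    # bounded
print([round(v, 2) for v in values])  # [0.0, -1.0, -9.56, -63.4, -100.0]
```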

What carries the argument

The Length Value Model (LenVM), a token-level value estimator trained to predict the remaining sequence length via negative per-token rewards.

If this is right

  • LenVM improves adherence to exact length targets in generation tasks.
  • It supports continuous trade-offs between output quality and inference cost.
  • The model predicts total output length directly from the input prompt alone.
  • Token-level values reveal how specific tokens steer generation toward shorter or longer sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The length signal could integrate into reinforcement learning loops to optimize generation policies for both quality and cost.
  • Similar value estimation might apply to controlling sequence properties beyond length, such as complexity or style.
  • At deployment the per-token predictions could guide early stopping or budget allocation without extra training; a sketch follows this list.
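
As a concrete illustration of that last point, here is a minimal sketch, assuming the closed-form return above with an illustrative gamma = 0.99 and a value model that emits one scalar per token; the inversion and the stopping rule are editorial, not the paper's API.

```python
import math

GAMMA = 0.99  # assumed discount; not taken from the paper

def value_to_remaining_length(v: float, gamma: float = GAMMA) -> float:
    """Invert G = -(1 - gamma**L) / (1 - gamma) to estimate remaining tokens L."""
    return math.log(1.0 + v * (1.0 - gamma)) / math.log(gamma)

def should_stop(predicted_value: float, tokens_used: int, budget: int) -> bool:
    """Hypothetical budget gate: stop once tokens already emitted plus the
    estimated remainder would exceed the token budget."""
    return tokens_used + value_to_remaining_length(predicted_value) > budget

# A predicted value of -40 implies roughly 51 remaining tokens at gamma = 0.99.
print(round(value_to_remaining_length(-40.0)))          # 51
print(should_stop(-40.0, tokens_used=160, budget=200))  # True: 160 + 51 > 200
```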

Load-bearing premise

Assigning a constant negative reward to every generated token produces a return that is monotone and unbiased with respect to the actual remaining generation length at every position.

What would settle it

A test where the implied remaining-length estimates fail to shrink monotonically as tokens are emitted (equivalently, where the predicted values fail to rise toward zero), or where length-controlled generation using the model shows no improvement over standard baselines on length-matching tasks.
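
The first half of that test can be made concrete with a minimal sketch, run over a hypothetical sequence of per-token predictions (the numbers are illustrative): count the steps where the value fails to rise toward zero, i.e. where the implied remaining-length estimate fails to shrink.

```python
def monotonicity_violations(values: list[float]) -> int:
    """Count steps where the predicted value does not increase toward zero,
    i.e. where the implied remaining-length estimate fails to shrink as a
    token is emitted."""
    return sum(1 for a, b in zip(values, values[1:]) if b <= a)

# Hypothetical per-token predictions over one generated sequence:
preds = [-4.9, -3.9, -3.0, -3.2, -2.0, -1.0, 0.0]
print(monotonicity_violations(preds))  # 1 (the -3.0 -> -3.2 step regresses)
```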

original abstract

Token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy compared to 6 percent for token budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. Results demonstrate that LenVM supports a broad range of applications and token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated author's rebuttal, circularity audit, and axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces the Length Value Model (LenVM), a token-level framework that formulates remaining generation length as a value estimation problem. By assigning a constant negative reward per token, the model predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This yields dense, annotation-free supervision derived directly from observed sequence lengths. Experiments on LLMs and VLMs report large gains on the LIFEBench exact length matching task (7B model length score rising from 30.9 to 64.8) and improved accuracy-efficiency trade-offs on GSM8K (63% accuracy at 200-token budget versus 6% for the token-budget baseline), plus interpretable insights into generation dynamics.

Significance. If the central claims hold under full verification, LenVM offers a scalable, annotation-free approach to fine-grained length modeling that addresses a practical limitation in current autoregressive systems. The ability to continuously control the performance-efficiency frontier and the potential extension to RL training constitute clear strengths. The reported numerical improvements over both open baselines and closed-source models on targeted tasks indicate meaningful practical utility for inference-time length prediction and control.

major comments (2)
  1. [Methods] Methods section: the value formulation (constant negative reward per token yielding a discounted return) is presented as producing an unbiased proxy, yet the manuscript provides no explicit derivation or empirical check that the resulting targets are unbiased beyond monotonicity; the exact equation relating return to remaining length and the procedure for extracting targets from complete trajectories must be shown with equations to substantiate the claim.
  2. [Experiments] Experiments section: ablation studies on the discount factor, comparisons against alternative length-modeling baselines, and error analysis of the predicted values versus actual remaining lengths are absent; without these, the claims of scalability and effectiveness cannot be fully assessed from the reported aggregate scores alone.
minor comments (3)
  1. [Abstract] Abstract: 'trade off' should be written as the compound 'trade-off'.
  2. [Abstract] Abstract: '6 percent' should be rendered as '6%' for consistency with other numeric reporting.
  3. [Abstract] The code repository link is welcome, but the manuscript should state which scripts and checkpoints are included to support reproduction of the reported LIFEBench and GSM8K results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review and for recognizing the potential utility of LenVM. We address each major comment below and commit to revisions that directly respond to the concerns raised.

point-by-point responses
  1. Referee: [Methods] Methods section: the value formulation (constant negative reward per token yielding a discounted return) is presented as producing an unbiased proxy, yet the manuscript provides no explicit derivation or empirical check that the resulting targets are unbiased beyond monotonicity; the exact equation relating return to remaining length and the procedure for extracting targets from complete trajectories must be shown with equations to substantiate the claim.

    Authors: We agree that an explicit derivation is required. In the revised manuscript we will insert the following in the Methods section: with constant per-token reward r = -1, the return at a position with remaining length L is exactly G = sum_{k=0}^{L-1} gamma^k * (-1) = -(1 - gamma^L)/(1 - gamma). Because this quantity is computed directly from the observed L of each complete trajectory, the supervision target is the precise return under the defined reward function and is therefore unbiased (not merely monotone). We will also document the extraction procedure: for every token in every training sequence the remaining length is known, the closed-form return is calculated, and that scalar becomes the regression target. These equations and the extraction steps will be added verbatim (a sketch of the extraction appears after this list). revision: yes

  2. Referee: [Experiments] Experiments section: ablation studies on the discount factor, comparisons against alternative length-modeling baselines, and error analysis of the predicted values versus actual remaining lengths are absent; without these, the claims of scalability and effectiveness cannot be fully assessed from the reported aggregate scores alone.

    Authors: We accept that the current experimental section is insufficient for full assessment. In the revision we will add: (i) an ablation table varying gamma over {0.9, 0.95, 0.99, 0.999} and reporting effects on both length-prediction MSE and downstream accuracy-efficiency curves; (ii) direct comparisons against two additional baselines (linear length regression from prompt embeddings and a non-discounted cumulative-length predictor); and (iii) an error-analysis subsection containing scatter plots of predicted value versus true remaining length together with per-bin bias and variance statistics. These results will be generated from the same training and evaluation splits already used in the paper (a sketch of the per-bin error analysis appears after this list). revision: yes
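
For major point 1, a minimal sketch of the extraction procedure the rebuttal promises, assuming tokenized complete trajectories and an illustrative gamma = 0.99; the names and the example sequence are hypothetical.

```python
GAMMA = 0.99  # illustrative; the rebuttal promises an ablation over gamma

def extract_targets(sequence: list[int], gamma: float = GAMMA) -> list[float]:
    """For each position t of a complete trajectory, the remaining length
    L = len(sequence) - t is known exactly, so the regression target is the
    closed-form return G = -(1 - gamma**L) / (1 - gamma)."""
    n = len(sequence)
    return [-(1.0 - gamma ** (n - t)) / (1.0 - gamma) for t in range(n)]

# Each (token, target) pair is one dense, annotation-free training example.
tokens = [101, 2023, 2003, 1037, 7099, 102]  # a hypothetical 6-token sequence
for tok, tgt in zip(tokens, extract_targets(tokens)):
    print(tok, round(tgt, 3))
```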
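
For major point 2, a sketch of the promised per-bin error analysis, assuming arrays of predicted and true remaining lengths on held-out positions; the bin width and the synthetic inputs are illustrative.

```python
import numpy as np

def per_bin_error_stats(pred: np.ndarray, true: np.ndarray, bin_width: int = 50):
    """Bin positions by true remaining length and report the mean error
    (bias) and error variance within each bin."""
    err = pred - true
    bins = (true // bin_width).astype(int)
    return {
        (int(b) * bin_width, (int(b) + 1) * bin_width):
            (float(err[bins == b].mean()), float(err[bins == b].var()))
        for b in np.unique(bins)
    }

# Synthetic stand-in data; real usage would pass model predictions.
rng = np.random.default_rng(0)
true = rng.integers(1, 300, size=1000).astype(float)
pred = true + rng.normal(0.0, 5.0, size=1000)  # unbiased, sd = 5 by construction
for span, (bias, var) in sorted(per_bin_error_stats(pred, true).items()):
    print(span, round(bias, 2), round(var, 2))
```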

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central formulation deliberately defines the value function via a constant negative per-token reward, yielding a return that is exactly a monotone function of remaining length by the Bellman equation and geometric series summation. This is presented as an intentional modeling choice to obtain dense, annotation-free supervision from any corpus of complete sequences, not as a derived result or prediction that reduces to hidden inputs. No load-bearing step equates a claimed output to its own fitted parameters or self-cited premises; the empirical gains on LIFEBench and GSM8K are external to the formulation itself. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on a standard RL value-function setup plus one domain-specific modeling choice; no new physical entities or complex axioms are introduced.

free parameters (1)
  • discount factor
    Controls the bounded discounted return used as the length proxy; its specific value is not stated in the abstract.
axioms (1)
  • domain assumption: A constant negative reward per generated token yields a monotone proxy for remaining generation length
    This modeling choice is presented as the core of LenVM in the abstract.

pith-pipeline@v0.9.0 · 5656 in / 1361 out tokens · 60096 ms · 2026-05-07T10:37:23.244046+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 29 canonical work pages · 8 internal anchors

  1. [1]

    OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique, 2025a

    Wasi Uddin Ahmad, Somshubra Majumdar, Aleksander Ficek, Sean Narenthiran, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Vahid Noroozi, and Boris Ginsburg. Opencodereasoning-ii: A simple test time scaling approach via self-critique. arXiv preprint arXiv:2507.09075, 2025

  2. [2]

    Plan-and-write: Structure-guided length control for llms without model retraining

    Adewale Akinfaderin, Shreyas Subramanian, and Akarsha Sehwag. Plan-and-write: Structure-guided length control for llms without model retraining. ArXiv, abs/2511.01807, 2025. URL https://api.semanticscholar.org/CorpusID:282739780

  3. [3]

    Precise length control for large language models

    Bradley Butcher, Michael O'Keefe, and James Titchener. Precise length control for large language models. Nat. Lang. Process. J., 11:100143, 2024. URL https://api.semanticscholar.org/CorpusID:274788732

  4. [4]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

  5. [5]

    Planning-aware code infilling via horizon-length prediction, 2025

    Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, and Zijian Wang. Planning-aware code infilling via horizon-length prediction, 2025. URL https://arxiv.org/abs/2410.03103

  6. [6]

    Constrained sampling for language models should be easy: An mcmc perspective

    Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, and Loris D'antoni. Constrained sampling for language models should be easy: An mcmc perspective. ArXiv, abs/2506.05754, 2025. URL https://api.semanticscholar.org/CorpusID:279245064

  7. [7]

    Length controlled generation for black-box llms

    Yuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu, Lei Huang, Tat-Seng Chua, and Bing Qin. Length controlled generation for black-box llms. ArXiv, abs/2412.14656, 2024. URL https://api.semanticscholar.org/CorpusID:274859461

  8. [8]

    Deepmath-103k: A large-scale, challenging, decontaminated, and verifiable mathematical dataset for advancing reasoning

    Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, et al. Deepmath-103k: A large-scale, challenging, decontaminated, and verifiable mathematical dataset for advancing reasoning. arXiv preprint arXiv:2504.11456, 2025

  9. [9]

    Pretrain value, not reward: Decoupled value policy optimization, 2026

    Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Zhixu Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, and Qi Zhang. Pretrain value, not reward: Decoupled value policy optimization, 2026. URL https://arxiv.org/abs/2502.16944

  10. [10]

    Efficient memory management for large language model serving with pagedattention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Haotong Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. Proceedings of the 29th Symposium on Operating Systems Principles, 2023. URL https://api.semanticscholar.org/CorpusID:261697361

  11. [11]

    Leash: Adaptive length penalty and reward shaping for efficient large reasoning model, 2025

    Yanhao Li, Lu Ma, Jiaran Zhang, Lexiang Tang, Wentao Zhang, and Guibo Luo. Leash: Adaptive length penalty and reward shaping for efficient large reasoning model, 2025. URL https://arxiv.org/abs/2512.21540

  12. [12]

    Let's verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In The twelfth international conference on learning representations, 2023

  13. [13]

    Dler: Doing length penalty right - incentivizing more intelligence per token via reinforcement learning

    Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, and Pavlo Molchanov. Dler: Doing length penalty right - incentivizing more intelligence per token via reinforcement learning, 2025. URL https://arxiv.org/abs/2510.15110

  14. [14]

    MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

    Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts. arXiv preprint arXiv:2310.02255, 2023

  15. [15]

    Cgmh: Constrained sentence generation by metropolis-hastings sampling, 2018

    Ning Miao, Hao Zhou, Lili Mou, Rui Yan, and Lei Li. Cgmh: Constrained sentence generation by metropolis-hastings sampling, 2018. URL https://arxiv.org/abs/1811.10996

  16. [16]

    When will the tokens end? Graph-based forecasting for LLMs output length

    Grzegorz Piotrowski, Mateusz Bystroński, Mikołaj Hołysz, Jakub Binkowski, Grzegorz Chodak, and Tomasz Jan Kajdanowicz. When will the tokens end? Graph-based forecasting for LLMs output length. In Jin Zhao, Mingyang Wang, and Zhu Liu, editors, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Re...

  17. [17]

    Efficiently scaling transformer inference

    Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. Efficiently scaling transformer inference. ArXiv, abs/2211.05102, 2022. URL https://api.semanticscholar.org/CorpusID:253420623

  18. [18]

    High-Dimensional Continuous Control Using Generalized Advantage Estimation

    John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. CoRR, abs/1506.02438, 2015. URL https://api.semanticscholar.org/CorpusID:3075448

  19. [19]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. ArXiv, abs/1707.06347, 2017. URL https://api.semanticscholar.org/CorpusID:28695052

  20. [20]

    Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

    Charlie Victor Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. ArXiv, abs/2408.03314, 2024. URL https://api.semanticscholar.org/CorpusID:271719990

  21. [21]

    Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Feng Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Haochen Ding, Hao-Xing Hu, Haoming Yang, Hao Zhang, Haotian Yao, Hao-Dong Zhao, Haoyu Lu, Haoze...

  22. [22]

    Just enough thinking: Efficient reasoning with adaptive length penalties reinforcement learning, 2025

    Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, and Nick Haber. Just enough thinking: Efficient reasoning with adaptive length penalties reinforcement learning, 2025. URL https://arxiv.org/abs/2506.05256

  23. [24]

    Can llms track their output length? a dynamic feedback mechanism for precise length regulation, 2026b

    Meiman Xiao, Ante Wang, Qingguo Hu, Zhongjian Miao, Huangjun Shen, Longyue Wang, Weihua Luo, and Jinsong Su. Can llms track their output length? a dynamic feedback mechanism for precise length regulation, 2026b. URL https://arxiv.org/abs/2601.01768

  24. [25]

    Predicting LLM output length via entropy-guided representations

    Huanyi Xie, Yubin Chen, Liangyu Wang, Lijie Hu, and Di Wang. Predicting llm output length via entropy-guided representations. ArXiv, abs/2602.11812, 2026. URL https://api.semanticscholar.org/CorpusID:285540500

  25. [26]

    Prompt-based one-shot exact length-controlled generation with llms

    Juncheng Xie and Hung-yi Lee. Prompt-based one-shot exact length-controlled generation with llms. ArXiv, abs/2508.13805, 2025. URL https://api.semanticscholar.org/CorpusID:280686321

  26. [27]

    Qwen3 technical report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Jingren Zhou, Junyan Lin, Kai Dang, Keqin Bao, Ke‐Pei Ya...

  27. [28]

    Qwen2.5 Technical Report

    Qwen An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin,...

  28. [29]

    VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

    Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang, Yonghui Wu, and Lin Yan. Vapo: Efficient and reliable rei...

  29. [30]

    Adaptthink: Reasoning models can learn when to think

    Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, and Juanzi Li. Adaptthink: Reasoning models can learn when to think. ArXiv, abs/2505.13417, 2025a. URL https://api.semanticscholar.org/CorpusID:278769267

  30. [31]

    Lifebench: Evaluating length instruction following in large language models

    Wei Zhang, Zhenhong Zhou, Kun Wang, Junfeng Fang, Yuanhe Zhang, Rui Wang, Ge Zhang, Xavier Li, Li Sun, Lingjuan Lyu, et al. Lifebench: Evaluating length instruction following in large language models. arXiv preprint arXiv:2505.16234, 2025b

  31. [32]

    v_0: A generalist value model for any policy at state zero, 2026

    Yi-Kai Zhang, Zhiyuan Yao, Hongyan Hao, Yueqing Sun, Qi Gu, Hui Su, Xunliang Cai, De-Chuan Zhan, and Han-Jia Ye. v_0: A generalist value model for any policy at state zero, 2026. URL https://arxiv.org/abs/2602.03584

  32. [33]

    WildChat: 1M ChatGPT Interaction Logs in the Wild

    Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng. Wildchat: 1m chatgpt interaction logs in the wild. arXiv preprint arXiv:2405.01470, 2024

  33. [34]

    Response length perception and sequence scheduling: An llm-empowered llm inference pipeline, 2023

    Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, and Yang You. Response length perception and sequence scheduling: An llm-empowered llm inference pipeline, 2023. URL https://arxiv.org/abs/2305.13144