PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

Yingjie Lei

arXiv: 2605.22855 · v1 · pith:72OE7BIUnew · submitted 2026-05-19 · 💻 cs.GT · cs.AI · cs.CL · cs.LG


Pith reviewed 2026-05-25 00:06 UTC · model grok-4.3

classification: 💻 cs.GT · cs.AI · cs.CL · cs.LG

keywords: LLM agents · personalized pricing · negotiation benchmark · zero-shot evaluation · hidden preferences · seller profit · deal rates · concession heuristic


The pith

Zero-shot LLM sellers reach deal rates above 0.99 but earn profits only slightly above random and far below a concession heuristic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PrefBench to test how zero-shot LLMs perform as sellers when buyer valuations, patience, and walkaway rules stay hidden from the agent. Each episode supplies only public persona and bundle details plus negotiation history, while the simulator controls the rest through fixed latent variables. Over 7500 episodes the tested models follow the required JSON protocol and close nearly every deal, yet their average profits stay close to a random baseline and well below a simple concession rule run on the identical streams. A reader would care because the result separates protocol compliance from profitable bargaining under information asymmetry. The benchmark therefore supplies a controlled way to measure whether future agents can close the profit gap without changing the hidden-information boundary.
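The hidden-information boundary described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `PublicState` and `LatentBuyer`, the field names, and the function `state_summary` are hypothetical and are not taken from the paper. The only idea shown is that the seller-facing summary is built from the public part alone, so latent variables cannot leak into the prompt by construction.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PublicState:
    """What the seller may see (hypothetical field names)."""
    persona: dict   # public persona descriptors
    bundle: dict    # fixed bundle information
    history: list   # negotiation history so far


@dataclass(frozen=True)
class LatentBuyer:
    """What only the simulator holds (hypothetical field names)."""
    valuation: float
    patience: int
    walkaway_threshold: float


def state_summary(public: PublicState) -> str:
    """Serialize the seller-facing state.

    The function accepts only PublicState, so a LatentBuyer can never be
    passed in by accident and end up in the text shown to the LLM.
    """
    return json.dumps(asdict(public), sort_keys=True)
```

A simulator built this way would keep the `LatentBuyer` instance on its own side of the loop and hand the agent nothing but the string returned by `state_summary`.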

Core claim

PrefBench evaluates zero-shot LLM sellers against heuristic references over 7500 episodes and finds that the tested LLMs follow the protocol reliably and achieve deal rates above 0.99, but their seller-profit outcomes remain weak: the best LLM average profit is only slightly above the random baseline and far below a simple concession heuristic under the same episode stream. These results show that structured action compliance and agreement-seeking behavior can coexist with weak profit-sensitive bargaining.

What carries the argument

PrefBench simulator that pairs each episode with a fixed vehicle bundle and latent buyer variables, accessed only through an LLM-facing state-summary protocol that requires strict JSON actions under a fixed hidden-information boundary.
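To make "strict JSON actions" concrete, here is a small Python sketch of a strict action parser. The paper's actual schema is not reproduced on this page, so the keys `action` and `price`, the action names in `ALLOWED_ACTIONS`, and the rule that only offers carry a price are all assumptions. The sketch shows one way a harness could reject any reply that is not exactly the expected object.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action set; the paper's real action names may differ.
ALLOWED_ACTIONS = {"offer", "accept", "walk_away"}


@dataclass(frozen=True)
class SellerAction:
    action: str
    price: Optional[float]


def parse_seller_action(raw: str) -> SellerAction:
    """Parse a model reply strictly; raise ValueError on any deviation."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    # Exactly the expected keys: no extras, none missing.
    if not isinstance(obj, dict) or set(obj) != {"action", "price"}:
        raise ValueError("expected an object with exactly 'action' and 'price'")

    action, price = obj["action"], obj["price"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    if action == "offer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError("an offer needs a numeric price")
        if price <= 0:
            raise ValueError("an offer needs a positive price")
        return SellerAction(action, float(price))

    if price is not None:
        raise ValueError("price must be null unless the action is 'offer'")
    return SellerAction(action, None)
```

Under this sketch, `parse_seller_action('{"action": "offer", "price": 61250}')` returns an offer, while free text such as `I offer 60000` is rejected. Counting rejections per episode would give a protocol-compliance rate that is measured separately from profit.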

If this is right

LLMs achieve deal rates above 0.99 while returning valid JSON actions in the required format.
The strongest LLM profit is only marginally higher than a random-action baseline under the same episodes.
A simple concession heuristic produces markedly higher seller profit than any tested LLM on the identical episode stream.
Protocol compliance and high agreement rates can occur without strong profit performance when buyer preferences remain hidden.
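The comparison in the points above (several policies scored on one identical episode stream) can be sketched in Python as follows. This is a toy stand-in, not the paper's simulator: the buyer here simply accepts any price at or below a hidden willingness to pay and walks away after a hidden number of rounds. The names `Episode`, `random_policy`, `concession_policy`, `evaluate`, and `make_episodes`, the `step` parameter, the price ranges, and the choice to count a failed deal as zero profit are all assumptions for illustration.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Episode:
    cost: float        # visible to the seller
    list_price: float  # visible to the seller
    wtp: float         # hidden: buyer's willingness to pay
    patience: int      # hidden: rounds before the buyer walks away


# A policy sees only public quantities and the round index, never wtp or patience.
Policy = Callable[[float, float, int, random.Random], float]


def random_policy(cost: float, list_price: float, t: int, rng: random.Random) -> float:
    """Offer a uniformly random price between cost and list price."""
    return rng.uniform(cost, list_price)


def concession_policy(cost: float, list_price: float, t: int,
                      rng: random.Random, step: float = 0.25) -> float:
    """Open at list price, then concede a fixed share of the margin each round."""
    return max(cost, list_price - step * t * (list_price - cost))


def run_episode(ep: Episode, policy: Policy, rng: random.Random) -> Optional[float]:
    """Return seller profit if a deal closes, or None if the buyer walks away."""
    for t in range(ep.patience):
        price = policy(ep.cost, ep.list_price, t, rng)
        if price <= ep.wtp:
            return price - ep.cost
    return None


def evaluate(episodes: List[Episode], policy: Policy, seed: int = 0) -> Tuple[float, float]:
    """Return (deal rate, average profit); a failed deal counts as zero profit."""
    rng = random.Random(seed)
    profits = [run_episode(ep, policy, rng) for ep in episodes]
    deal_rate = sum(p is not None for p in profits) / len(profits)
    avg_profit = mean(0.0 if p is None else p for p in profits)
    return deal_rate, avg_profit


def make_episodes(n: int, seed: int = 0) -> List[Episode]:
    """Draw a fixed episode stream that every policy is scored on."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        cost = rng.uniform(40_000, 50_000)
        list_price = cost * rng.uniform(1.2, 1.5)
        wtp = rng.uniform(cost, list_price)
        episodes.append(Episode(cost, list_price, wtp, rng.randint(2, 6)))
    return episodes
```

Calling `evaluate(make_episodes(7500), random_policy)` and `evaluate(make_episodes(7500), concession_policy)` scores both references on the same stream. Reporting deal rate and average profit as separate numbers is what lets a high agreement rate be distinguished from strong pricing; the figures this toy produces say nothing about the paper's own results.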

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agents may need explicit profit modeling or additional training signals beyond compliance to close the observed gap.
The benchmark setting could be reused to compare zero-shot performance against few-shot or fine-tuned variants on the same hidden-preference episodes.
Similar compliance-versus-outcome gaps may appear in other sequential decision domains that supply only partial state information.

Load-bearing premise

The simulator's latent buyer variables produce negotiation dynamics that are representative of the hidden-preference challenges faced by real pricing agents.

What would settle it

Running the identical 7500 episodes with human sellers or with agents explicitly optimized for profit and measuring whether their average seller profit substantially exceeds the best LLM result would settle the claim.

Figures

Figures reproduced from arXiv: 2605.22855 by Yingjie Lei.

**Figure 2.** Persona-bank construction. Observable descriptors are sampled from public-data-informed ...

**Figure 3.** Bundle-signal construction. Fixed customization descriptors are visible to the seller, while ...

**Figure 4.** LLM-facing PrefBench evaluation loop. The LLM seller receives an observable state ...

Original abstract

Personalized pricing negotiations are a challenging testbed for LLM agents because successful interaction does not guarantee profitable decision making. A seller may produce valid actions and close many deals while still pricing poorly when buyer willingness to pay and bargaining traits remain hidden. This paper presents PrefBench, a simulator-based benchmark for hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and negotiation history, while latent buyer variables govern valuation, patience, counter-offer behavior, and walkaway decisions. PrefBench evaluates this setting through an LLM-facing state-summary protocol that constrains agents to return strict JSON actions under a fixed hidden-information boundary. We evaluate zero-shot LLM sellers against heuristic references over 7,500 episodes. The tested LLMs follow the protocol reliably and achieve deal rates above 0.99, but their seller-profit outcomes remain weak: the best LLM average profit is only slightly above the random baseline and far below a simple concession heuristic under the same episode stream. These results show that structured action compliance and agreement-seeking behavior can coexist with weak profit-sensitive bargaining. PrefBench provides a controlled benchmark for evaluating pricing-agent behavior under hidden buyer preferences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague


PrefBench is a useful new benchmark showing that LLMs can follow negotiation protocols but struggle with profitable bargaining under hidden preferences. The paper sets up a simulator in which each episode has a simulated buyer with hidden valuation, patience, and walkaway rules, while the LLM seller sees only public information and must respond in strict JSON. The author tests several zero-shot LLMs against random and concession baselines across 7,500 episodes. The LLMs hit deal rates above 0.99, but their profits are only slightly above random and well below the heuristic. This is new because prior negotiation benchmarks do not focus on this hidden-preference pricing setup with a standardized action protocol. The work does well by keeping the comparison clean and internal to the simulator, with no circularity. The soft spot is the lack of implementation details such as exact prompts and baseline code, and no variance estimates or statistical tests are mentioned. That leaves some uncertainty about how much the results depend on those choices. The buyer model is a controlled testbed, which is fine, but its link to real pricing agents is not tested. This paper is for researchers in LLM agents and economic decision making. A reader looking for benchmarks in negotiation with incomplete information would get value from the protocol and the baseline comparison. It shows clear thinking in the design. It deserves peer review. I would recommend sending it out.

Referee Report

0 major / 3 minor

Summary. The paper introduces PrefBench, a simulator-based benchmark for zero-shot LLM agents in hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer (with latent variables for valuation, patience, counter-offer behavior, and walkaway) against a fixed vehicle bundle; the seller sees only public persona, bundle info, and history, and must output strict JSON actions. Over 7,500 episodes the tested LLMs achieve deal rates >0.99 yet post seller profits only marginally above random and well below a simple concession heuristic under the identical episode stream.

Significance. If the empirical comparison holds, the result is significant because it cleanly separates protocol compliance from profit-sensitive bargaining under a fixed hidden-information boundary and external baselines. The fixed episode stream and reproducible simulator constitute a controlled testbed that future work can use to measure progress on profit-aware negotiation agents.

minor comments (3)

[Abstract] The claim that 'the best LLM average profit is only slightly above the random baseline' is stated without naming the LLM, giving the numerical gap, or citing the table or figure that reports it; this should be tied to a specific result in the main text.
[Experimental protocol] The manuscript should supply the exact system and user prompts used for each LLM (including temperature and JSON schema enforcement) and the precise implementation of the concession heuristic so that the 7,500-episode comparison can be reproduced.
[Results] The table or figure reporting profits should include per-LLM means, standard deviations or confidence intervals, and the result of any statistical test against the random and concession baselines.
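One way to produce the interval requested in the [Results] comment is a paired percentile bootstrap, which fits here because every policy is scored on the same episodes. The Python sketch below is an illustration only and is not the paper's analysis: the function name `paired_bootstrap_ci`, its defaults, and the percentile method are assumptions, and the inputs are per-episode profit lists aligned by episode index.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    profits_a: Sequence[float],
    profits_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean difference, lower bound, upper bound) for A minus B.

    The two sequences must be aligned: index i is the same episode for both
    policies. Resampling is done over episodes, so the pairing is preserved.
    """
    if len(profits_a) != len(profits_b) or len(profits_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    diffs = [a - b for a, b in zip(profits_a, profits_b)]
    rng = random.Random(seed)
    n = len(diffs)

    # Resample episodes with replacement and record each resample's mean.
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))

    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(diffs), resampled[lo_idx], resampled[hi_idx]
```

An interval for "best LLM minus random baseline" that excludes zero would support the "slightly above" wording with a stated uncertainty; an interval that straddles zero would suggest the gap is not distinguishable from noise on this episode stream.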

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the value of PrefBench as a controlled, reproducible testbed that cleanly separates protocol compliance from profit-sensitive bargaining. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity


This is an empirical benchmark paper that evaluates LLM agents against external heuristic baselines (random and concession) on a fixed set of 7,500 simulator episodes. The central claims concern observed protocol compliance and profit gaps under a described JSON action protocol and hidden-information boundary. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the load-bearing steps. The simulator is presented as a controlled testbed rather than a calibrated model whose parameters are derived from the results themselves. The evaluation is therefore self-contained against external references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The evaluation rests on a domain-specific simulator whose buyer model is introduced by the paper; no free parameters are fitted to data in the reported results.

axioms (1)

domain assumption: The latent buyer variables produce negotiation dynamics representative of hidden-preference pricing challenges.
Invoked to justify the simulator as a valid testbed for the central claim.

invented entities (1)

PrefBench simulator and JSON action protocol (no independent evidence)
purpose: Provide controlled episodes and a constrained interface for evaluating LLM pricing agents under hidden information.
Newly defined in this work; no independent evidence is supplied beyond the paper's own episodes.



Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 3 internal anchors

[1] Young-Geun Choi, Gi-Soo Kim, Yunseo Choi, Wooseong Cho, Myunghee Cho Paik, and Min-Hwan Oh. Semi-Parametric Contextual Pricing Algorithm using Cox Proportional Hazards Model. In Proceedings of the 40th International Conference on Machine Learning, pages 5771–5786. PMLR, July 2023.

[2] Jiaxi Liu, Yidong Zhang, Xiaoqing Wang, Yuming Deng, and Xingyu Wu. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning: A Field Experiment. Technical Report arXiv:1912.02572, arXiv, August 2021.

[3] Max Biggs, Wei Sun, and Markus Ettl. Model distillation for revenue optimization: Interpretable personalized pricing. In International Conference on Machine Learning, pages 946–956. PMLR, 2021.

[4] Jean-Pierre Dubé and Sanjog Misra. Personalized pricing and consumer welfare. Journal of Political Economy, 131(1):131–189, 2023.

[5] Yu Xia, Ali Arian, Sriram Narayanamoorthy, and Joshua Mabry. RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation. Technical Report arXiv:2312.14095, arXiv, December 2023.

[6] Tim Baarslag, Koen Hindriks, Catholijn Jonker, Sarit Kraus, and Raz Lin. The First Automated Negotiating Agents Competition (ANAC 2010). In Takayuki Ito, Minjie Zhang, Valentin Robu, Shaheen Fatima, and Tokuro Matsuo, editors, New Trends in Agent-Based Complex Automated Negotiations, pages 113–135. Springer, Berlin, Heidelberg, 2012. ISBN 978-3-642-24696-8... doi:10.1007/978-3-642-24696-8_7

[7] Raz Lin, Sarit Kraus, Tim Baarslag, Dmytro Tykhonov, Koen Hindriks, and Catholijn M. Jonker. Genius: An Integrated Environment for Supporting the Design of Generic Automated Negotiators. Computational Intelligence, 30(1):48–70, 2014. ISSN 1467-8640. doi:10.1111/j.1467-8640.2012.00463.x

[8] Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, and Rui Wang. Measuring bargaining abilities of llms: A benchmark and a buyer-enhancement method. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3579–3602, 2024.

[9] Chunkit Chan, Jiayang Cheng, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, and Yangqiu Song. Negotiationtom: A benchmark for stress-testing machine theory of mind on negotiation surrounding. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4211–4241, 2024.

[10] Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. AgentBench: Evaluating LLMs as Agents, 2023. URL https://arxiv.org/abs/2308.03688

[11] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs, 2023. URL https://arxiv.org/abs/2307.16789

[12] Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains, 2024. URL https://arxiv.org/abs/2406.12045

[13] Venktesh Pandey, Evana Wang, and Stephen D. Boyles. Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations. Transportation Research Part C: Emerging Technologies, 119:102715, October 2020. ISSN 0968090X. doi:10.1016/j.trc.2020.102715

[14] Anna Priester, Thomas Robbert, and Stefan Roth. A special price just for you: Effects of personalized dynamic pricing on consumer fairness perceptions. Journal of Revenue and Pricing Management, 19(2):99–112, April 2020. ISSN 1477-657X. doi:10.1057/s41272-019-00224-3

[15] Øyvind Thomassen. An Empirical Model of Automobile Engine Variant Pricing. International Journal of the Economics of Business, 24(3):275–293, September 2017. ISSN 1357-1516. doi:10.1080/13571516.2017.1333733

[16] Yana Wang, Zhen-Song Chen, and Xian-Jia Wang. Assortment planning and pricing for configurable product under sequential choice process. Management System Engineering, 1(1):6, October 2022. ISSN 2731-5843. doi:10.1007/s44176-022-00002-3

[17] Yasser Mohammad, Shinji Nakadai, and Amy Greenwald. NegMAS: A Platform for Automated Negotiations. In Takahiro Uchiya, Quan Bai, and Iván Marsá Maestre, editors, PRIMA 2020: Principles and Practice of Multi-Agent Systems, volume 12568, pages 343–351. Springer International Publishing, Cham, 2021. ISBN 978-3-030-69321-3 978-3-030-69322-0. doi:10.1007/978-3-030-69322-0_23

[18] Enrique Adrian Villarrubia-Martin, Luis Rodriguez-Benitez, David Muñoz-Valero, Giovanni Montana, and Luis Jimenez-Linares. Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning. Technical Report arXiv:2501.08234, arXiv, September 2025.

[19] Census Reporter. Census profile: United States. http://censusreporter.org/profiles/01000US-united-states/, 2026.

[20] U.S. Census Bureau. Income in the United States: 2023. https://www.census.gov/library/publications/2024/demo/p60-282.html, 2024.

[21] Stacey Bricka, Timothy Reuscher, Paul Schroeder, Mitchell Fisher, Justina Beard, and Xiaoyuan Layla Sun. Summary of travel trends: 2022 national household travel survey. Technical report, Federal Highway Administration, 2024.

[22] Mercedes-Benz USA. Build Your Own 2026 E 350 Sedan. https://www.mbusa.com/en/vehicles/build/e-class/sedan/e350w, 2026.

[23] OpenAI. Chat Completions. OpenAI API Reference, 2026. URL https://developers.openai.com/api/reference/resources/chat

[24] DeepSeek-AI. DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence,

[25] URL https://huggingface.co/collections/deepseek-ai/deepseek-v4

[26] Moonshot AI. Kimi K2.6: Advancing Open-Source Coding. Kimi Technical Blog, 2026. URL https://www.kimi.com/blog/kimi-k2-6

[27] Qwen Team. Qwen3.6-Plus: Towards Real World Agents, April 2026. URL https://qwen.ai/blog?id=qwen3.6.

A Customization Scope

PrefBench uses a focused Mercedes-Benz E350 Sedan customization catalog as the fixed product substrate. The catalog is derived from selected official configuration options and MSRP deltas [22], then standardiz...
