
arxiv: 2502.16810 · v6 · submitted 2025-02-24 · 💻 cs.AI · cs.CL · cs.HC · econ.GN · q-fin.EC

AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting

Pith reviewed 2026-05-23 02:53 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.HC · econ.GN · q-fin.EC
keywords real estate marketing · persuasive language generation · LLM agents · copywriting automation · grounded generation · human preference evaluation · factual accuracy · personalization

The pith

AI agent generates real estate marketing copy that buyers prefer over human expert writing while matching factual accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an agentic framework using large language models for grounded persuasive language generation in automated copywriting, focused on real estate. It employs three modules that predict marketable features, align output with buyer preferences, and enforce factual accuracy plus localized details. Systematic experiments with a focus group of potential house buyers show the AI descriptions are preferred over human expert baselines by a clear margin. A sympathetic reader would care because the result indicates a scalable way to produce targeted marketing content without losing appeal or truthfulness.

Core claim

This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted copywriting while ensuring factuality of content generation.

What carries the argument

Three-module agentic framework with Grounding Module to predict marketable features, Personalization Module to align with user preferences, and Marketing Module to ensure factual accuracy and localized features.
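
As a rough illustration of how the three modules could hand off to one another, the sketch below wires them together in Python. It is not the authors' implementation: the function names, the `marketability` scores, the `buyer_prefs` weights, and the output template are all hypothetical stand-ins, and the paper's modules rely on LLMs rather than these simple rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Listing:
    # Verified attributes of the property; the only facts the copy may cite.
    facts: dict[str, str]


def grounding_module(listing: Listing, marketability: dict[str, float], k: int = 3) -> list[str]:
    """Pick the k listing features with the highest (hypothetical) marketability score."""
    known = [name for name in listing.facts if name in marketability]
    return sorted(known, key=lambda name: marketability[name], reverse=True)[:k]


def personalization_module(features: list[str], buyer_prefs: dict[str, float]) -> list[str]:
    """Reorder features so those the buyer rates higher come first (stable sort)."""
    return sorted(features, key=lambda name: buyer_prefs.get(name, 0.0), reverse=True)


def marketing_module(features: list[str], listing: Listing) -> str:
    """Compose copy from listing facts only, so every claim is traceable to the listing."""
    claims = [f"{name}: {listing.facts[name]}" for name in features if name in listing.facts]
    return "Highlights: " + "; ".join(claims) + "."


def describe(listing: Listing, marketability: dict[str, float],
             buyer_prefs: dict[str, float], k: int = 3) -> str:
    """Run the three stages in order: select, personalize, then write grounded copy."""
    selected = grounding_module(listing, marketability, k)
    ranked = personalization_module(selected, buyer_prefs)
    return marketing_module(ranked, listing)
```

The design point this toy version shares with the description above is the ordering: feature selection happens before personalization, and the final writing stage can only draw on attributes present in the listing, which is one simple way to keep generated claims checkable.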

Load-bearing premise

The focus group of potential house buyers provides a representative sample of real-world preferences and the human expert baselines are a fair and unbiased comparison point.

What would settle it

A follow-up study with a larger or demographically broader buyer sample that rates the human-written descriptions as equal or superior, or that uncovers factual errors in the AI outputs.

Figures

Figures reproduced from arXiv: 2502.16810 by Chaoqi Wang, Chenghao Yang, Fei Fang, Haifeng Xu, Hao Zhu, Jibang Wu, Simon Mahns, Yi Wu.

Figure 1: Illustration of the Design Pipeline of AI Realtor.
Figure 2: Illustration of the inductive feature schema construction pipeline.
Figure 3: Comparison of model performance using Elo ratings and win rates.
Figure 4: Analyses of Simulating Human Feedback with AI Feedback.
Figure 5: Faithfulness Scores for Hallucination Checks.
Figure 6: Survey Screening Interface.
Figure 8: Human Evaluation Interface.
Figure 9: Annotation Interface.
Figure 10: Credibility Scores for Hallucination Checks.
Figure 11: Interfaces used in the hallucination checks.
Figure 12: Interfaces used in the hallucination checks.
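
Figure 3 reports Elo ratings and win rates. For readers unfamiliar with these metrics, here is a minimal Python sketch of the standard Elo update applied to pairwise preference votes. The K-factor of 32, the initial rating of 1000, and the function names are assumptions for illustration; the paper's exact rating procedure is not given on this page.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(matches, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Process (winner, loser) pairs in order and return the final ratings."""
    ratings = {}
    for winner, loser in matches:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # The winner gains exactly what the loser gives up (zero-sum update).
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


def win_rate(matches, name: str) -> float:
    """Fraction of the comparisons involving `name` that `name` won."""
    played = [m for m in matches if name in m]
    wins = sum(1 for winner, _ in played if winner == name)
    return wins / len(played) if played else 0.0
```

Sequential Elo depends on the order in which votes are processed, so reported ratings are often averaged over shuffled orderings; whether the paper does so is not stated here.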
Original abstract

This paper develops an agentic framework that employs large language models (LLMs) for grounded persuasive language generation in automated copywriting, with real estate marketing as a focal application. Our method is designed to align the generated content with user preferences while highlighting useful factual attributes. This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. We conduct systematic human-subject experiments in the domain of real estate marketing, with a focus group of potential house buyers. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted copywriting while ensuring factuality of content generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces an agentic LLM-based framework for grounded persuasive language generation in automated copywriting, focused on real estate marketing. It consists of three modules—Grounding (to predict marketable features), Personalization (to align with user preferences), and Marketing (to ensure factual accuracy and localized features)—and reports human-subject experiments with a focus group of potential house buyers claiming that the generated descriptions are preferred over those written by human experts by a clear margin while maintaining equivalent factual accuracy.

Significance. If the empirical results hold under rigorous controls, the work would demonstrate a practical agentic approach to scalable, targeted copywriting that integrates factuality constraints with personalization, offering a template for similar applications in other marketing domains.

major comments (1)
  1. [Abstract] Abstract: the headline claim of a 'clear margin' preference for AI-generated descriptions over human experts at matched factual accuracy is presented without any reported sample size, statistical tests, blinding procedures, rater demographics, recruitment criteria, or protocol for verifying factual accuracy, rendering the central empirical result unverifiable from the provided information.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and constructive comment on the abstract. We address the point below.

point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim of a 'clear margin' preference for AI-generated descriptions over human experts at matched factual accuracy is presented without any reported sample size, statistical tests, blinding procedures, rater demographics, recruitment criteria, or protocol for verifying factual accuracy, rendering the central empirical result unverifiable from the provided information.

    Authors: We agree that the abstract as written does not include these methodological details, which are reported in Section 4 of the full manuscript. To improve verifiability at the abstract level, we will revise the abstract to incorporate key experimental parameters (e.g., participant count, preference margin with statistical support, and confirmation of factual accuracy protocol) while preserving length constraints. revision: yes
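
One common way to attach statistical support to a pairwise preference margin is an exact two-sided binomial (sign) test against the null hypothesis that raters pick either description with equal probability. The Python sketch below is illustrative only: the function name is hypothetical, ties are assumed to have been dropped beforehand, and nothing on this page indicates which test the authors actually used.

```python
from math import comb


def binomial_two_sided_p(wins: int, n: int) -> float:
    """Exact two-sided p-value for `wins` out of `n` votes under H0: P(win) = 0.5."""
    if n == 0:
        return 1.0
    # Under p = 0.5 the distribution is symmetric, so double the larger tail.
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

A small p-value here only says the preference is unlikely to be a coin flip; it does not address rater representativeness or blinding, which the referee comment also raises.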

Circularity Check

0 steps flagged

Empirical human-evaluation study with no derivation chain or fitted predictions

full rationale

The paper presents an agentic framework with three descriptive modules (Grounding, Personalization, Marketing) and reports results from human-subject experiments on real-estate copywriting. No equations, first-principles derivations, parameter-fitting procedures, or predictions are described in the abstract or provided text. Central claims rest on external human preference and accuracy judgments rather than any internal reduction, self-definition, or self-citation chain. This is a standard empirical study whose validity can be assessed against the reported experimental protocol; no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about LLM capabilities rather than new mathematical constructs; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Large language models can be prompted to mimic expert human behavior in predicting marketable features.
    Underpins the Grounding Module as described in the abstract.
  • domain assumption: Generated content can be aligned with user preferences via a dedicated personalization step while preserving factual accuracy.
    Central to the Personalization and Marketing Modules.

pith-pipeline@v0.9.0 · 5720 in / 1241 out tokens · 57686 ms · 2026-05-23T02:53:52.400951+00:00 · methodology

