Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation
Pith reviewed 2026-05-10 17:12 UTC · model grok-4.3
The pith
Heterogeneous teams of trait-conditioned language model agents outperform uniform groups in simulated legal arguments, and a learned orchestrator finds even better strategies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the Strategic Courtroom Framework, teams of large language models conditioned on nine interpretable traits engage in iterative legal argumentation across 10 synthetic cases. Results from over 7,000 trials indicate that diverse trait combinations in teams produce higher success rates than homogeneous ones, moderate interaction rounds stabilize verdicts, and traits such as quantitative reasoning and charisma drive disproportionate persuasive power. The reinforcement-learning Trait Orchestrator dynamically selects defense traits to counter the prosecution, yielding strategies superior to static human-designed sets.
What carries the argument
The Strategic Courtroom Framework, a multi-agent environment where LLM agents are conditioned on nine traits grouped into four archetypes to control rhetorical style and strategy, paired with a reinforcement-learning Trait Orchestrator that generates adaptive defense traits.
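The review does not reproduce the paper's prompting templates, so the conditioning step can only be sketched. A minimal version might assemble each agent's system prompt from a courtroom role plus named trait descriptions; the trait library below is invented for illustration and does not reproduce the paper's nine traits or four archetypes.

```python
# Hypothetical sketch of trait conditioning: each agent gets a system
# prompt assembled from a courtroom role plus named trait descriptions.
# The trait library below is invented for illustration; the paper's
# actual nine traits are not reproduced in this review.
TRAIT_LIBRARY = {
    "quantitative": "Ground every claim in figures, probabilities, or cited statistics.",
    "charismatic": "Use vivid, emotionally resonant language aimed at the judges.",
    "cautious": "Concede weak points early and hedge any unverifiable claim.",
}

def build_agent_prompt(role: str, traits: list[str]) -> str:
    """Compose the system prompt for one trait-conditioned agent."""
    unknown = set(traits) - TRAIT_LIBRARY.keys()
    if unknown:
        raise ValueError(f"undefined traits: {sorted(unknown)}")
    lines = [f"You are a {role} attorney in a simulated courtroom debate."]
    lines += [f"- {TRAIT_LIBRARY[t]}" for t in traits]
    lines.append("Each round, respond to the opposing team's latest argument.")
    return "\n".join(lines)

prompt = build_agent_prompt("defense", ["quantitative", "cautious"])
```

A team is then several such agents with different trait lists; homogeneous versus heterogeneous configurations differ only in how those lists overlap.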
If this is right
- Teams mixing complementary traits achieve higher win rates than single-trait teams in the simulated trials.
- Arguments that run for a moderate number of rounds produce more consistent final verdicts than very short or very long exchanges.
- Agents with quantitative and charismatic traits contribute more to overall team success than other trait types.
- The reinforcement-learning orchestrator identifies trait combinations that outperform any fixed set of human-chosen traits.
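As rough intuition for the last claim, the orchestrator's search can be caricatured as a bandit over candidate trait sets. Everything below is a toy stand-in, not the paper's RL formulation: the trait names, the candidate pool, and the simulated win function are invented, and the real orchestrator also conditions on the case and the opposing team.

```python
import itertools
import random

# Toy stand-in for the Trait Orchestrator: an epsilon-greedy bandit over
# three-trait defense teams, updated from simulated win/loss feedback.
random.seed(0)
TRAITS = ["quantitative", "charismatic", "cautious", "aggressive", "empathetic"]
ARMS = list(itertools.combinations(TRAITS, 3))  # 10 candidate teams

def simulate_win(team) -> bool:
    """Hypothetical environment: some traits raise the win probability."""
    p = 0.4 + 0.1 * ("quantitative" in team) + 0.1 * ("charismatic" in team)
    return random.random() < p

values = {arm: 0.5 for arm in ARMS}  # initial value estimates
counts = {arm: 0 for arm in ARMS}
for _ in range(2000):
    # Explore 10% of the time, otherwise exploit the best estimate so far.
    arm = random.choice(ARMS) if random.random() < 0.1 else max(values, key=values.get)
    reward = float(simulate_win(arm))
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best = max(values, key=values.get)
```

The point the sketch makes is structural: the learned policy is scored by the same simulator that trains it, which is exactly the circularity flagged later in this review.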
Where Pith is reading between the lines
- This approach could be tested in other adversarial language domains such as diplomacy or business negotiation to see if trait diversity remains advantageous.
- Future work might replace synthetic verdicts with judgments from actual legal experts to check if the simulated advantages hold in real disputes.
- By making persuasion traits adjustable, the system opens the possibility of training agents that adapt their rhetorical approach mid-debate rather than using a single fixed profile.
Load-bearing premise
The assumption that assigning nine specific traits to large language models will produce reliable and controllable differences in how they argue, and that the resulting simulated verdicts accurately reflect what would happen in actual legal persuasion.
What would settle it
Run the same simulated cases with human lawyers playing the trait-conditioned roles and compare whether the trait effects and orchestrator advantages appear in the human verdicts or outcomes.
Original abstract
Strategic interaction in adversarial domains such as law, diplomacy, and negotiation is mediated by language, yet most game-theoretic models abstract away the mechanisms of persuasion that operate through discourse. We present the Strategic Courtroom Framework, a multi-agent simulation environment in which prosecution and defense teams composed of trait-conditioned Large Language Model (LLM) agents engage in iterative, round-based legal argumentation. Agents are instantiated using nine interpretable traits organized into four archetypes, enabling systematic control over rhetorical style and strategic orientation. We evaluate the framework across 10 synthetic legal cases and 84 three-trait team configurations, totaling over 7,000 simulated trials using DeepSeek-R1 and Gemini 2.5 Pro. Our results show that heterogeneous teams with complementary traits consistently outperform homogeneous configurations, that moderate interaction depth yields more stable verdicts, and that certain traits (notably quantitative and charismatic) contribute disproportionately to persuasive success. We further introduce a reinforcement-learning-based Trait Orchestrator that dynamically generates defense traits conditioned on the case and opposing team, discovering strategies that outperform static, human-designed trait combinations. Together, these findings demonstrate how language can be treated as a first-class strategic action space and provide a foundation for building autonomous agents capable of adaptive persuasion in multi-agent environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Strategic Courtroom Framework, a multi-agent simulation environment where prosecution and defense teams of trait-conditioned LLM agents engage in iterative legal argumentation. Across 10 synthetic legal cases and 84 three-trait team configurations (over 7,000 trials using DeepSeek-R1 and Gemini 2.5 Pro), the authors report that heterogeneous teams with complementary traits outperform homogeneous configurations, moderate interaction depth produces more stable verdicts, traits such as quantitative and charismatic contribute disproportionately to success, and a reinforcement-learning Trait Orchestrator dynamically generates superior defense traits compared to static human-designed combinations.
Significance. If validated, the work offers a promising approach to modeling strategic persuasion in language-based adversarial domains, with implications for autonomous agents in law, negotiation, and diplomacy. The scale of the empirical evaluation (thousands of trials) and the introduction of an adaptive RL component are notable strengths that could advance the field of multi-agent systems. However, the absence of direct validation for the trait-conditioning mechanism substantially reduces the current impact, as the reported performance differences may not be attributable to the intended traits.
major comments (2)
- [Abstract] The abstract claims results on verdicts and trait importance but provides no details on how verdicts are determined from the argumentation rounds, how LLM stochasticity or bias is controlled across trials, or whether post-hoc analysis affected the reported trait contributions. This information is essential for evaluating the soundness of the empirical findings.
- [Trait Conditioning and Experimental Design] The central results on heterogeneous team superiority, trait-specific contributions, and RL orchestrator performance all depend on the assumption that the nine traits produce consistent and controllable changes in agent behavior. No quantitative validation of trait adherence is reported, such as accuracy of post-generation trait classification, embedding-based separation between conditions, or human evaluation of rhetorical features. Without this, differences across the 84 configurations could arise from prompt sensitivity or case artifacts rather than strategic trait effects.
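For concreteness, the verdict-determination question raised in the first comment admits a simple reference design: majority vote over an odd panel of independent judges per trial, then a mean win rate per configuration. This is illustrative only; the booleans below stand in for real LLM judge outputs, and the paper's actual procedure is not specified in this review.

```python
from statistics import mean

# Sketch of one plausible verdict pipeline: each trial ends with votes
# from an odd panel of independent judges; the trial verdict is the
# majority, and a configuration's score is its mean win rate over trials.
def trial_verdict(judge_votes: list) -> bool:
    """Majority vote; an odd panel size rules out ties."""
    assert len(judge_votes) % 2 == 1, "use an odd number of judges"
    return sum(judge_votes) > len(judge_votes) / 2

def config_win_rate(trials: list) -> float:
    """Fraction of trials won under majority voting."""
    return mean(trial_verdict(votes) for votes in trials)

rate = config_win_rate([
    [True, True, False],   # 2-1 majority -> win
    [False, False, True],  # 1-2 majority -> loss
    [True, True, True],    # unanimous    -> win
])                         # rate is 2/3
```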
minor comments (2)
- [Abstract] The abstract mentions 'over 7,000 simulated trials' but should include a summary table or reference to the exact distribution across the 84 configurations and 10 cases for clarity.
- [Methods] The paper should provide the exact prompting templates and hyperparameter settings (e.g., temperature) used for trait conditioning to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of transparency and validation that we will address through targeted revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] The abstract claims results on verdicts and trait importance but provides no details on how verdicts are determined from the argumentation rounds, how LLM stochasticity or bias is controlled across trials, or whether post-hoc analysis affected the reported trait contributions. This information is essential for evaluating the soundness of the empirical findings.
Authors: We agree that the abstract would benefit from greater specificity on these points. In the revised version, we will expand the abstract to note that verdicts are reached by majority vote of three independent LLM judges after a fixed number of argumentation rounds, that stochasticity is mitigated by averaging results over multiple independent trials per configuration (with standard deviations reported), and that trait contributions were quantified via post-hoc regression analysis controlling for case and model effects. Corresponding details will be added to the methods section for full reproducibility. revision: yes
- Referee: [Trait Conditioning and Experimental Design] The central results on heterogeneous team superiority, trait-specific contributions, and RL orchestrator performance all depend on the assumption that the nine traits produce consistent and controllable changes in agent behavior. No quantitative validation of trait adherence is reported, such as accuracy of post-generation trait classification, embedding-based separation between conditions, or human evaluation of rhetorical features. Without this, differences across the 84 configurations could arise from prompt sensitivity or case artifacts rather than strategic trait effects.
Authors: We acknowledge this as a substantive limitation in the current manuscript. The original submission relied on indirect evidence from systematic performance variation across 84 configurations and two distinct LLM backbones. To directly address the concern, we will add a new validation subsection that includes: (i) cosine-distance analysis of sentence embeddings to quantify separation between trait conditions, (ii) accuracy of a post-hoc trait classifier on generated arguments, and (iii) a small human evaluation of rhetorical features in sampled outputs. These additions will provide quantitative support that observed differences arise from the intended trait effects. revision: yes
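The embedding-separation check proposed in item (i) can be sketched with synthetic data: trait conditioning is supported when arguments from different conditions sit farther apart in cosine distance than arguments from the same condition. The random Gaussian clusters below stand in for real sentence embeddings of generated arguments.

```python
import numpy as np

# Sketch of the proposed embedding-separation validation. Random
# Gaussian clusters stand in for sentence embeddings of arguments
# generated under two different trait conditions.
rng = np.random.default_rng(0)

def cosine_dist(a, b):
    return 1.0 - float(a @ b) / (float(np.linalg.norm(a)) * float(np.linalg.norm(b)))

def mean_pairwise(xs, ys):
    return float(np.mean([cosine_dist(x, y) for x in xs for y in ys]))

# Two synthetic "trait conditions": condition-specific centers plus noise.
center_a, center_b = rng.normal(size=64), rng.normal(size=64)
cond_a = [center_a + 0.3 * rng.normal(size=64) for _ in range(20)]
cond_b = [center_b + 0.3 * rng.normal(size=64) for _ in range(20)]

within = mean_pairwise(cond_a, cond_a)   # includes zero-distance self pairs
between = mean_pairwise(cond_a, cond_b)
separated = between > within             # expected when conditioning bites
```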
Circularity Check
RL Trait Orchestrator outperformance reduces to fitting within the same simulation loop
specific steps
- Fitted input called prediction [Abstract and Trait Orchestrator description]
  "We further introduce a reinforcement-learning-based Trait Orchestrator that dynamically generates defense traits conditioned on the case and opposing team, discovering strategies that outperform static, human-designed trait combinations."
  The orchestrator learns a policy from the exact multi-agent simulation outcomes it produces. Declaring that the learned policy "outperforms static, human-designed trait combinations" is therefore a statement about performance on data generated by the same closed simulation loop rather than an out-of-sample or externally validated prediction.
full rationale
The paper's central empirical claims (heterogeneous teams outperforming homogeneous ones, trait contributions, and moderate-depth stability) are direct contrasts across 84 configurations and 7,000 trials and do not reduce to the inputs by construction. The only load-bearing circular element is the RL-based Trait Orchestrator: it is trained on simulation outcomes generated by the same trait-conditioned agents and environment, then presented as discovering superior strategies. This matches the fitted-input-called-prediction pattern but is isolated to one component; the remainder of the derivation chain remains independent.
Axiom & Free-Parameter Ledger
free parameters (2)
- Trait definitions and prompting weights
- RL reward function and hyperparameters
axioms (2)
- Domain assumption: LLM agents conditioned on the nine traits exhibit consistent and controllable rhetorical styles that map to real persuasion mechanisms.
- Domain assumption: Verdicts produced by the simulation environment are a valid proxy for legal strategic success.
invented entities (2)
- Strategic Courtroom Framework (no independent evidence)
- Trait Orchestrator (no independent evidence)
Reference graph
Works this paper leans on
- [1] [n.d.]. Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law. https://link.springer.com/article/10.1007/s10506-019-09255-y
- [2] Leila Amgoud and Henri Prade. 2009. Using arguments for making and explaining decisions. Artificial Intelligence 173, 3 (March 2009), 413–436. https://doi.org/10.1016/j.artint.2008.11.006
- [3] Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. LEGAL-BERT: The Muppets straight out of Law School. arXiv:2010.02559 [cs]. https://doi.org/10.48550/arXiv.2010.02559
- [4] Vincent P. Crawford and Joel Sobel. 1982. Strategic Information Transmission. Econometrica 50, 6 (1982), 1431–1451. https://doi.org/10.2307/1913390
- [5] Sil Hamilton. 2023. Blind Judgement: Agent-Based Supreme Court Modelling With GPT. arXiv:2301.05327 [cs]. https://doi.org/10.48550/arXiv.2301.05327
- [6] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2023. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv:2308.00352 [cs]. https://doi.org/10.48550/arXiv.2308.00352
- [7] Emir Kamenica and Matthew Gentzkow. 2011. Bayesian Persuasion. American Economic Review 101, 6 (Oct. 2011), 2590–2615. https://doi.org/10.1257/aer.101.6.2590
- [8] Daniel Martin Katz, Michael James Bommarito, Shang Gao, and Pablo Arredondo. 2023. GPT-4 Passes the Bar Exam. https://doi.org/10.2139/ssrn.4389233
- [9] John Lawrence and Chris Reed. 2019. Argument Mining: A Survey. Computational Linguistics 45, 4 (Dec. 2019), 765–818. https://doi.org/10.1162/coli_a_00364
- [10] Jiwei Li, Michel Galley, Chris Brockett, Georgios P. Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A Persona-Based Neural Conversation Model. arXiv:1603.06155 [cs]. https://doi.org/10.48550/arXiv.1603.06155
- [11] Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs]. https://doi.org/10.48550/arXiv.2304.03442
- [12] Guillermo Simari and Iyad Rahwan (Eds.). 2009. Argumentation in Artificial Intelligence. Springer US, Boston, MA. https://doi.org/10.1007/978-0-387-98197-0
- [13] Jacky Visser, John Lawrence, Chris Reed, Jean Wagemans, and Douglas Walton. 2021. Annotating Argument Schemes. Argumentation 35, 1 (March 2021), 101–139. https://doi.org/10.1007/s10503-020-09519-x
- [14] Michael Wooldridge. 2009. An Introduction to MultiAgent Systems. John Wiley & Sons.
- [15] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. 2023. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155 [cs]. https://doi.org/10.48550/arXiv.2308.08155
- [16] Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing Dialogue Agents: I have a dog, do you have pets too?. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics.