Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning
Pith reviewed 2026-05-10 14:56 UTC · model grok-4.3
The pith
Reinforcement learning trains LLMs to produce self-contained sentence-level edits that improve argument appropriateness while preserving original meaning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training with group relative policy optimization on a multi-component reward that jointly optimizes edit-level semantic similarity, fluency, and pattern conformity together with argument-level appropriateness, LLMs can be made to output self-contained sentence-level edit suggestions that users can accept or reject independently, producing more human-like editing behavior than prior approaches and reaching near-full-rewrite appropriateness through multi-round application.
What carries the argument
Group relative policy optimization guided by a multi-component reward function that scores semantic similarity, fluency, and pattern conformity at the edit level while also rewarding argument-level appropriateness.
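How the four components are combined is not specified in this summary; a minimal sketch, assuming a simple weighted sum over normalized component scores and an off-the-shelf sentence-similarity model (the weights, the model choice, and the external scorer inputs are illustrative assumptions, not the authors' values):

```python
# Sketch of a multi-component edit reward as a weighted sum.
# Weights and scoring models are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

sim_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed similarity scorer

def edit_reward(original: str, edited: str,
                fluency: float, pattern_conformity: float,
                appropriateness: float,
                weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine edit-level and argument-level components, each in [0, 1]."""
    w_sim, w_flu, w_pat, w_app = weights
    emb = sim_model.encode([original, edited])
    semantic_sim = float(util.cos_sim(emb[0], emb[1]))  # meaning preservation
    return (w_sim * semantic_sim + w_flu * fluency
            + w_pat * pattern_conformity + w_app * appropriateness)
```

Under this reading, each sampled edit is scored once at the edit level, with the appropriateness term computed on the full argument after the edit is applied.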
Load-bearing premise
The chosen reward components and evaluation metrics accurately capture what counts as human-like editing behavior.
What would settle it
A study in which human judges consistently rate the RL model's sentence-level edits as less meaning-preserving or less appropriate than those from standard LLM baselines would falsify the central claim.
Original abstract
Editing human-written text has become a standard use case of large language models (LLMs), for example, to make one's arguments more appropriate for a discussion. Comparing human to LLM-generated edits, however, we observe a mismatch in editing strategies: While LLMs often perform multiple scattered edits and tend to change meaning notably, humans rather encapsulate dependent changes in self-contained, meaning-preserving edits. In this paper, we present a reinforcement learning approach that teaches LLMs human-like editing to improve the appropriateness of arguments. Our approach produces self-contained sentence-level edit suggestions that can be accepted or rejected independently. We train the approach using group relative policy optimization with a multi-component reward function that jointly optimizes edit-level semantic similarity, fluency, and pattern conformity as well as argument-level appropriateness. In automatic and human evaluation, it outperforms competitive baselines and the state of the art in human-like editing, with multi-round editing achieving appropriateness close to full rewriting.
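The multi-round scheme mentioned at the end of the abstract can be pictured as repeated application of the single-round editor until the appropriateness score stops improving; a minimal sketch, where `suggest_edits`, `apply_edits`, and `appropriateness` are hypothetical stand-ins for the trained policy, the edit-application step, and the argument-level scorer:

```python
# Multi-round editing loop (hypothetical stand-ins, not the paper's API):
# reapply the sentence-level editor until appropriateness saturates.
def multi_round_edit(text, suggest_edits, apply_edits, appropriateness,
                     max_rounds=5, min_gain=0.01):
    score = appropriateness(text)
    for _ in range(max_rounds):
        edited = apply_edits(text, suggest_edits(text))
        new_score = appropriateness(edited)
        if new_score - score < min_gain:  # no meaningful improvement left
            break
        text, score = edited, new_score
    return text
```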
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs typically produce scattered, meaning-altering edits on argumentative text, in contrast to humans, who favor self-contained, meaning-preserving sentence-level edits. It introduces a reinforcement learning approach based on group relative policy optimization (GRPO) with a multi-component reward (edit-level semantic similarity, fluency, pattern conformity, and argument-level appropriateness) to train LLMs to generate independent, human-like edit suggestions. Automatic and human evaluations reportedly show that the approach outperforms competitive baselines and state-of-the-art methods, with multi-round editing achieving appropriateness close to full rewriting.
Significance. If the central claims hold after addressing the evaluation gaps, the work would be moderately significant for the field of controllable text generation and argument improvement, as it targets a specific mismatch in editing strategies between LLMs and humans and demonstrates a practical RL-based method for producing editable, self-contained suggestions. The use of a joint reward for both local edit quality and global appropriateness is a reasonable technical choice, though the absence of direct validation against human editing patterns reduces the strength of the 'human-like' contribution.
major comments (3)
- [Evaluation] Evaluation section: The abstract and results claim outperformance in human-like editing and near-parity with full rewriting via multi-round editing, but provide no details on exact baselines, statistical significance testing, or how the four reward components were weighted and balanced during training; this directly weakens support for the central claim that the approach yields edits humans perceive as human-like rather than LLM-typical.
- [Method] Method and Experiments: The multi-component reward (semantic similarity + fluency + pattern conformity + argument-level appropriateness) is optimized without reported ablation on component weights or correlation studies showing that higher reward scores align with human judgments of self-contained vs. scattered editing style; this is load-bearing because the paper's weakest assumption is that the proxy reward correctly measures human-like behavior.
- [Results] Results: No direct comparison is presented of the granularity and dependency patterns of the generated edits against human reference edits (e.g., measuring how often changes are encapsulated in single sentences), which is required to substantiate the claim that the RL policy learns human editing strategies rather than merely optimizing the chosen proxies.
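One way to operationalize the granularity comparison this last comment asks for, sketched under the assumption that each edit is represented by the set of sentence indices it touches (a hypothetical representation, not the paper's):

```python
# Granularity metric: share of edits confined to a single sentence.
# The edit representation (sets of touched sentence indices) is assumed.
def single_sentence_rate(edits: list[set[int]]) -> float:
    if not edits:
        return 0.0
    return sum(len(touched) == 1 for touched in edits) / len(edits)

model_edits = [{0}, {2, 3}, {5}]    # one edit spans two sentences
human_edits = [{0}, {2}, {4}, {5}]  # all self-contained
print(single_sentence_rate(model_edits))  # ~0.67
print(single_sentence_rate(human_edits))  # 1.0
```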
minor comments (1)
- [Abstract] The abstract mentions 'pattern conformity' as a reward component but does not define the specific patterns or how they are measured; this component should be defined in the method section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive feedback, which identifies key areas where additional details and analyses can strengthen the manuscript. We address each major comment below and commit to revisions that enhance the rigor of our claims without altering the core contributions.
Point-by-point responses
Referee: [Evaluation] Evaluation section: The abstract and results claim outperformance in human-like editing and near-parity with full rewriting via multi-round editing, but provide no details on exact baselines, statistical significance testing, or how the four reward components were weighted and balanced during training; this directly weakens support for the central claim that the approach yields edits humans perceive as human-like rather than LLM-typical.
Authors: We agree that the manuscript would benefit from greater transparency on these points. In the revised version, we will explicitly enumerate all baselines (including SOTA methods), report statistical significance testing (e.g., paired t-tests or Wilcoxon tests with p-values and effect sizes) for improvements in automatic metrics and human judgments, and detail the weights assigned to each of the four reward components along with the procedure used to balance them during training. These additions will provide clearer support for the human-like editing claims. revision: yes
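The paired tests the authors commit to are a few lines with SciPy; a sketch with placeholder scores (not reported results), assuming one appropriateness score per argument for each system:

```python
# Paired non-parametric test over per-argument appropriateness scores.
# The score arrays are placeholders, not results from the paper.
from scipy.stats import wilcoxon

ours     = [0.81, 0.74, 0.90, 0.68, 0.77, 0.85]
baseline = [0.72, 0.70, 0.88, 0.61, 0.75, 0.79]

stat, p = wilcoxon(ours, baseline)
print(f"W={stat:.1f}, p={p:.4f}")
```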
Referee: [Method] Method and Experiments: The multi-component reward (semantic similarity + fluency + pattern conformity + argument-level appropriateness) is optimized without reported ablation on component weights or correlation studies showing that higher reward scores align with human judgments of self-contained vs. scattered editing style; this is load-bearing because the paper's weakest assumption is that the proxy reward correctly measures human-like behavior.
Authors: We recognize the value of such validation for the reward design. We will add an ablation study examining different weightings of the reward components and their impact on edit quality and appropriateness. We will also report correlation coefficients between the overall reward scores and human ratings of self-containment, meaning preservation, and editing style from our existing human evaluation data. This will directly test the alignment between the proxy reward and human perceptions. revision: yes
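The promised reward-versus-human-judgment analysis amounts to a rank correlation; a sketch with placeholder data standing in for reward scores and Likert-style human ratings:

```python
# Rank correlation between proxy reward scores and human ratings.
# Both arrays are placeholders, not data from the paper.
from scipy.stats import spearmanr

reward_scores = [0.62, 0.81, 0.45, 0.90, 0.73]
human_ratings = [3, 4, 2, 5, 4]  # 1-5 self-containment ratings (assumed scale)

rho, p = spearmanr(reward_scores, human_ratings)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```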
Referee: [Results] Results: No direct comparison is presented of the granularity and dependency patterns of the generated edits against human reference edits (e.g., measuring how often changes are encapsulated in single sentences), which is required to substantiate the claim that the RL policy learns human editing strategies rather than merely optimizing the chosen proxies.
Authors: We will incorporate a new quantitative comparison in the results section. Using the human-edited examples from our initial observation of editing strategies, we will measure and report edit granularity (proportion of single-sentence edits) and dependency patterns (e.g., average interdependent changes per edit) for our method, baselines, and the human references. This analysis will provide direct evidence that the learned policy better matches human editing patterns. revision: yes
Circularity Check
No significant circularity in the derivation chain
Full rationale
The paper defines a multi-component reward (semantic similarity, fluency, pattern conformity, argument-level appropriateness) using external pre-trained models, applies GRPO to optimize an LLM policy toward that reward, and evaluates the resulting edits via separate automatic metrics plus direct human judgments. No equation or claim reduces the output edits or the 'human-like' property to a fitted parameter or self-citation by construction; the central result is an empirical outcome of RL optimization against independent proxies and human raters rather than a tautological renaming or self-referential fit.
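For context, the GRPO step the rationale refers to centers on group-relative advantages: several candidate edits are sampled per input, scored with the composite reward, and standardized within the group, so no learned value function is needed. A minimal illustration of that normalization (not the paper's training code):

```python
# Group-relative advantage normalization as used in GRPO.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group of candidates."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for a group of four sampled edit suggestions for one argument.
print(group_relative_advantages([0.42, 0.55, 0.61, 0.38]))
```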
Axiom & Free-Parameter Ledger
free parameters (1)
- reward component weights
axioms (1)
- Domain assumption: Human edits are typically self-contained, sentence-level, and meaning-preserving.