Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning
Pith reviewed 2026-05-10 14:56 UTC · model grok-4.3
The pith
Reinforcement learning trains LLMs to produce self-contained sentence-level edits that improve argument appropriateness while preserving original meaning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training with group relative policy optimization on a multi-component reward that jointly optimizes edit-level semantic similarity, fluency, and pattern conformity together with argument-level appropriateness, LLMs can be made to output self-contained sentence-level edit suggestions that users can accept or reject independently, producing more human-like editing behavior than prior approaches and reaching near-full-rewrite appropriateness through multi-round application.
What carries the argument
Group relative policy optimization guided by a multi-component reward function that scores semantic similarity, fluency, and pattern conformity at the edit level while also rewarding argument-level appropriateness.
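How the four components are combined is not specified in this summary; a minimal sketch, assuming a simple weighted sum over normalized component scores and an off-the-shelf sentence-similarity model (the weights, the model choice, and the external scorer inputs are illustrative assumptions, not the authors' values):

```python
# Sketch of a multi-component edit reward as a weighted sum.
# Weights and scoring models are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

sim_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed similarity scorer

def edit_reward(original: str, edited: str,
                fluency: float, pattern_conformity: float,
                appropriateness: float,
                weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine edit-level and argument-level components, each in [0, 1]."""
    w_sim, w_flu, w_pat, w_app = weights
    emb = sim_model.encode([original, edited])
    semantic_sim = float(util.cos_sim(emb[0], emb[1]))  # meaning preservation
    return (w_sim * semantic_sim + w_flu * fluency
            + w_pat * pattern_conformity + w_app * appropriateness)
```

Under this reading, each sampled edit is scored once at the edit level, with the appropriateness term computed on the full argument after the edit is applied.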
Load-bearing premise
The chosen reward components and evaluation metrics accurately capture what counts as human-like editing behavior.
What would settle it
A study in which human judges consistently rate the RL model's sentence-level edits as less meaning-preserving or less appropriate than those from standard LLM baselines would falsify the central claim.
Original abstract
Editing human-written text has become a standard use case of large language models (LLMs), for example, to make one's arguments more appropriate for a discussion. Comparing human to LLM-generated edits, however, we observe a mismatch in editing strategies: While LLMs often perform multiple scattered edits and tend to change meaning notably, humans rather encapsulate dependent changes in self-contained, meaning-preserving edits. In this paper, we present a reinforcement learning approach that teaches LLMs human-like editing to improve the appropriateness of arguments. Our approach produces self-contained sentence-level edit suggestions that can be accepted or rejected independently. We train the approach using group relative policy optimization with a multi-component reward function that jointly optimizes edit-level semantic similarity, fluency, and pattern conformity as well as argument-level appropriateness. In automatic and human evaluation, it outperforms competitive baselines and the state of the art in human-like editing, with multi-round editing achieving appropriateness close to full rewriting.
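The multi-round scheme mentioned at the end of the abstract can be pictured as repeated application of the single-round editor until the appropriateness score stops improving; a minimal sketch, where `suggest_edits`, `apply_edits`, and `appropriateness` are hypothetical stand-ins for the trained policy, the edit-application step, and the argument-level scorer:

```python
# Multi-round editing loop (hypothetical stand-ins, not the paper's API):
# reapply the sentence-level editor until appropriateness saturates.
def multi_round_edit(text, suggest_edits, apply_edits, appropriateness,
                     max_rounds=5, min_gain=0.01):
    score = appropriateness(text)
    for _ in range(max_rounds):
        edited = apply_edits(text, suggest_edits(text))
        new_score = appropriateness(edited)
        if new_score - score < min_gain:  # no meaningful improvement left
            break
        text, score = edited, new_score
    return text
```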
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs typically produce scattered, meaning-altering edits on argumentative text, in contrast to humans, who favor self-contained, meaning-preserving sentence-level edits. It introduces a reinforcement learning approach based on group relative policy optimization (GRPO) with a multi-component reward (edit-level semantic similarity, fluency, pattern conformity, and argument-level appropriateness) to train LLMs to generate independent, human-like edit suggestions. Automatic and human evaluations reportedly show that the approach outperforms competitive baselines and state-of-the-art methods, with multi-round editing achieving appropriateness close to full rewriting.
Significance. If the central claims hold after addressing the evaluation gaps, the work would be moderately significant for the field of controllable text generation and argument improvement, as it targets a specific mismatch in editing strategies between LLMs and humans and demonstrates a practical RL-based method for producing editable, self-contained suggestions. The use of a joint reward for both local edit quality and global appropriateness is a reasonable technical choice, though the absence of direct validation against human editing patterns reduces the strength of the 'human-like' contribution.
major comments (3)
- [Evaluation] Evaluation section: The abstract and results claim outperformance in human-like editing and near-parity with full rewriting via multi-round editing, but provide no details on exact baselines, statistical significance testing, or how the four reward components were weighted and balanced during training; this directly weakens support for the central claim that the approach yields edits humans perceive as human-like rather than LLM-typical.
- [Method] Method and Experiments: The multi-component reward (semantic similarity + fluency + pattern conformity + argument-level appropriateness) is optimized without reported ablation on component weights or correlation studies showing that higher reward scores align with human judgments of self-contained vs. scattered editing style; this is load-bearing because the paper's weakest assumption is that the proxy reward correctly measures human-like behavior.
- [Results] Results: No direct comparison is presented of the granularity and dependency patterns of the generated edits against human reference edits (e.g., measuring how often changes are encapsulated in single sentences), which is required to substantiate the claim that the RL policy learns human editing strategies rather than merely optimizing the chosen proxies.
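One way to operationalize the granularity comparison this last comment asks for, sketched under the assumption that each edit is represented by the set of sentence indices it touches (a hypothetical representation, not the paper's):

```python
# Granularity metric: share of edits confined to a single sentence.
# The edit representation (sets of touched sentence indices) is assumed.
def single_sentence_rate(edits: list[set[int]]) -> float:
    if not edits:
        return 0.0
    return sum(len(touched) == 1 for touched in edits) / len(edits)

model_edits = [{0}, {2, 3}, {5}]    # one edit spans two sentences
human_edits = [{0}, {2}, {4}, {5}]  # all self-contained
print(single_sentence_rate(model_edits))  # ~0.67
print(single_sentence_rate(human_edits))  # 1.0
```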
minor comments (1)
- [Abstract] The abstract mentions 'pattern conformity' as a reward component but does not define the specific patterns or how they are measured; this component should be defined in the method section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive feedback, which identifies key areas where additional details and analyses can strengthen the manuscript. We address each major comment below and commit to revisions that enhance the rigor of our claims without altering the core contributions.
Point-by-point responses
Referee: [Evaluation] Evaluation section: The abstract and results claim outperformance in human-like editing and near-parity with full rewriting via multi-round editing, but provide no details on exact baselines, statistical significance testing, or how the four reward components were weighted and balanced during training; this directly weakens support for the central claim that the approach yields edits humans perceive as human-like rather than LLM-typical.
Authors: We agree that the manuscript would benefit from greater transparency on these points. In the revised version, we will explicitly enumerate all baselines (including SOTA methods), report statistical significance testing (e.g., paired t-tests or Wilcoxon tests with p-values and effect sizes) for improvements in automatic metrics and human judgments, and detail the weights assigned to each of the four reward components along with the procedure used to balance them during training. These additions will provide clearer support for the human-like editing claims. revision: yes
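The paired tests the authors commit to are a few lines with SciPy; a sketch with placeholder scores (not reported results), assuming one appropriateness score per argument for each system:

```python
# Paired non-parametric test over per-argument appropriateness scores.
# The score arrays are placeholders, not results from the paper.
from scipy.stats import wilcoxon

ours     = [0.81, 0.74, 0.90, 0.68, 0.77, 0.85]
baseline = [0.72, 0.70, 0.88, 0.61, 0.75, 0.79]

stat, p = wilcoxon(ours, baseline)
print(f"W={stat:.1f}, p={p:.4f}")
```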
Referee: [Method] Method and Experiments: The multi-component reward (semantic similarity + fluency + pattern conformity + argument-level appropriateness) is optimized without reported ablation on component weights or correlation studies showing that higher reward scores align with human judgments of self-contained vs. scattered editing style; this is load-bearing because the paper's weakest assumption is that the proxy reward correctly measures human-like behavior.
Authors: We recognize the value of such validation for the reward design. We will add an ablation study examining different weightings of the reward components and their impact on edit quality and appropriateness. We will also report correlation coefficients between the overall reward scores and human ratings of self-containment, meaning preservation, and editing style from our existing human evaluation data. This will directly test the alignment between the proxy reward and human perceptions. revision: yes
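The promised reward-versus-human-judgment analysis amounts to a rank correlation; a sketch with placeholder data standing in for reward scores and Likert-style human ratings:

```python
# Rank correlation between proxy reward scores and human ratings.
# Both arrays are placeholders, not data from the paper.
from scipy.stats import spearmanr

reward_scores = [0.62, 0.81, 0.45, 0.90, 0.73]
human_ratings = [3, 4, 2, 5, 4]  # 1-5 self-containment ratings (assumed scale)

rho, p = spearmanr(reward_scores, human_ratings)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```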
Referee: [Results] Results: No direct comparison is presented of the granularity and dependency patterns of the generated edits against human reference edits (e.g., measuring how often changes are encapsulated in single sentences), which is required to substantiate the claim that the RL policy learns human editing strategies rather than merely optimizing the chosen proxies.
Authors: We will incorporate a new quantitative comparison in the results section. Using the human-edited examples from our initial observation of editing strategies, we will measure and report edit granularity (proportion of single-sentence edits) and dependency patterns (e.g., average interdependent changes per edit) for our method, baselines, and the human references. This analysis will provide direct evidence that the learned policy better matches human editing patterns. revision: yes
Circularity Check
No significant circularity in the derivation chain
Full rationale
The paper defines a multi-component reward (semantic similarity, fluency, pattern conformity, argument-level appropriateness) using external pre-trained models, applies GRPO to optimize an LLM policy toward that reward, and evaluates the resulting edits via separate automatic metrics plus direct human judgments. No equation or claim reduces the output edits or the 'human-like' property to a fitted parameter or self-citation by construction; the central result is an empirical outcome of RL optimization against independent proxies and human raters rather than a tautological renaming or self-referential fit.
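For context, the GRPO step the rationale refers to centers on group-relative advantages: several candidate edits are sampled per input, scored with the composite reward, and standardized within the group, so no learned value function is needed. A minimal illustration of that normalization (not the paper's training code):

```python
# Group-relative advantage normalization as used in GRPO.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group of candidates."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for a group of four sampled edit suggestions for one argument.
print(group_relative_advantages([0.42, 0.55, 0.61, 0.38]))
```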
Axiom & Free-Parameter Ledger
free parameters (1)
- reward component weights
axioms (1)
- Domain assumption: Human edits are typically self-contained, sentence-level, and meaning-preserving.