Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
Pith reviewed 2026-05-23 23:01 UTC · model grok-4.3
The pith
LLMs apply Wikipedia neutrality rules in ways that appeal to the public but differ from how expert editors apply them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs achieve only 64% accuracy when detecting biased edits on a balanced dataset, and individual models differ in whether they tend to under- or over-predict bias. On the correction task, they remove 79% of the words removed by Wikipedia editors but make additional changes, yielding high-recall but low-precision edits. Crowdworkers judge the LLM rewrites as more neutral than the editor versions in 70% of cases and more fluent in 61% of cases. Qualitative analysis shows that the models sometimes apply NPOV more comprehensively than editors but often introduce unrelated modifications such as grammar corrections.
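A minimal sketch of how the Task 1 numbers can be read: on a balanced set, accuracy alone hides whether a model errs by flagging too much or too little, so the sketch also compares each model's predicted-bias rate with the true rate. It assumes binary labels (1 = biased, 0 = neutral) and a hypothetical 5-percentage-point tolerance; the paper's own way of quantifying the contrasting model biases is not described in this review.

```python
from dataclasses import dataclass


@dataclass
class DetectionSummary:
    accuracy: float
    predicted_biased_rate: float  # share of edits the model labels as biased
    true_biased_rate: float       # share of edits that are actually biased


def summarize_detection(y_true: list[int], y_pred: list[int]) -> DetectionSummary:
    """Summarize binary bias predictions (1 = biased, 0 = neutral)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return DetectionSummary(
        accuracy=correct / n,
        predicted_biased_rate=sum(y_pred) / n,
        true_biased_rate=sum(y_true) / n,
    )


def tendency(summary: DetectionSummary, tol: float = 0.05) -> str:
    """Classify a model as over- or under-predicting bias (tol is a hypothetical threshold)."""
    gap = summary.predicted_biased_rate - summary.true_biased_rate
    if gap > tol:
        return "over-predicts bias"
    if gap < -tol:
        return "under-predicts bias"
    return "roughly calibrated"


# Toy labels for a balanced set of eight edits (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
eager = summarize_detection(y_true, [1, 1, 1, 1, 1, 1, 0, 0])
cautious = summarize_detection(y_true, [1, 0, 0, 0, 0, 0, 0, 0])
print(eager.accuracy, tendency(eager))        # 0.75 over-predicts bias
print(cautious.accuracy, tendency(cautious))  # 0.625 under-predicts bias
```

Accuracy alone would not tell these two error patterns apart; comparing the predicted-bias rate with the true rate does.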
What carries the argument
Evaluation of LLMs on two tasks—detecting biased Wikipedia edits and correcting them to follow NPOV—measured against actual Wikipedia editor actions and crowdworker assessments of neutrality and fluency.
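The "79% of words removed by editors" figure and the high-recall, low-precision pattern can be illustrated with a word-level comparison. This is a sketch under stated assumptions: lowercase whitespace tokenization, counting only deletions (not words the model adds), and example sentences invented for illustration. It is not the paper's exact metric.

```python
from collections import Counter


def removed_words(before: str, after: str) -> Counter:
    """Multiset of lowercase whitespace tokens in `before` that are missing from `after`."""
    return Counter(before.lower().split()) - Counter(after.lower().split())


def removal_overlap(original: str, editor: str, model: str) -> tuple[float, float]:
    """Return (recall, precision) of the model's deletions against the editor's deletions."""
    editor_removed = removed_words(original, editor)
    model_removed = removed_words(original, model)
    shared = sum((editor_removed & model_removed).values())
    recall = shared / sum(editor_removed.values()) if editor_removed else 0.0
    precision = shared / sum(model_removed.values()) if model_removed else 0.0
    return recall, precision


# Hypothetical edit: the editor drops two loaded words; the model drops them too
# but also rewrites "bill", an extra change outside the editor's neutralization.
original = "the brilliant senator bravely opposed the bill"
editor = "the senator opposed the bill"
model = "the senator opposed the legislation"
recall, precision = removal_overlap(original, editor, model)
print(recall, round(precision, 3))  # 1.0 0.667
```

Recall rewards the model for catching the editor's deletions; precision penalizes it for deleting anything else, which is where extraneous changes show up.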
If this is right
- LLMs may be effective for generating neutral content but will likely require human oversight to verify added material.
- Application of NPOV by LLMs resonates more with public judgments than with those of Wikipedia community experts.
- Adoption of LLMs could reduce the agency of Wikipedia editors in determining article tone and content.
- Moderation workload may rise because of the need to check for extraneous changes introduced by the models.
Where Pith is reading between the lines
- Platforms with volunteer moderation might face similar issues when integrating LLMs if their norms are hard to specify precisely.
- Future work could test whether fine-tuning on expert editor decisions improves alignment with community standards over general crowd feedback.
- Broader use of AI for rule application could shift how specialized groups maintain their content standards over time.
Load-bearing premise
Crowdworker judgments of neutrality and fluency serve as a good stand-in for how experienced Wikipedia editors would evaluate adherence to NPOV policy.
What would settle it
A direct comparison in which experienced Wikipedia editors evaluate the neutrality of the AI rewrites versus the human editor rewrites on the same set of articles would test whether the crowdworker preference holds for the community.
Original abstract
Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
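The abstract does not spell out how the Task 1 set was balanced. One common approach, shown here purely as an assumption rather than the authors' procedure, is to downsample the larger class so biased and neutral edits appear equally often. On such a set a constant predictor scores 50%, which is the baseline the 64% figure should be read against.

```python
import random
from typing import Callable, Hashable, TypeVar

T = TypeVar("T")


def balance_by_downsampling(
    items: list[T], label_of: Callable[[T], Hashable], seed: int = 0
) -> list[T]:
    """Return a class-balanced, shuffled sample by downsampling larger classes."""
    groups: dict[Hashable, list[T]] = {}
    for item in items:
        groups.setdefault(label_of(item), []).append(item)
    if len(groups) < 2:
        raise ValueError("need at least two classes to balance")
    per_class = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced: list[T] = []
    for group in groups.values():
        balanced.extend(rng.sample(group, per_class))
    rng.shuffle(balanced)
    return balanced


# Hypothetical pool: 10 biased edits (label 1) and 4 neutral edits (label 0).
pool = [(f"biased-{i}", 1) for i in range(10)] + [(f"neutral-{i}", 0) for i in range(4)]
sample = balance_by_downsampling(pool, label_of=lambda edit: edit[1])
print(len(sample), sum(label for _, label in sample))  # 8 4
```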
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates LLMs on two tasks using Wikipedia edits: detecting biased edits per NPOV policy (Task 1, achieving 64% accuracy on a balanced dataset with varying model biases) and correcting them (Task 2, removing 79% of words targeted by editors but with additional extraneous changes). Crowdworker evaluations rate LLM rewrites as more neutral (70%) and fluent (61%) than editor rewrites. Qualitative analysis notes LLMs sometimes apply NPOV more comprehensively but often introduce non-NPOV changes like grammar fixes. The central claim is that LLMs apply NPOV norms in ways resonating with public preferences but diverging from community experts, with potential impacts on editor agency and moderation workload.
Significance. If the empirical results hold after addressing validation gaps, the work contributes concrete metrics on LLM norm application in a real specialized community (Wikipedia NPOV), highlighting alignment challenges even with explicit rules. Strengths include the dual-task design, word-overlap quantification, and mixed quantitative-qualitative approach to public vs. expert divergence. This could inform AI-assisted moderation research in computational social science and NLP.
major comments (2)
- [§4 (Results, crowdworker study)] The central claim that LLMs 'resonate with the public but diverge from community experts' rests on crowdworkers rating AI rewrites higher in neutrality/fluency, but no direct comparison or validation is provided against experienced Wikipedia editors' judgments of NPOV policy adherence. This leaves the divergence interpretation without expert-grounded evidence.
- [§3 (Methods)] Dataset construction for the balanced Task 1 set, selection criteria for biased edits, prompt engineering details for LLMs, and any statistical tests or inter-rater reliability metrics for human evaluations are not described. These omissions are load-bearing for interpreting the 64% accuracy and 79% removal figures as evidence of LLM behavior.
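One statistical test the referee's request could cover is whether a pairwise preference rate such as 70% differs from the 50% expected if crowdworkers had no preference. A minimal sketch of an exact two-sided binomial test follows. It assumes each judgment is independent (repeated judgments by the same rater would violate this), and the 70-of-100 count is hypothetical because the review reports no sample sizes.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k 'prefer AI' votes out of n under H0: p = 0.5."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5); the null is symmetric, so double it.
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p_tail)


print(binomial_two_sided_p(70, 100) < 0.001)  # True
print(binomial_two_sided_p(5, 10))            # 1.0
```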
minor comments (2)
- [Abstract and §2] Clarify the exact definition of 'balanced dataset' and how 'contrasting biases' across models were quantified to aid reader interpretation.
- [Results figures/tables] Figure or table presenting the 70%/61% crowdworker preferences: Include confidence intervals or sample sizes for the ratings to support the reported percentages.
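To show why the referee asks for sample sizes, here is a Wilson score interval for a preference proportion. The sketch assumes independent binary judgments, and the counts are hypothetical: the same 70% rate is far less certain from 100 judgments than from 1,000.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95%."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same 70% preference rate at two hypothetical sample sizes.
for successes, n in [(70, 100), (700, 1000)]:
    low, high = wilson_interval(successes, n)
    print(n, round(low, 3), round(high, 3))
```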
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments identify key areas where additional detail and clarification will strengthen the manuscript. We respond to each major comment below.
Point-by-point responses
- Referee: [§4 (Results, crowdworker study)] The central claim that LLMs 'resonate with the public but diverge from community experts' rests on crowdworkers rating AI rewrites higher in neutrality/fluency, but no direct comparison or validation is provided against experienced Wikipedia editors' judgments of NPOV policy adherence. This leaves the divergence interpretation without expert-grounded evidence.
Authors: The claim of divergence from community experts is supported by the quantitative finding that LLMs make additional changes beyond those of Wikipedia editors (high-recall, low-precision editing) together with the qualitative analysis of non-NPOV edits such as grammar fixes. Crowdworker ratings supply the public-preference comparison. We agree that the interpretation would be stronger with direct NPOV-adherence ratings from experienced Wikipedia editors on the LLM rewrites. Because no such expert ratings were collected, we will revise the manuscript to state this limitation explicitly and to clarify the evidential basis for the current interpretation. revision: partial
- Referee: [§3 (Methods)] Dataset construction for the balanced Task 1 set, selection criteria for biased edits, prompt engineering details for LLMs, and any statistical tests or inter-rater reliability metrics for human evaluations are not described. These omissions are load-bearing for interpreting the 64% accuracy and 79% removal figures as evidence of LLM behavior.
Authors: We appreciate the referee highlighting these gaps. The revised manuscript will expand §3 to include: (i) the construction and balancing procedure for the Task 1 dataset, (ii) the criteria used to select biased edits, (iii) the exact prompts and few-shot examples provided to each LLM, and (iv) the statistical tests performed together with inter-rater reliability statistics (e.g., Fleiss’ kappa or equivalent) for the crowdworker evaluations. revision: yes
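The authors name Fleiss' kappa as one candidate reliability statistic. A minimal sketch of the standard formula follows, assuming every item is rated by the same number of crowdworkers; the toy count matrices are illustrative, not study data.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa from an items x categories matrix of rating counts.

    counts[i][j] is the number of raters who put item i in category j
    (for example, j = 0 "editor version more neutral", j = 1 "AI version more neutral").
    Every item must have the same number of raters, at least two.
    """
    if not counts:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of raters")
    n_items = len(counts)
    # Observed agreement: share of agreeing rater pairs per item, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0 (perfect agreement)
print(fleiss_kappa([[2, 1], [1, 2]]))  # about -0.333 (worse than chance)
```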
Circularity Check
No circularity: direct empirical evaluation with no derivations or self-referential predictions
Full rationale
The paper conducts new experiments on LLMs for bias detection (64% accuracy) and correction (79% word removal overlap), plus crowdworker ratings of neutrality/fluency and qualitative analysis. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain. All claims rest on the described tasks, human ratings, and direct comparisons to Wikipedia editor edits, making the work self-contained against external benchmarks with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Wikipedia's NPOV policy can be operationalized into binary bias-detection and text-correction tasks that capture the policy's intent.