Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Ceren Budak; Eric Gilbert; Eytan Adar; Joshua Ashkinaze; Laura Kurek; Ruijia Guan

arXiv: 2407.04183 · v5 · submitted 2024-07-04 · cs.CL · cs.AI · cs.CY · cs.HC

Pith reviewed 2026-05-23 23:01 UTC · model grok-4.3

keywords: LLMs · Wikipedia · NPOV policy · bias detection · edit correction · content moderation · community norms · AI alignment

The pith

LLMs apply Wikipedia neutrality rules in ways that appeal to the public but differ from expert editors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how well large language models can identify and fix biased language in Wikipedia articles according to the site's Neutral Point of View policy. The models show low accuracy in spotting bias but remove most of the language that human editors target when rewriting. However, they also make additional edits that go beyond neutrality, and general crowdworkers rate the AI outputs as more neutral and fluent than the human edits. A sympathetic reader would care because this suggests AI tools might follow community rules in a way that feels good to outsiders yet creates extra work for the actual community members who maintain the platform.

Core claim

LLMs achieve only 64% accuracy when detecting biased edits on a balanced dataset, and the models err in different directions: some under-predict bias while others over-predict it. On the correction task, they remove 79% of the words that Wikipedia editors removed but also make further changes, yielding high-recall, low-precision edits. Crowdworkers judge the LLM rewrites more neutral in 70% of cases and more fluent in 61% of cases than the editor versions. The models sometimes apply neutrality more broadly than editors do, but they also introduce unrelated modifications such as grammar corrections.
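To make the high-recall, low-precision pattern concrete, here is a minimal Python sketch of a word-removal comparison. It is an illustration under stated assumptions, not the paper's metric: the whitespace tokenization, lowercasing, multiset differences, function names, and example sentences are all hypothetical choices.

```python
from collections import Counter


def removed_words(before: str, after: str) -> Counter:
    """Multiset of words present in `before` but missing from `after`.

    Assumption: lowercase whitespace tokenization; the paper may tokenize differently.
    """
    return Counter(before.lower().split()) - Counter(after.lower().split())


def removal_precision_recall(original: str, editor: str, model: str):
    """Compare the model's removals against the editor's removals."""
    editor_removed = removed_words(original, editor)
    model_removed = removed_words(original, model)
    # Words removed by both the editor and the model (multiset intersection).
    shared = sum((editor_removed & model_removed).values())
    n_editor = sum(editor_removed.values())
    n_model = sum(model_removed.values())
    recall = shared / n_editor if n_editor else 0.0
    precision = shared / n_model if n_model else 0.0
    return precision, recall


# Hypothetical example sentences, not taken from the paper's data.
original = "The brilliant senator bravely passed the controversial bill"
editor = "The senator bravely passed the controversial bill"
model = "The senator passed the bill"

precision, recall = removal_precision_recall(original, editor, model)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model removes the one word the editor removed (recall 1.0) but also removes two others (precision 1/3), which is the shape of the result the paper reports in aggregate.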

What carries the argument

Evaluation of LLMs on two tasks—detecting biased Wikipedia edits and correcting them to follow NPOV—measured against actual Wikipedia editor actions and crowdworker assessments of neutrality and fluency.

If this is right

LLMs may be effective for generating neutral content but will likely require human oversight to verify added material.
Application of NPOV by LLMs resonates more with public judgments than with those of Wikipedia community experts.
Adoption of LLMs could reduce the agency of Wikipedia editors in determining article tone and content.
Moderation workload may rise because of the need to check for extraneous changes introduced by the models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Platforms with volunteer moderation might face similar issues when integrating LLMs if their norms are hard to specify precisely.
Future work could test whether fine-tuning on expert editor decisions improves alignment with community standards over general crowd feedback.
Broader use of AI for rule application could shift how specialized groups maintain their content standards over time.

Load-bearing premise

Crowdworker judgments of neutrality and fluency serve as a good stand-in for how experienced Wikipedia editors would evaluate adherence to NPOV policy.

What would settle it

A direct comparison in which experienced Wikipedia editors evaluate the neutrality of the AI rewrites versus the human editor rewrites on the same set of articles would test whether the crowdworker preference holds for the community.

Figures

Figures reproduced from arXiv: 2407.04183 by Ceren Budak, Eric Gilbert, Eytan Adar, Joshua Ashkinaze, Laura Kurek, Ruijia Guan.

**Figure 2.** Comparison of model performance using confusion matrices and binomial distribution tests.

**Figure 3.** Edit difficulty was bimodal and models were more accurate for biased edits.

**Figure 4.** Words in an explanation with the most negative and most positive logit coefficients after a TF-IDF ...

**Figure 5.** Automated metrics regarding the intensity of edits. The horizontal line is the average for Wikipedian ...

**Figure 6.** AI tends to neutralize edits via adding words and humans tend to neutralize edits via removing words.

**Figure 7.** Experiment results for neutrality and fluency variables.

**Figure 8.** Comparing AI neutralizations to Wikipedia editor neutralizations.

**Figure 9.** AI had both more removals and more additions than human editors. Error bars are 95% CIs.

**Figure 10.** To rule out that our generation findings were dependent on seemingly minor analytical choices we ...

**Figure 11.** At the end of the experiment, participants guessed (via 0-100 slider) how often others chose the AI ...

Original abstract

Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
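The observation that some models under-predict bias and others over-predict it can be read off a confusion matrix. The sketch below is a hedged illustration: the counts are invented to show two hypothetical models with the same 64% accuracy on a balanced set but opposite skews, and `detection_summary` is an illustrative helper, not something from the paper.

```python
def detection_summary(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy and prediction skew for a binary 'biased vs. neutral' classifier.

    `skew` is the share of items predicted biased minus the share that are
    actually biased: positive means over-prediction, negative under-prediction.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    predicted_biased = (tp + fp) / total
    actual_biased = (tp + fn) / total
    return {"accuracy": accuracy, "skew": predicted_biased - actual_biased}


# Hypothetical counts on a balanced set of 100 edits (50 biased, 50 neutral).
over_predictor = detection_summary(tp=45, fp=31, tn=19, fn=5)
under_predictor = detection_summary(tp=19, fp=5, tn=45, fn=31)

for name, summary in [("over", over_predictor), ("under", under_predictor)]:
    print(name, f"accuracy={summary['accuracy']:.2f}", f"skew={summary['skew']:+.2f}")
```

Both hypothetical models score 0.64 accuracy, yet one flags 76% of edits as biased and the other only 24%, so accuracy alone hides the direction of the error.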

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates LLMs on two tasks using Wikipedia edits: detecting biased edits per NPOV policy (Task 1, achieving 64% accuracy on a balanced dataset with varying model biases) and correcting them (Task 2, removing 79% of words targeted by editors but with additional extraneous changes). Crowdworker evaluations rate LLM rewrites as more neutral (70%) and fluent (61%) than editor rewrites. Qualitative analysis notes LLMs sometimes apply NPOV more comprehensively but often introduce non-NPOV changes like grammar fixes. The central claim is that LLMs apply NPOV norms in ways resonating with public preferences but diverging from community experts, with potential impacts on editor agency and moderation workload.

Significance. If the empirical results hold after addressing validation gaps, the work contributes concrete metrics on LLM norm application in a real specialized community (Wikipedia NPOV), highlighting alignment challenges even with explicit rules. Strengths include the dual-task design, word-overlap quantification, and mixed quantitative-qualitative approach to public vs. expert divergence. This could inform AI-assisted moderation research in computational social science and NLP.

major comments (2)

[§4 (Results, crowdworker study)] The central claim that LLMs 'resonate with the public but diverge from community experts' rests on crowdworkers rating AI rewrites higher in neutrality/fluency, but no direct comparison or validation is provided against experienced Wikipedia editors' judgments of NPOV policy adherence. This leaves the divergence interpretation without expert-grounded evidence.
[§3 (Methods)] Dataset construction for the balanced Task 1 set, selection criteria for biased edits, prompt engineering details for LLMs, and any statistical tests or inter-rater reliability metrics for human evaluations are not described. These omissions are load-bearing for interpreting the 64% accuracy and 79% removal figures as evidence of LLM behavior.

minor comments (2)

[Abstract and §2] Clarify the exact definition of 'balanced dataset' and how 'contrasting biases' across models were quantified to aid reader interpretation.
[Results figures/tables] Figure or table presenting the 70%/61% crowdworker preferences: Include confidence intervals or sample sizes for the ratings to support the reported percentages.
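One way to supply the intervals requested in the second minor comment is a Wilson score interval for a proportion. The following is a sketch, assuming each rating is an independent binary choice (crowdworker ratings clustered by participant would need a different model). The sample size of 200 is a placeholder, since the number of ratings is not stated here.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Placeholder: 140 of 200 hypothetical comparisons favor the AI rewrite (70%).
low, high = wilson_interval(successes=140, n=200)
print(f"70% preference, n=200: [{low:.3f}, {high:.3f}]")
```

The interval narrows as `n` grows, which is why the percentage on its own is hard to interpret without the sample size.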

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify key areas where additional detail and clarification will strengthen the manuscript. We respond to each major comment below.

Referee: [§4 (Results, crowdworker study)] The central claim that LLMs 'resonate with the public but diverge from community experts' rests on crowdworkers rating AI rewrites higher in neutrality/fluency, but no direct comparison or validation is provided against experienced Wikipedia editors' judgments of NPOV policy adherence. This leaves the divergence interpretation without expert-grounded evidence.

Authors: The claim of divergence from community experts is supported by the quantitative finding that LLMs make additional changes beyond those of Wikipedia editors (high-recall, low-precision editing) together with the qualitative analysis of non-NPOV edits such as grammar fixes. Crowdworker ratings supply the public-preference comparison. We agree that the interpretation would be stronger with direct NPOV-adherence ratings from experienced Wikipedia editors on the LLM rewrites. Because no such expert ratings were collected, we will revise the manuscript to state this limitation explicitly and to clarify the evidential basis for the current interpretation.

revision: partial

Referee: [§3 (Methods)] Dataset construction for the balanced Task 1 set, selection criteria for biased edits, prompt engineering details for LLMs, and any statistical tests or inter-rater reliability metrics for human evaluations are not described. These omissions are load-bearing for interpreting the 64% accuracy and 79% removal figures as evidence of LLM behavior.

Authors: We appreciate the referee highlighting these gaps. The revised manuscript will expand §3 to include: (i) the construction and balancing procedure for the Task 1 dataset, (ii) the criteria used to select biased edits, (iii) the exact prompts and few-shot examples provided to each LLM, and (iv) the statistical tests performed together with inter-rater reliability statistics (e.g., Fleiss’ kappa or equivalent) for the crowdworker evaluations.

revision: yes
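For readers unfamiliar with the reliability statistic named in the rebuttal, here is a minimal sketch of Fleiss' kappa. It assumes every item is rated by the same number of raters. The rating tables are invented for illustration and do not come from the paper.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same value."""
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tables: 4 items, 3 raters, 2 categories (AI more neutral / editor more neutral).
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
print(fleiss_kappa(unanimous), fleiss_kappa(split))
```

Unanimous ratings give a kappa of 1, while the split table falls below zero, meaning agreement is worse than chance would predict.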

Circularity Check

0 steps flagged

No circularity: direct empirical evaluation with no derivations or self-referential predictions

The paper conducts new experiments on LLMs for bias detection (64% accuracy) and correction (79% word removal overlap), plus crowdworker ratings of neutrality/fluency and qualitative analysis. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain. All claims rest on the described tasks, human ratings, and direct comparisons to Wikipedia editor edits, making the work self-contained against external benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As an empirical evaluation study, the central claims rest on standard assumptions about task operationalization and human judgment validity rather than new mathematical constructs or fitted parameters.

axioms (1)

Domain assumption: Wikipedia's NPOV policy can be operationalized as binary bias-detection and text-correction tasks that capture the policy's intent.
The evaluation framework depends on this mapping from policy text to the two defined tasks.

[1] [1]

Constitutional AI: Harmlessness from AI Feedback

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., Kerr, J., Mueller, J., Ladish, J., Landau, J., Ndousse, K., Lukosuite, K., Lovitt, L., Sellitto, M., Elhage, N., Schiefer, N., ...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

Barbarestani, B., Maks, I., and Vossen, P. T. Content Moderation in Online Platforms: A Study of Annotation Methods for Inappropriate Language. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying LREC-COLING-2024 (Torino, Italia, May 2024), R. Kumar, A. K. Ojha, S. Malmasi, B. R. Chakravarthi, B. Lahiri, S. Singh, and S. Ratan, Eds...

work page 2024

[3] [3]

Wiki-Gendersort: Automatic gender detection using first names in Wikipedia

Bérubé, N., Ghiasi, G., Sainte-Marie, M., and others . Wiki-Gendersort: Automatic gender detection using first names in Wikipedia. Publisher: OSF

work page

[4] [4]

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A.,...

work page 2020

[5] [5]

L., Forte, A., and Bruckman, A

Bryant, S. L., Forte, A., and Bruckman, A. Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proceedings of the 2005 ACM International Conference on Supporting Group Work (New York, NY, USA, Nov. 2005), GROUP ’05, Association for Computing Machinery, pp. 1–10

work page 2005

[6] [6]

Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in wikipedia

Butler, B., Joyce, E., and Pike, J. Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in wikipedia. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, Apr. 2008), CHI ’08, Association for Computing Machinery, pp. 1101–1110

work page 2008

[7] [7]

T., Domingo, L.-F., Gilbert, S

Cao, Y. T., Domingo, L.-F., Gilbert, S. A., Mazurek, M., Shilton, K., and Daumé III, H. Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators, Feb. 2024. arXiv:2311.07879 [cs]

[8] Elmimouni, H., Forte, A., and Morgan, J. Why People Trust Wikipedia Articles: Credibility Assessment Strategies Used by Readers. In Proceedings of the 18th International Symposium on Open Collaboration (Madrid Spain, Sept. 2022), ACM, pp. 1–10

[9] Gilardi, F., Alizadeh, M., and Kubli, M. ChatGPT outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences 120, 30 (July 2023), e2305016120. Publisher: Proceedings of the National Academy of Sciences

[10] Greenstein, S., Gu, G., and Zhu, F. Ideology and Composition Among an Online Crowd: Evidence from Wikipedians. Management Science 67, 5 (May 2021), 3067–3086. Publisher: INFORMS

[11] Greenstein, S., and Zhu, F. Do Experts or Crowd-Based Models Produce More Bias? Evidence from Encyclopedia Britannica and Wikipedia. MIS Quarterly 42, 3 (Mar. 2018), 945–959

[12] Halfaker, A., and Geiger, R. S. Ores: Lowering barriers with participatory machine learning in wikipedia. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (2020), 1–37. Publisher: ACM New York, NY, USA

[13] Hansen, S., Berente, N., and Lyytinen, K. Wikipedia, Critical Social Theory, and the Possibility of Rational Discourse. The Information Society 25, 1 (Jan. 2009), 38–59

[15] Harrison, S. Should ChatGPT Be Used to Write Wikipedia Articles? Slate (Jan. 2023)

[16] Irving, G., Christiano, P., and Amodei, D. AI safety via debate, Oct. 2018. arXiv:1805.00899 [cs, stat]

[17] Jhaver, S., Birman, I., Gilbert, E., and Bruckman, A. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. ACM Transactions on Computer-Human Interaction 26, 5 (July 2019), 31:1–31:35

[18] Khan, A., Hughes, J., Valentine, D., Ruis, L., Sachan, K., Radhakrishnan, A., Grefenstette, E., Bowman, S. R., Rocktäschel, T., and Perez, E. Debating with More Persuasive LLMs Leads to More Truthful Answers, May 2024. arXiv:2402.06782 [cs]

[19] Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., and Potts, C. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, Oct. 2023. arXiv:2310.03714 [cs]

[20] Kittur, A., and Kraut, R. E. Harnessing the wisdom of crowds in wikipedia: quality through coordination. In Proceedings of the 2008 ACM conference on Computer supported cooperative work (San Diego CA USA, Nov. 2008), CSCW ’08, ACM, pp. 37–46

[21] Kittur, A., Suh, B., Pendleton, B. A., and Chi, E. H. He says, she says: conflict and coordination in Wikipedia. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose California USA, Apr. 2007), ACM, pp. 453–462

[22] Kolla, M., Salunkhe, S., Chandrasekharan, E., and Saha, K. LLM-Mod: Can Large Language Models Assist Content Moderation? In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, May 2024), CHI EA ’24, Association for Computing Machinery, pp. 1–8

[23] Kumar, D., AbuHashem, Y. A., and Durumeric, Z. Watch Your Language: Investigating Content Moderation with Large Language Models. Proceedings of the International AAAI Conference on Web and Social Media 18 (May 2024), 865–878

[24] Kumar, S., Correa, C. G., Dasgupta, I., Marjieh, R., Hu, M. Y., Hawkins, R. D., Daw, N. D., Cohen, J. D., Narasimhan, K., and Griffiths, T. L. Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines, Feb. 2023

[25] Li, J., Ye, Z., and Xiao, L. Detection of Propaganda Using Logistic Regression. In Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda (Hong Kong, China, Nov. 2019), A. Feldman, G. Da San Martino, A. Barrón-Cedeño, C. Brew, C. Leberknight, and P. Nakov, Eds., Association for Co...

[26] Lin, C.-Y., and Och, F. J. ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics (Geneva, Switzerland, Aug. 2004), COLING, pp. 501–507

[27] Ma, H., Zhang, C., Fu, H., Zhao, P., and Wu, B. Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning, Mar. 2024. arXiv:2310.03400 [cs]

[28] Matei, S. A., and Dobrescu, C. Wikipedia’s “Neutral Point of View”: Settling Conflict through Ambiguity. The Information Society 27, 1 (Jan. 2011), 40–51. Publisher: Routledge. https://doi.org/10.1080/01972243.2011.534368

[29] McDowell, Z. J., and Vetter, M. A. It Takes a Village to Combat a Fake News Army: Wikipedia’s Community and Policies for Information Literacy. Social Media + Society 6, 3 (July 2020), 2056305120937309. Publisher: SAGE Publications Ltd

[30] Nagar, Y. What do you think? the structuring of an online community as a collective-sensemaking process. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (New York, NY, USA, Feb. 2012), CSCW ’12, Association for Computing Machinery, pp. 393–402

[31] Nicholson, J. M., Uppala, A., Sieber, M., Grabitz, P., Mordaunt, M., and Rife, S. C. Measuring the quality of scientific references in Wikipedia: an analysis of more than 115M citations to over 800 000 scientific articles. The FEBS journal 288, 14 (2021), 4242–4248. Publisher: Wiley Online Library

[32] Orme, B. Sample size issues for conjoint analysis studies. Sequim: Sawtooth Software Technical Paper (1998)

[33] Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (Philadelphia, Pennsylvania, USA, July 2002), P. Isabelle, E. Charniak, and D. Lin, Eds., Association for Computational Linguistics, pp. 311–318

[34] Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., and Booth, R. J. The Development and Psychometric Properties of LIWC2007. Tech. rep., 2007

[35] Pryzant, R., Martinez, R. D., Dass, N., Kurohashi, S., Jurafsky, D., and Yang, D. Automatically neutralizing subjective bias in text. In Proceedings of the AAAI conference on artificial intelligence (Dec. 2020), vol. 34, arXiv, pp. 480–489. Issue: 01

[36] Rawat, C., Sarkar, A., Singh, S., Alvarado, R., and Rasberry, L. Automatic detection of online abuse and analysis of problematic users in wikipedia. In 2019 Systems and Information Engineering Design Symposium (SIEDS) (2019), IEEE, pp. 1–6

[37] Reagle, J. Is the Wikipedia Neutral?, Apr. 2007

[38] Recasens, M., Danescu-Niculescu-Mizil, C., and Jurafsky, D. Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (volume 1: long papers) (Sofia, Bulgaria, Aug. 2013), H. Schuetze, P. Fung, and M. Poesio, Eds., Association for Computational Linguistics, pp....

[39] Reimers, N., and Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Aug. 2019

[40] Sathe, A., Ather, S., Le, T. M., Perry, N., and Park, J. Automated fact-checking of claims from Wikipedia. In Proceedings of the Twelfth Language Resources and Evaluation Conference (2020), pp. 6874–6882

[41] Schmahl, K. G., Viering, T. J., Makrodimitris, S., Jahfari, A. N., Tax, D., and Loog, M. Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science (2020), pp. 94–103

[42] Scott, J. C. Seeing like a state: how certain schemes to improve the human condition have failed. Yale agrarian studies. Publisher: Yale University Press

[44] Semrush. Top Websites in the World - April 2024 Most Visited & Popular Rankings

[45] Shang, L., and Chandra, Y. DCE Data Analysis Using R. In Discrete Choice Experiments Using R: A How-To Guide for Social and Managerial Sciences, L. Shang and Y. Chandra, Eds. Springer Nature, Singapore, 2023, pp. 157–181

[46] Steinsson, S. Rule Ambiguity, Institutional Clashes, and Population Loss: How Wikipedia Became the Last Good Place on the Internet. American Political Science Review 118, 1 (Feb. 2024), 235–251

[47] Suchman, L. A. Plans and situated actions: the problem of human-machine communication. Plans and situated actions: The problem of human-machine communication. Cambridge University Press, USA, Nov. 1987

[48] Swarts, J. The collaborative construction of "fact" on Wikipedia. In Proceedings of the 27th ACM international conference on Design of communication (Bloomington Indiana USA, Oct. 2009), ACM, pp. 281–288

[49] Viégas, F. B., Wattenberg, M., and Dave, K. Studying cooperation and conflict between authors with history flow visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, Apr. 2004), CHI ’04, Association for Computing Machinery, pp. 575–582

[50] Wales, J. Jimmy Wales: The birth of Wikipedia | TED Talk, 2006

[51] Wang, P., and Li, X. Assessing the quality of information on wikipedia: A deep-learning approach. Journal of the Association for Information Science and Technology 71, 1 (2020), 16–28. Publisher: Wiley Online Library

[52] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jan. 2023

[53] Wikipedia. Core content policies, Dec. 2023. Page Version ID: 1187927074

[54] Wikipedia. Wikipedia: Automated Moderation, 2024

[55] Wikipedia. Wikipedia: WikiTrust, 2024

[56] Wikipedia. Wikipedia:Neutral point of view, June 2024. Page Version ID: 1226843190

[57] Wikipedia. Wikipedia:NPOV tutorial, May 2024. Page Version ID: 1222446643

[58] Zheng, R., Dou, S., Gao, S., Hua, Y., Shen, W., Wang, B., Liu, Y., Jin, S., Liu, Q., Zhou, Y., Xiong, L., Chen, L., Xi, Z., Xu, N., Lai, W., Zhu, M., Chang, C., Yin, Z., Weng, R., Cheng, W., Huang, H., Sun, T., Yan, H., Gui, T., Zhang, Q., Qiu, X., and Huang, X. Secrets of RLHF in Large Language Models Part I: PPO, July 2023.