Small edits, large models: How Wikipedia advocacy shapes LLM values

Alexa Gnauck; Jasmine Brazilek; Maria Navas

arxiv: 2606.24890 · v2 · pith:4PCZUMQEnew · submitted 2026-04-30 · 💻 cs.CL · cs.AI· cs.CY

Small edits, large models: How Wikipedia advocacy shapes LLM values

Jasmine Brazilek , Maria Navas , Alexa Gnauck This is my paper

Pith reviewed 2026-07-01 09:00 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.CY

keywords Wikipedia editingLLM training datadata attributionanimal welfareinfluence estimationadvocacylanguage modelsfine-tuning

0 comments

The pith

A small group of Wikipedia editors can measurably shape how language models discuss animal welfare topics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that 125 edits by the Pro-Animal Wikipedians across 115 Wikipedia pages measurably influence large language model behavior on animal welfare queries. Wikipedia text receives heavy weight in training datasets, and the authors apply retrieval and gradient-based attribution to isolate the contribution of those specific edits. For animal welfare queries the PAW sections dominate the highest-attributed documents at 68 percent, while the same sections show no elevated influence on unrelated queries about the same entities. Fine-tuning experiments confirm the pattern: models trained on the PAW content reduce perplexity only on animal welfare text, and models trained on control content improve only on control text.

Core claim

Using TrackStar retrieval attribution on Llama 3.1 8B, PAW-edited sections made up 68 percent of the highest-attributed documents for animal welfare queries but only 52 percent for unrelated queries about the same companies. MAGIC counterfactual influence estimation on Llama-3.2-1B, repeated across five random training-order seeds, found that the top-10 most influential documents on animal welfare queries were all PAW edits in every seed, while the same top-10 sat at chance levels on general queries. Mean PAW influence exceeded mean control influence on animal welfare queries with p less than 0.0001 in every seed, an effect 6 to 30 times larger than on general queries. Leave-subset-out valid

What carries the argument

Gradient-based data attribution (TrackStar retrieval and MAGIC counterfactual influence estimation) that traces the contribution of specific Wikipedia page sections to model outputs on targeted versus control queries.

If this is right

The influence of the edits is topic-specific rather than a general association with the mentioned companies or pages.
Counterfactual estimation shows that the top influential documents shift completely to PAW edits when the query concerns animal welfare.
Fine-tuning on the edited content improves model performance only on the matching category of text.
The effect size difference between targeted and control queries holds across multiple random training seeds and validation runs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same attribution pipeline could be applied to other coordinated editing campaigns on Wikipedia to check for comparable topic-specific effects.
Public knowledge platforms function as high-leverage insertion points for shaping downstream model behavior when training data is drawn from them.
Auditing training corpora with influence methods might reveal other narrow editing efforts that have already propagated into deployed models.
The observed pattern suggests that small changes in high-weight data sources can produce outsized, topic-local shifts in model outputs without altering the overall training distribution.

Load-bearing premise

The attribution methods correctly isolate the causal effect of the specific PAW edits rather than capturing correlated features of the pages or query selection biases.

What would settle it

If the proportion of PAW-edited sections among the top-attributed documents remained statistically indistinguishable between animal welfare queries and unrelated queries about the same entities, the claim of topic-specific influence would be falsified.

Figures

Figures reproduced from arXiv: 2606.24890 by Alexa Gnauck, Jasmine Brazilek, Maria Navas.

**Figure 2.** Figure 2: MAGIC counterfactual influence per training document, averaged across five random [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Fine-tuning ablation. Each model performs better on the type of text it was trained on. The [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add sourced animal welfare content to relevant articles, have made 125 edits across 115 pages. Using gradient-based data attribution (Bergson; MAGIC), we traced how these edits influence language model behavior. TrackStar retrieval attribution on Llama 3.1 8B found that PAW-edited sections made up 68 percent of the highest-attributed documents for animal welfare queries (p < 0.0001) but only 52 percent for unrelated queries about the same companies (p = 0.53): the model links PAW content specifically to animal welfare topics, not to the entities in general. MAGIC counterfactual influence estimation on Llama-3.2-1B, run across five random training-order seeds, gave the same picture even more sharply: in every seed, the top-10 most influential documents on animal welfare queries were all PAW edits (10 of 10, 5 of 5 seeds), while on general queries the same top-10 sat at chance (4 to 6 of 10). Mean PAW influence exceeded mean control influence on animal welfare queries with p < 0.0001 in every seed, an effect 6 to 30 times larger than on general queries. Leave-subset-out validation gave Spearman rho = 1.00 for all 10 runs. When we fine-tuned separate models on PAW content versus control content, each model performed better specifically on the type of text it was trained on: the PAW-trained model cut perplexity on animal welfare text from 12.4 to 8.4, while the control-trained model cut perplexity on control text from 16.1 to 11.4. A small, coordinated Wikipedia editing campaign therefore measurably shapes how language models handle the topics those edits address.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper finds topic-specific attribution from real Wikipedia edits on animal welfare queries, backed by multiple methods and fine-tuning, but the causal isolation of those attributions remains the untested premise.

read the letter

The main takeaway is that 125 edits by the Pro-Animal Wikipedians group dominate attribution scores for animal-welfare queries in Llama models while showing no such pattern on control queries about the same companies.

They apply TrackStar retrieval and MAGIC gradient attribution, run MAGIC across five training seeds, and add a fine-tuning check. PAW content reaches 68% of top attributions on target queries versus 52% on controls, hits 10/10 top documents in every MAGIC seed on welfare queries, and produces matching perplexity drops when models are trained on PAW versus control text. The leave-subset-out Spearman rho of 1.00 across runs is clean.

The work is a direct application of existing attribution tools to actual advocacy edits rather than synthetic cases. The differential effect across query types and the alignment between attribution and fine-tuning results are the concrete observations.

The soft spot is whether the attribution methods recover the specific causal contribution of those edits or simply surface topical overlap. TrackStar and MAGIC are known to respond to semantic similarity, and the reported numbers do not include a ground-truth validation that would confirm the tools isolate the edits' unique influence here. Query construction and sampling details would also clarify whether selection effects play a role.

This is relevant for anyone working on training data provenance or external steering of model behavior. The measurements are quantitative enough to merit referee time even with the attribution caveat.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that a small coordinated set of 125 Wikipedia edits by the Pro-Animal Wikipedians (PAW) across 115 pages measurably shapes LLM behavior on animal-welfare topics. Using TrackStar retrieval attribution on Llama 3.1 8B, PAW-edited sections comprise 68% of top-attributed documents for animal-welfare queries (p < 0.0001) versus 52% for unrelated control queries on the same entities (p = 0.53). MAGIC gradient-based influence on Llama-3.2-1B across five training-order seeds shows the top-10 documents for animal-welfare queries are all PAW edits (10/10 in every seed) while controls are at chance (4–6/10); mean PAW influence exceeds control with p < 0.0001 and is 6–30 times larger than on general queries. Leave-subset-out validation yields Spearman rho = 1.00. Separate fine-tuning runs confirm topic-specific effects: PAW-trained models reduce perplexity on animal-welfare text from 12.4 to 8.4 while control-trained models reduce it on control text from 16.1 to 11.4.

Significance. If the attribution methods validly isolate causal influence from the specific PAW edits, the result would demonstrate that modest, targeted Wikipedia editing can produce measurable, topic-specific shifts in LLM behavior. The internal consistency across five seeds, perfect leave-subset-out rank correlation, and the fine-tuning perplexity contrast (topic-specific rather than global) are genuine strengths that support reproducibility of the chosen metrics.

major comments (2)

[Abstract] Abstract: The interpretation that the 68% vs 52% TrackStar differential and the 10/10 MAGIC top-document result demonstrate specific causal influence from the 125 PAW edits assumes that both attribution procedures recover ground-truth influence rather than topical content similarity or query-selection effects; no validation experiment (e.g., synthetic injection of known edits or comparison against a ground-truth influence oracle) is described to test this premise, which is load-bearing for the central claim.
[Abstract] Abstract: Query construction, sampling of documents for attribution, and the precise definition of the control query set are not specified, preventing assessment of whether the reported separation between animal-welfare and control conditions could be driven by semantic overlap or selection bias rather than the PAW edits themselves.

minor comments (1)

[Abstract] Abstract: The fine-tuning perplexity reductions would be easier to interpret if the number of evaluation documents or tokens per condition were reported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and for noting the strengths in internal consistency, cross-seed reproducibility, and the fine-tuning perplexity contrasts. Below we respond point-by-point to the two major comments. We agree that greater methodological transparency and explicit discussion of attribution assumptions are warranted.

read point-by-point responses

Referee: [Abstract] Abstract: The interpretation that the 68% vs 52% TrackStar differential and the 10/10 MAGIC top-document result demonstrate specific causal influence from the 125 PAW edits assumes that both attribution procedures recover ground-truth influence rather than topical content similarity or query-selection effects; no validation experiment (e.g., synthetic injection of known edits or comparison against a ground-truth influence oracle) is described to test this premise, which is load-bearing for the central claim.

Authors: We acknowledge that the manuscript does not include a synthetic-injection or oracle-based validation of the attribution methods, and that such an experiment would strengthen the causal interpretation. The control conditions (unrelated queries on identical entities, plus general queries) were intended to isolate topic-specific effects from entity or semantic overlap, and the leave-subset-out Spearman rho = 1.00 plus perfect top-10 consistency across five seeds provide internal checks. Nevertheless, we will add a dedicated Limitations subsection that explicitly states the lack of synthetic validation, discusses the assumptions of TrackStar and MAGIC, and notes that the reported differentials are conditional on those assumptions. If the available data and compute permit, we will also attempt a small-scale synthetic edit injection as an appendix experiment. revision: partial
Referee: [Abstract] Abstract: Query construction, sampling of documents for attribution, and the precise definition of the control query set are not specified, preventing assessment of whether the reported separation between animal-welfare and control conditions could be driven by semantic overlap or selection bias rather than the PAW edits themselves.

Authors: The full Methods section contains the query templates, retrieval sampling procedure (top-k documents from the indexed corpus), and control-query definitions, but these details are not summarized in the abstract or introduction. We will revise the abstract and add a concise Experimental Setup paragraph that specifies: (i) the exact query templates used for animal-welfare versus control conditions, (ii) the document sampling rule for attribution runs, and (iii) the control set construction (queries about the same 115 entities but on non-welfare topics such as financial metrics or historical facts). This will allow readers to evaluate potential semantic overlap directly. revision: yes

Circularity Check

0 steps flagged

No circularity: central results rely on external attribution tools and independent fine-tuning experiments

full rationale

The paper reports empirical measurements of influence using TrackStar retrieval and MAGIC gradient-based attribution (external methods) plus separate fine-tuning runs on PAW vs. control content. No equations appear in the provided text, and no step reduces a reported influence score or perplexity differential to a parameter fitted inside the paper or to a self-citation chain. Leave-subset-out Spearman rho = 1.00 is an internal consistency check on the chosen metrics rather than a re-derivation of the influence itself. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on the validity of two cited attribution methods and on the assumption that Wikipedia content appears in the training data with the stated weighting; no new free parameters, axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.1-grok · 5933 in / 1125 out tokens · 36346 ms · 2026-07-01T09:00:35.471677+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 13 canonical work pages · 3 internal anchors

[1]

Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877–1901, 2020

1901
[2]

arXiv preprint arXiv:2302.10149 , year=

Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practical.arXiv preprint arXiv:2302.10149, 2023

work page arXiv 2023
[3]

Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney

Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney. Scalable influence and fact tracing for large language model pretraining.arXiv preprint arXiv:2410.17413,

work page arXiv
[4]

doi: 10.48550/arXiv.2410.17413

work page doi:10.48550/arxiv.2410.17413
[5]

Marcia W. DiStaso. Perceptions of wikipedia by public relations professionals: A comparison of 2012 and 2013 surveys. Institute for Public Relations, 2013. URL https://instituteforpr.org/ perceptions-wikipedia-public-relations-professionals-comparison-2012-2013-surveys/

2012
[6]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. The pile: An 800gb dataset of diverse text for language modeling.arXiv preprint arXiv:2101.00027,

work page internal anchor Pith review Pith/arXiv arXiv
[7]

doi: 10.48550/arXiv.2101.00027

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2101.00027
[8]

MAGIC: Near-optimal data attribution for deep learning

Andrew Ilyas and Logan Engstrom. MAGIC: Near-optimal data attribution for deep learning. arXiv preprint arXiv:2504.16430, 2025. URLhttps://arxiv.org/abs/2504.16430

work page arXiv 2025
[9]

Datamodels: Predicting predictions from training data.arXiv preprint arXiv:2202.00622, 2022

Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data.arXiv preprint arXiv:2202.00622, 2022

work page arXiv 2022
[10]

Zarine Kharazian, Kate Starbird, and Benjamin Mako Hill. Governance capture in a self- governing community: A qualitative comparison of the Croatian, Serbian, Bosnian, and Serbo- Croatian wikipedias.Proceedings of the ACM on Human-Computer Interaction, 8(CSCW1): 1–26, 2024. doi: 10.1145/3637338

work page doi:10.1145/3637338 2024
[11]

Understanding black-box predictions via influence functions

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. InInternational Conference on Machine Learning, pages 1885–1894, 2017

2017
[12]

EleutherAI/bergson: Mapping out the “memory” of neural nets with data attribution

Quintin Lucia and Nora Belrose. EleutherAI/bergson: Mapping out the “memory” of neural nets with data attribution. GitHub, 2026. URLhttps://github.com/EleutherAI/bergson

2026
[13]

Trak: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

work page arXiv 2023
[14]

Dolma: An open corpus of three trillion tokens for language model pretraining research.arXiv preprint arXiv:2402.00159, 2024

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, et al. Dolma: An open corpus of three trillion tokens for language model pretraining research.arXiv preprint arXiv:2402.00159, 2024. doi: 10.48550/arXiv.2402.00159

work page doi:10.48550/arxiv.2402.00159 2024
[15]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Concealed data poisoning attacks on NLP models

Eric Wallace, Tony Z Zhao, Shi Feng, and Sameer Singh. Concealed data poisoning attacks on NLP models. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 139–150, 2021

2021
[17]

Redpajama: An open dataset for training large language models.arXiv preprint arXiv:2411.12372, 2024

Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, et al. Redpajama: An open dataset for training large language models.arXiv preprint arXiv:2411.12372, 2024. doi: 10.48550/arXiv.2411.12372

work page doi:10.48550/arxiv.2411.12372 2024
[18]

What is Restaurant C chicken welfare policy?

Taha Yasseri, Robert Sumi, András Rung, András Kornai, and János Kertész. Dynamics of conflicts in wikipedia.PLoS ONE, 2012. doi: 10.1371/journal.pone.0038869. 9 A Query Examples and Data Availability All datasets, queries, and fine-tuned models are available from the authors on request. Below we list example queries from each set. A.1 Animal Welfare Quer...

work page doi:10.1371/journal.pone.0038869 2012

[1] [1]

Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877–1901, 2020

1901

[2] [2]

arXiv preprint arXiv:2302.10149 , year=

Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practical.arXiv preprint arXiv:2302.10149, 2023

work page arXiv 2023

[3] [3]

Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney

Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, and Ian Tenney. Scalable influence and fact tracing for large language model pretraining.arXiv preprint arXiv:2410.17413,

work page arXiv

[4] [4]

doi: 10.48550/arXiv.2410.17413

work page doi:10.48550/arxiv.2410.17413

[5] [5]

Marcia W. DiStaso. Perceptions of wikipedia by public relations professionals: A comparison of 2012 and 2013 surveys. Institute for Public Relations, 2013. URL https://instituteforpr.org/ perceptions-wikipedia-public-relations-professionals-comparison-2012-2013-surveys/

2012

[6] [6]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. The pile: An 800gb dataset of diverse text for language modeling.arXiv preprint arXiv:2101.00027,

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

doi: 10.48550/arXiv.2101.00027

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2101.00027

[8] [8]

MAGIC: Near-optimal data attribution for deep learning

Andrew Ilyas and Logan Engstrom. MAGIC: Near-optimal data attribution for deep learning. arXiv preprint arXiv:2504.16430, 2025. URLhttps://arxiv.org/abs/2504.16430

work page arXiv 2025

[9] [9]

Datamodels: Predicting predictions from training data.arXiv preprint arXiv:2202.00622, 2022

Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data.arXiv preprint arXiv:2202.00622, 2022

work page arXiv 2022

[10] [10]

Zarine Kharazian, Kate Starbird, and Benjamin Mako Hill. Governance capture in a self- governing community: A qualitative comparison of the Croatian, Serbian, Bosnian, and Serbo- Croatian wikipedias.Proceedings of the ACM on Human-Computer Interaction, 8(CSCW1): 1–26, 2024. doi: 10.1145/3637338

work page doi:10.1145/3637338 2024

[11] [11]

Understanding black-box predictions via influence functions

Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. InInternational Conference on Machine Learning, pages 1885–1894, 2017

2017

[12] [12]

EleutherAI/bergson: Mapping out the “memory” of neural nets with data attribution

Quintin Lucia and Nora Belrose. EleutherAI/bergson: Mapping out the “memory” of neural nets with data attribution. GitHub, 2026. URLhttps://github.com/EleutherAI/bergson

2026

[13] [13]

Trak: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale.arXiv preprint arXiv:2303.14186, 2023

work page arXiv 2023

[14] [14]

Dolma: An open corpus of three trillion tokens for language model pretraining research.arXiv preprint arXiv:2402.00159, 2024

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, et al. Dolma: An open corpus of three trillion tokens for language model pretraining research.arXiv preprint arXiv:2402.00159, 2024. doi: 10.48550/arXiv.2402.00159

work page doi:10.48550/arxiv.2402.00159 2024

[15] [15]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16] [16]

Concealed data poisoning attacks on NLP models

Eric Wallace, Tony Z Zhao, Shi Feng, and Sameer Singh. Concealed data poisoning attacks on NLP models. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 139–150, 2021

2021

[17] [17]

Redpajama: An open dataset for training large language models.arXiv preprint arXiv:2411.12372, 2024

Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, et al. Redpajama: An open dataset for training large language models.arXiv preprint arXiv:2411.12372, 2024. doi: 10.48550/arXiv.2411.12372

work page doi:10.48550/arxiv.2411.12372 2024

[18] [18]

What is Restaurant C chicken welfare policy?

Taha Yasseri, Robert Sumi, András Rung, András Kornai, and János Kertész. Dynamics of conflicts in wikipedia.PLoS ONE, 2012. doi: 10.1371/journal.pone.0038869. 9 A Query Examples and Data Availability All datasets, queries, and fine-tuned models are available from the authors on request. Below we list example queries from each set. A.1 Animal Welfare Quer...

work page doi:10.1371/journal.pone.0038869 2012