Transactions of the Association for Computational Linguistics

Data statements for natural language processing: Toward mitigating system bias, enabling better science · 2018

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

cs.CL · 2020-12-31 · conditional · novelty 8.0

The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.

Evaluating Multi-turn Human-AI Interaction

cs.HC · 2026-05-18 · unverdicted · novelty 6.0

Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.

The Proxy Presumption: From Semantic Embeddings to Valid Social Measures

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

The paper introduces the Construct Validity Protocol to validate semantic embeddings for social constructs and proposes Counterfactual Neutralization using LLMs to reduce confounding.

citing papers explorer

The Pile: An 800GB Dataset of Diverse Text for Language Modeling · cs.CL · 2020-12-31 · conditional · none · ref 130
Evaluating Multi-turn Human-AI Interaction · cs.HC · 2026-05-18 · unverdicted · none · ref 30
The Proxy Presumption: From Semantic Embeddings to Valid Social Measures · cs.CL · 2026-05-08 · unverdicted · none · ref 53
