Mixed citations

Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

Jennifer Cobbe, Michelle S · 2021 · arXiv 2188.344592

Mixed citation behavior. Most common role is background (68%).

72 Pith papers citing it

Background 68% of classified citations

read on arXiv browse 72 citing papers

citation-role summary

background 20 baseline 1 other 1

citation-polarity summary

background 15 support 3 unclear 3 baseline 1

co-cited works

representative citing papers

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.

Co-Disclosing the Computer: LLM-Mediated Computing through Reflective Conversation

cs.HC · 2026-02-27 · unverdicted · novelty 7.0

Introduces LLM-mediated computing as a paradigm of reflective conversation and co-disclosure where the computer emerges through human-LLM interaction.

Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs

cs.CL · 2025-06-08 · unverdicted · novelty 7.0

VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

"You tell me": A Dataset of GPT-4-Based Behaviour Change Support Conversations

cs.HC · 2024-01-29 · unverdicted · novelty 7.0

Authors share a new dataset of GPT-4 behavior-change conversations with user language metrics, perception measures, and feedback collected in a preregistered study.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · novelty 7.0

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.

"It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps

cs.HC · 2026-05-13 · conditional · novelty 6.0

Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Creating Group Rules with AI: Human-AI Collaboration in WhatsApp Moderation

cs.HC · 2026-05-12 · accept · novelty 6.0

Admins in India used Meta AI to help create WhatsApp group rules, appreciating reduced workload but remaining cautious about privacy, relational trust, and contextual tone.

A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Under semantic underdetermination, LLMs cannot always guarantee strong correctness, strict non-bias, and high utility at once.

Push and Pushback in Contesting AI: Demands for and Resistance to Accountability

cs.HC · 2026-05-10 · unverdicted · novelty 6.0

Thematic analysis of 43 AI contestation cases, using Bovens's relational accountability model, produces categories of demands from below, institutional pushback, outcomes, and contextual factors shaping effective contestation.

Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs

cs.SI · 2026-05-10 · unverdicted · novelty 6.0

LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.

The Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humans

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

LLMs show a grounding gap with humans on abstract concepts, with property-generation correlations at most r=0.37 versus human-to-human r>0.9, though larger models align better on explicit rating tasks and internal SAE features capture some grounding dimensions.

Query-efficient model evaluation using cached responses

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.

When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge

cs.DL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

AI use in science has grown exponentially since 2015 but stays confined to computer science and statistics topics, shows higher retraction rates and citations, and follows distinct global adoption patterns.

Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Substantive LLM reframing boosts cross-partisan receptivity to news headlines without backfire, but models overestimate effect sizes and lack fidelity in modeling human psychological responses.

Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.

citing papers explorer

Showing 50 of 72 citing papers.

WildChat: 1M ChatGPT Interaction Logs in the Wild cs.CL · 2024-05-02 · accept · none · ref 36
WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.
Is She Even Relevant? When BERT Ignores Explicit Gender Cues cs.CL · 2026-05-08 · conditional · none · ref 4
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 225
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF cs.CL · 2026-04-20 · unverdicted · none · ref 43
R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.
SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models cs.CL · 2026-04-16 · unverdicted · none · ref 7
SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.
Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models cs.CL · 2026-03-27 · unverdicted · none · ref 5
LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.
Co-Disclosing the Computer: LLM-Mediated Computing through Reflective Conversation cs.HC · 2026-02-27 · unverdicted · none · ref 6
Introduces LLM-mediated computing as a paradigm of reflective conversation and co-disclosure where the computer emerges through human-LLM interaction.
Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs cs.CL · 2025-06-08 · unverdicted · none · ref 4
VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024-06-14 · conditional · none · ref 130
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
"You tell me": A Dataset of GPT-4-Based Behaviour Change Support Conversations cs.HC · 2024-01-29 · unverdicted · none · ref 12
Authors share a new dataset of GPT-4 behavior-change conversations with user language metrics, perception measures, and feedback collected in a preregistered study.
GAIA: a benchmark for General AI Assistants cs.CL · 2023-11-21 · unverdicted · none · ref 91
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
Towards Measuring the Representation of Subjective Global Opinions in Language Models cs.CL · 2023-06-28 · conditional · none · ref 9
LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity cs.LG · 2026-05-13 · unverdicted · none · ref 7
Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.
"It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps cs.HC · 2026-05-13 · conditional · none · ref 38
Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces cs.LG · 2026-05-12 · unverdicted · none · ref 257
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Creating Group Rules with AI: Human-AI Collaboration in WhatsApp Moderation cs.HC · 2026-05-12 · accept · none · ref 3
Admins in India used Meta AI to help create WhatsApp group rules, appreciating reduced workload but remaining cautious about privacy, relational trust, and contextual tone.
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination cs.AI · 2026-05-12 · unverdicted · none · ref 4
Under semantic underdetermination, LLMs cannot always guarantee strong correctness, strict non-bias, and high utility at once.
Push and Pushback in Contesting AI: Demands for and Resistance to Accountability cs.HC · 2026-05-10 · unverdicted · none · ref 39
Thematic analysis of 43 AI contestation cases, using Bovens's relational accountability model, produces categories of demands from below, institutional pushback, outcomes, and contextual factors shaping effective contestation.
Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs cs.SI · 2026-05-10 · unverdicted · none · ref 77
LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.
The Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humans cs.CL · 2026-05-09 · unverdicted · none · ref 66
LLMs show a grounding gap with humans on abstract concepts, with property-generation correlations at most r=0.37 versus human-to-human r>0.9, though larger models align better on explicit rating tasks and internal SAE features capture some grounding dimensions.
Query-efficient model evaluation using cached responses cs.LG · 2026-05-08 · unverdicted · none · ref 34
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge cs.DL · 2026-05-07 · unverdicted · none · ref 5 · 2 links
AI use in science has grown exponentially since 2015 but stays confined to computer science and statistics topics, shows higher retraction rates and citations, and follows distinct global adoption patterns.
Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness cs.CL · 2026-05-01 · unverdicted · none · ref 5
Substantive LLM reframing boosts cross-partisan receptivity to news headlines without backfire, but models overestimate effect sizes and lack fidelity in modeling human psychological responses.
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking cs.CR · 2026-05-01 · unverdicted · none · ref 46
BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.
How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses cs.CL · 2026-04-30 · unverdicted · none · ref 5
Frontier LLMs adapt structurally to explicit neurodivergence instructions by increasing output length, headings, and step granularity, but ND persona assertion alone fails to suppress harmful tendencies.
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles cs.CY · 2026-04-22 · unverdicted · none · ref 10
Explicit demographic statements trigger higher refusal rates and lower semantic similarity in LLMs than implicit dialect cues, which reduce refusals but also reduce content sanitization.
Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue cs.CL · 2026-04-22 · unverdicted · none · ref 3
Incremental visual scaffolding using multimodal models improves persistent common ground representation in situated dialogue by reducing representational blur compared to text-only approaches, with hybrid text-visual yielding best results on the IndiRef benchmark.
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech eess.AS · 2026-04-19 · unverdicted · none · ref 32
VIBE evaluates generative biases in large audio-language models with real-world speech and open-ended tasks, showing that gender cues produce larger distributional shifts than accent cues across 11 tested models.
Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models cs.CR · 2026-04-01 · conditional · none · ref 3
A new benchmark exposes food-safety gaps in current LLMs and guardrails, and a fine-tuned 4B model is offered as a domain-specific fix.
Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing cs.LG · 2026-01-31 · unverdicted · none · ref 2
Hallucinations are the space-optimal behavior for limited-capacity models performing membership testing on sparse facts, as shown by a rate-distortion theorem that equates optimal memory use to minimum KL divergence between fact and non-fact score distributions.
Large Language Model Agent for User-friendly Chemical Process Simulations physics.chem-ph · 2026-01-15 · unverdicted · none · ref 54
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale cs.CL · 2024-06-25 · unverdicted · none · ref 73
FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.
Laissez-Faire Harms: Algorithmic Biases in Generative Language Models cs.CL · 2024-04-11 · unverdicted · none · ref 4 · 2 links
Generative LMs in laissez-faire open-ended prompting settings disproportionately generate subordinated portrayals of minoritized race, gender, and sexual orientation identities at rates hundreds to thousands of times higher than empowering ones.
StarCoder 2 and The Stack v2: The Next Generation cs.SE · 2024-02-29 · accept · none · ref 193
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 65
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Emergent Abilities of Large Language Models cs.CL · 2022-06-15 · unverdicted · none · ref 9
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
GPT-NeoX-20B: An Open-Source Autoregressive Language Model cs.CL · 2022-04-14 · accept · none · ref 8
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.
PaLM: Scaling Language Modeling with Pathways cs.CL · 2022-04-05 · accept · none · ref 11
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
Ethical and social risks of harm from Language Models cs.CL · 2021-12-08 · accept · none · ref 22 · 2 links
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
Deduplicating Training Data Makes Language Models Better cs.CL · 2021-07-14 · unverdicted · none · ref 7
Deduplicating training datasets reduces language model verbatim memorization by 10x, improves training efficiency, and enables more accurate evaluation by cutting train-test overlap.
AI at the Front Lines of Platform Governance: Using LLMs to Support Illegal Content Reporting under the Digital Services Act cs.HC · 2026-05-22 · unverdicted · none · ref 24
EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.
The Quiet Path from Seemingly Minor Design Errors to Workplace AI Incidents cs.HC · 2026-05-20 · unverdicted · none · ref 10
Empirical analysis of 1,524 AI incident reports shows 83% arise from worker-AI trait misalignments, with 74% of those traceable to developers prioritizing efficiency over precision or personalization.
Position: AI as Part of Self -- Extending the Mind Requires Cognitive Co-Regulation cs.HC · 2026-05-15 · unverdicted · none · ref 111
The paper claims that alignment requires treating AI as part of the self through cognitive co-regulation, identifying risks like deskilling and automation bias while drawing on System 0 cognition theory.
Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development cs.CY · 2026-05-11 · unverdicted · none · ref 15
Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.
Social Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germany cs.CY · 2026-05-11 · unverdicted · none · ref 3
Four LLMs exhibit a shared implicit social policy that under-allocates pensions by a factor of three and over-allocates housing by four compared to OECD budgets, with only Claude showing meaningful response to national context.
Governing What the EU AI Act Excludes: Accountability for Autonomous AI Agents in Smart City Critical Infrastructure cs.CY · 2026-05-01 · unverdicted · none · ref 9
The EU AI Act narrows accountability for multi-agent AI in critical infrastructure by excluding safety components from key explanation and impact assessment rights, and the paper proposes AgentGov-SC, a three-layer architecture with 25 measures to address this through traceability to existing AI and
Measuring the Machine: Evaluating Generative AI as Pluralist Sociotechical Systems cs.AI · 2026-04-22 · unverdicted · none · ref 3
Generative AI must be evaluated as recursive pluralist sociotechnical systems via MaSH Loops and distributional World Values Benchmarks instead of static functionalist or prescriptive tests.
Governed Auditable Decisioning Under Uncertainty: Synthesis and Agentic Extension cs.CY · 2026-04-21 · unverdicted · none · ref 7
Synthesizes a governance evidence framework revealing a coverage gradient from full auditability in rule engines to structural breaks in agentic AI, with a cascade of uncertainty and four formal propositions.
Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability cs.CY · 2026-04-17 · unverdicted · none · ref 15
AI accountability efforts are undermined by five decoys that create illusions of progress while co-constituting the extractive political economy of the AI Project.
Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS) cs.CL · 2026-04-16 · unverdicted · none · ref 18
SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.

Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

citation-role summary

citation-polarity summary

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer