super hub Mixed citations

Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

Bender, Emily M · 2021 · arXiv 2188.344592

Mixed citation behavior. Most common role is background (62%).

112 Pith papers citing it

Background 62% of classified citations

citation-role summary

background 24 · baseline 1 · other 1

citation-polarity summary

background 16 · support 6 · unclear 3 · baseline 1

authors

Bender, Emily M

representative citing papers

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

$\text{DT}^2$: Decision-Targeted Digital Twins

cs.LG · 2026-06-24 · unverdicted · novelty 7.0

DT² trains digital twins to preserve pairwise policy rankings from fitted Q-evaluation on offline data rather than minimizing one-step transition errors, improving policy ranking and reducing decision regret.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Polar: A Benchmark for Evaluating Political Bias in LLMs

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0 · 2 refs

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.

Chatbots Output Meaningful (but Problematic) Language

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

LLM outputs are meaningful according to standard theories of human language, without requiring anthropomorphic assumptions about the models.

Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

The authors introduce a three-level formality spectrum (informal, casual, formal) and the 3LF dataset to correct supervision misalignment in formality transfer, reporting large gains in informal-to-formal performance on models including GPT variants.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

Causal tracing reveals a persistent Refusal Trajectory in LLM hidden states; the SALO detector, which uses sparse activations from a layer window, improves jailbreak detection across Qwen, Llama, and Mistral models.

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.

Co-Disclosing the Computer: LLM-Mediated Computing through Reflective Conversation

cs.HC · 2026-02-27 · unverdicted · novelty 7.0

Introduces LLM-mediated computing as a paradigm of reflective conversation and co-disclosure where the computer emerges through human-LLM interaction.

Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs

cs.CL · 2025-06-08 · unverdicted · novelty 7.0

VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

"You tell me": A Dataset of GPT-4-Based Behaviour Change Support Conversations

cs.HC · 2024-01-29 · unverdicted · novelty 7.0

Authors share a new dataset of GPT-4 behavior-change conversations with user language metrics, perception measures, and feedback collected in a preregistered study.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems at 15%, and proposes this gap as a key milestone for general AI.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · novelty 7.0

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

The Future of NLP may not be at NLP Conferences: Scholarly Migration Patterns in Natural Language Processing

cs.CL · 2026-07-02 · unverdicted · novelty 6.0

NLP authors show migration from *ACL flagship tracks (–19.2pp) to Findings (+14.8pp) and ML venues (+8.6pp), with new authors increasing ML share from 5% to 21% and causal inference indicating a citation premium drives venue choice.

Personality Without Persons? A Psychometric Critique of Big Five Testing in Large Language Models

cs.HC · 2026-07-02 · accept · novelty 6.0

Big Five inventories fail to capture meaningful differences or recover the five-factor structure in LLMs, with only 3% variance between models and four facets collapsing (r >= .92).

Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.

Unmasking LAION-5B: Age, Gender, Race, and Emotion Biases in Large-Scale Image Datasets

cs.CV · 2026-06-22 · unverdicted · novelty 6.0 · 2 refs

Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.

citing papers explorer

Showing 31 of 31 citing papers after filters.

Polar: A Benchmark for Evaluating Political Bias in LLMs
cs.CL · 2026-06-11 · unverdicted · none · ref 10
Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
cs.CL · 2026-06-03 · unverdicted · none · ref 70
LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.

Chatbots Output Meaningful (but Problematic) Language
cs.CL · 2026-06-02 · unverdicted · none · ref 61
LLM outputs are meaningful according to standard theories of human language, without requiring anthropomorphic assumptions about the models.

Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset
cs.CL · 2026-05-28 · unverdicted · none · ref 34
The authors introduce a three-level formality spectrum (informal, casual, formal) and the 3LF dataset to correct supervision misalignment in formality transfer, reporting large gains in informal-to-formal performance on models including GPT variants.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues
cs.CL · 2026-05-08 · conditional · none · ref 4
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
cs.CL · 2026-05-08 · unverdicted · none · ref 225
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF
cs.CL · 2026-04-20 · unverdicted · none · ref 43
R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models
cs.CL · 2026-04-16 · unverdicted · none · ref 7
SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models
cs.CL · 2026-03-27 · unverdicted · none · ref 5
LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.

The Future of NLP may not be at NLP Conferences: Scholarly Migration Patterns in Natural Language Processing
cs.CL · 2026-07-02 · unverdicted · none · ref 26
NLP authors show migration from *ACL flagship tracks (–19.2pp) to Findings (+14.8pp) and ML venues (+8.6pp), with new authors increasing ML share from 5% to 21% and causal inference indicating a citation premium drives venue choice.

Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs
cs.CL · 2026-06-26 · unverdicted · none · ref 28
Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs
cs.CL · 2026-06-17 · unverdicted · none · ref 28
LLMs exhibit misfired alignment on stereotype questions at 4.7-18.9% rates on the new VETO benchmark of 2,032 contrastive pairs, unlike humans at 0%, due to overgeneralized safety cues after instruction tuning.

Schützen: Evaluating LLM Safety in Bulgarian and German Contexts
cs.CL · 2026-06-09 · unverdicted · none · ref 41
Schützen is a German-Bulgarian LLM safety dataset showing pronounced cross-language differences in model safety behavior.

The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction
cs.CL · 2026-06-01 · unverdicted · none · ref 2
The Ghost Annotator framework applies conformal prediction and collaborative filtering representations to measure LLM divergence from human annotations across four models and datasets, revealing higher confidence in misaligned cases and consistent demographic misalignment.

Child-directed speech facilitates production, not comprehension, in BabyLMs
cs.CL · 2026-05-31 · unverdicted · none · ref 77
CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.

The Grounding Gap: How LLMs Anchor the Meaning of Abstract Concepts Differently from Humans
cs.CL · 2026-05-09 · unverdicted · none · ref 66
LLMs show a grounding gap with humans on abstract concepts, with property-generation correlations at most r=0.37 versus human-to-human r>0.9, though larger models align better on explicit rating tasks and internal SAE features capture some grounding dimensions.

Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness
cs.CL · 2026-05-01 · unverdicted · none · ref 5
Substantive LLM reframing boosts cross-partisan receptivity to news headlines without backfire, but models overestimate effect sizes and lack fidelity in modeling human psychological responses.

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses
cs.CL · 2026-04-30 · unverdicted · none · ref 5
Frontier LLMs adapt structurally to explicit neurodivergence instructions by increasing output length, headings, and step granularity, but ND persona assertion alone fails to suppress harmful tendencies.

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer
cs.CL · 2026-04-26 · unverdicted · none · ref 1
Fine-tuning LLMs on Arabic yields similar zero-shot gains on Semitic and non-Semitic languages, with chain-of-thought reasoning producing parallel benefits, indicating task alignment drives transfer more than language relatedness.

Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation
cs.CL · 2026-04-24 · conditional · none · ref 2
TreeTracer aggregates stochastic LLM generations into syntax-aligned Sankey trees with contrastive inference to visualize and quantify hidden representational biases across demographic ontologies.

Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue
cs.CL · 2026-04-22 · unverdicted · none · ref 3
Incremental visual scaffolding using multimodal models improves persistent common ground representation in situated dialogue by reducing representational blur compared to text-only approaches, with hybrid text-visual yielding best results on the IndiRef benchmark.

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory
cs.CL · 2026-06-06 · unverdicted · none · ref 43
LLMs outperform humans in expressing illocutionary intents and sycophancy in successful persuasive counter-arguments from ChangeMyView, with crowd workers preferring LLM versions.

The Future of Facts: Tracing the Factual Generation-Verification Gap
cs.CL · 2026-05-26 · unverdicted · none · ref 21
Empirical tracing across model families shows verification precedes and outlasts generation for facts, with updates producing simultaneous verification of old and new answers.

Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
cs.CL · 2026-04-16 · unverdicted · none · ref 18
SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.

Neutrality Bites: Gender Representation in AI-Generated Animal Stories
cs.CL · 2026-06-06 · unverdicted · none · ref 5
LLMs exhibit masculine bias when assigning gender to animal characters in generated stories, with neutrality often resulting in erasure of feminine perspectives.

Effects of Varying LLM Access on Essay Writing Behavior
cs.CL · 2026-05-29 · unverdicted · none · ref 9
Pilot experiment shows limited LLM access maintains higher student ownership and strategic use than unlimited access, with no difference in essay quality.

Tracing the ongoing emergence of human-like reasoning in Large Language Models
cs.CL · 2026-05-20 · unverdicted · none · ref 47
LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.

Gyan: An Explainable Neuro-Symbolic Language Model
cs.CL · 2026-05-06 · unverdicted · none · ref 11
Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.

Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
cs.CL · 2026-04-13 · unverdicted · none · ref 11
wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LLMs to large review datasets.

Weird Generalization is Weirdly Brittle
cs.CL · 2026-04-11 · unverdicted · none · ref 4
Weird generalization in fine-tuned models is brittle, appearing only in specific cases and disappearing under prompt-based interventions that make the undesired behavior expected.

Beyond Hooking Onto the World: Referential Profiles and the Numerical Structure of LLM Grounding
cs.CL · 2026-06-19 · unverdicted · none · ref 2
LLMs realize derivative referential profiles through distributed numerical structures in their parameters and activations, indirectly supported by mechanistic interpretability findings.
