WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.
super hub Mixed citations
Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell
Mixed citation behavior. Most common role is background (64%).
hub tools
citation-role summary
citation-polarity summary
authors
co-cited works
representative citing papers
DT² trains digital twins to preserve pairwise policy rankings from fitted Q-evaluation on offline data rather than minimizing one-step transition errors, improving policy ranking and reducing decision regret.
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.
LLM outputs are meaningful according to standard theories of human language, without requiring anthropomorphic assumptions about the models.
The authors introduce a three-level formality spectrum (informal, casual, formal) and the 3LF dataset to correct supervision misalignment in formality transfer, reporting large gains in informal-to-formal performance on models including GPT variants.
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Causal tracing reveals a persistent Refusal Trajectory in LLM hidden states; SALO detector using sparse activations from a layer window improves jailbreak detection across Qwen, Llama, and Mistral models.
R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.
SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.
LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.
Introduces LLM-mediated computing as a paradigm of reflective conversation and co-disclosure where the computer emerges through human-LLM interaction.
VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
Authors share a new dataset of GPT-4 behavior-change conversations with user language metrics, perception measures, and feedback collected in a preregistered study.
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.
Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.
Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.
Empirical study of LLM brand recommendations across industries finds moderate concentration (mean Gini 0.28) and low cross-model agreement (41.6%) on top brands.
LLMs exhibit misfired alignment on stereotype questions at 4.7-18.9% rates on the new VETO benchmark of 2,032 contrastive pairs, unlike humans at 0%, due to overgeneralized safety cues after instruction tuning.
citing papers explorer
-
"It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps
Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.
-
Push and Pushback in Contesting AI: Demands for and Resistance to Accountability
Thematic analysis of 43 AI contestation cases, using Bovens's relational accountability model, produces categories of demands from below, institutional pushback, outcomes, and contextual factors shaping effective contestation.
-
AI at the Front Lines of Platform Governance: Using LLMs to Support Illegal Content Reporting under the Digital Services Act
EvalAI providing pro/con arguments improves provision-level accuracy and reduces misclassification distance in DSA illegal content reporting under AI error conditions versus conventional XAI.
-
Position: AI as Part of Self -- Extending the Mind Requires Cognitive Co-Regulation
The paper claims that alignment requires treating AI as part of the self through cognitive co-regulation, identifying risks like deskilling and automation bias while drawing on System 0 cognition theory.
-
"From remembering to shaping": Narrating Shared Experiences by Co-Designing Cultural Heritage Artifacts in Collaborative VR
A collaborative VR workflow with GenAI lets users merge prompts and creatively repurpose outputs to co-create 3D artifacts that narrate shared cultural heritage experiences.
-
Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks
An online experiment finds that showing users an overview of an AI's values reduces reliance on AI suggestions during writing tasks.