WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.
Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell
Mixed citation behavior. Most common role is background (62%).
representative citing papers
DT² trains digital twins to preserve pairwise policy rankings from fitted Q-evaluation on offline data rather than minimizing one-step transition errors, improving policy ranking and reducing decision regret.
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.
LLM outputs are meaningful according to standard theories of human language, without requiring anthropomorphic assumptions about the models.
The authors introduce a three-level formality spectrum (informal, casual, formal) and the 3LF dataset to correct supervision misalignment in formality transfer, reporting large gains in informal-to-formal performance on models including GPT variants.
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Causal tracing reveals a persistent Refusal Trajectory in LLM hidden states; the SALO detector, which uses sparse activations from a layer window, improves jailbreak detection across Qwen, Llama, and Mistral models.
R-CAI inverts constitutional AI to automatically generate diverse toxic data for LLM red teaming, with probability clamping improving output coherence by 15% while preserving adversarial strength.
SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.
LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.
Introduces LLM-mediated computing as a paradigm of reflective conversation and co-disclosure where the computer emerges through human-LLM interaction.
VISE is the first benchmark for sycophancy in Video-LLMs, with two training-free mitigation strategies based on key-frame selection and internal representation steering.
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
The authors share a new dataset of GPT-4 behavior-change conversations with user language metrics, perception measures, and feedback collected in a preregistered study.
On the GAIA benchmark of simple real-world questions, humans reach 92% accuracy while current AI systems reach 15%; the authors propose closing this gap as a key milestone for general AI.
LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.
NLP authors show migration from *ACL flagship tracks (–19.2pp) to Findings (+14.8pp) and ML venues (+8.6pp), with new authors increasing ML share from 5% to 21% and causal inference indicating a citation premium drives venue choice.
Big Five inventories fail to capture meaningful differences or recover the five-factor structure in LLMs, with only 3% variance between models and four facets collapsing (r >= .92).
Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.
Empirical audit of LAION-2B-en and LAION-2B-multi finds overrepresentation of young adults, White people, and males plus stereotypical emotion associations across two attribute classifiers.