Base models beat aligned models at randomness and creativity

Peter West, Christopher Potts · 2025 · arXiv 2505.00047

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 · support 1

representative citing papers

AI Coding Agents in Social Science: Methodologically Diverse, Empirically Consistent, Interpretively Vulnerable

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

LLM agents match or exceed human methodological diversity and produce aligned effect estimates, yet flip final verdicts from 10% to 90% support under a confirmatory prompt while leaving coefficients unchanged.

Fine-Tuning Improves Information Conveyance in Language Models

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Fine-tuning reorganizes uncertainty in LLMs into more efficient information conveyance, as shown by stronger length-entropy correlations and a tripling of entropy-semantic diversity links after controls.

Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

Activation steering produces synthetic safety-violating data that improves downstream classifiers over prompting on most tested concepts when a harmonic mean of alignment, coherence, and diversity is optimized.

Unlocking LLM Creativity in Science through Analogical Reasoning

cs.AI · 2026-05-11 · conditional · novelty 6.0

Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.

Annotations Mitigate Post-Training Mode Collapse

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization

cs.CL · 2026-06-29 · unverdicted · novelty 5.0

Proposes a three-phase crystallization model (liquid, nucleation via SFT, settling via RL) for alignment dynamics using random number generation tasks as case study.

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs

cs.CL · 2026-05-30 · unverdicted · novelty 5.0

IDEAFix is an evaluation framework that varies task attributes and defixation prompts in LLM idea generation, showing task formulation affects performance while simple prompts boost originality but homogenization persists.

The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety

cs.AI · 2026-01-03 · 2 refs

citing papers explorer

Showing 6 of 6 citing papers after filters.

AI Coding Agents in Social Science: Methodologically Diverse, Empirically Consistent, Interpretively Vulnerable
cs.CL · 2026-06-09 · unverdicted · none · ref 14
LLM agents match or exceed human methodological diversity and produce aligned effect estimates, yet flip final verdicts from 10% to 90% support under a confirmatory prompt while leaving coefficients unchanged.

Fine-Tuning Improves Information Conveyance in Language Models
cs.CL · 2026-05-29 · unverdicted · none · ref 37
Fine-tuning reorganizes uncertainty in LLMs into more efficient information conveyance, as shown by stronger length-entropy correlations and a tripling of entropy-semantic diversity links after controls.

Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection
cs.LG · 2026-05-27 · unverdicted · none · ref 54
Activation steering produces synthetic safety-violating data that improves downstream classifiers over prompting on most tested concepts when a harmonic mean of alignment, coherence, and diversity is optimized.

Annotations Mitigate Post-Training Mode Collapse
cs.CL · 2026-05-11 · unverdicted · none · ref 28
Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization
cs.CL · 2026-06-29 · unverdicted · none · ref 2
Proposes a three-phase crystallization model (liquid, nucleation via SFT, settling via RL) for alignment dynamics using random number generation tasks as case study.

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs
cs.CL · 2026-05-30 · unverdicted · none · ref 14
IDEAFix is an evaluation framework that varies task attributes and defixation prompts in LLM idea generation, showing task formulation affects performance while simple prompts boost originality but homogenization persists.
