The Price of Format: Diversity Collapse in LLMs

Longfei Yun, Chenyang An, Zilong Wang, Letian Peng, Jingbo Shang · 2025 · arXiv 2505.18949

6 Pith papers cite this work.

Citation-role summary: dataset 1, method 1

Citation-polarity summary: background 1, use dataset 1

Representative citing papers

The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

On-policy distillation has an extrapolation cliff at a closed-form threshold λ*(p, b, c), set by the teacher's modal probability, the warm-start mass, and the clip strength, past which training shifts from format-preserving to format-collapsing.

Ex Ante Evaluation of AI-Induced Idea Diversity Collapse

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Across tasks, frontier LLMs generate creative ideas with excess population-level crowding, falling below parity with humans, but targeted generation protocols can reduce this crowding.

Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

Schema-key wording functions as an implicit instruction channel under constrained decoding, with experiments showing that rephrasing only the keys can substantially change accuracy on math benchmarks while prompt, model, structure, and decoding remain unchanged.

Unlocking LLM Creativity in Science through Analogical Reasoning

cs.AI · 2026-05-11 · conditional · novelty 6.0

Analogical reasoning increases LLM solution diversity by 90–173% and raises the novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.

Annotations Mitigate Post-Training Mode Collapse

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

A bootstrapping framework transfers LLM semantic knowledge into Tsetlin Machines via synthetic data curricula and cue extraction, yielding interpretable classifiers competitive with BERT.
