Introduces the Synthetic Data Contamination Equilibrium and derives closed-form optimal provenance subsidies s* = KL(q||p)/(2 kappa) plus watermark strengths to mitigate model collapse, validated by OLS matching structural predictions on C4 data.
Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
Recursive generative retraining with pluralistic preferences converges to a stable diverse distribution that satisfies a weighted Nash bargaining solution.
By mid-2025 roughly 35% of new websites are AI-generated or AI-assisted, correlating with lower semantic diversity and higher positive sentiment but showing no significant drop in factual accuracy or stylistic diversity.
Recursive LLM text generation drives public corpora toward shallow equilibria via drift unless normative selection for quality sustains deeper structure with a bounded divergence.
FuXi-TC combines the FuXi global DL model with a diffusion generative framework to downscale and improve TC intensity and precipitation forecasts, matching ECMWF skill while being faster and generalizing zero-shot to North Atlantic hurricanes.
Filter Babel explores a future of AI-personalized private experiences that may erode common ground in communication while supporting individual identity and selfhood.
LLM integration in software engineering builds epistemological debt that erodes mental models and homogenizes code via recursive training, risking systemic fragility as illustrated by 2026 Amazon outages.
Knowledge distillation evaluations must report lost teacher capabilities via a Distillation Loss Statement rather than relying solely on task scores.
The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.
citing papers explorer
- Drift and selection in LLM text ecosystems
Recursive LLM text generation drives public corpora toward shallow equilibria via drift unless normative selection for quality sustains deeper structure with a bounded divergence.