ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.
Title resolution pending
7 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
Continued AI scaling remains feasible only if efficiency doublings recur repeatedly to keep logical compute affordable.
citing papers explorer
-
ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
Scaling Laws for Transfer
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
-
Continued AI Scaling Requires Repeated Efficiency Doublings
Continued AI scaling remains feasible only if efficiency doublings recur repeatedly to keep logical compute affordable.