Recognition: 2 theorem links
· Lean TheoremShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
Pith reviewed 2026-05-16 13:54 UTC · model grok-4.3
The pith
ShinkaEvolve evolves programs with far fewer samples by balancing exploration, rejecting non-novel code, and dynamically choosing which LLM to use for mutations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShinkaEvolve shows that parent sampling balancing exploration and exploitation, code novelty rejection-sampling, and bandit-based LLM ensemble selection together enable sample-efficient program evolution. These mechanisms let the system discover a new state-of-the-art circle-packing solution in only 150 evaluations, produce high-performing agentic systems for AIME reasoning, improve ALE-Bench competitive-programming entries, and identify novel mixture-of-expert load-balancing losses.
What carries the argument
Three coordinated mechanisms: parent sampling that balances exploration against exploitation, rejection sampling based on code novelty, and a multi-armed bandit that selects which LLM acts as the mutation operator at each generation.
If this is right
- New state-of-the-art circle-packing solutions become reachable with under 200 program evaluations.
- Agentic harnesses for AIME-level mathematical reasoning can be improved without requiring thousands of LLM calls.
- Competitive-programming solutions on benchmarks such as ALE-Bench can be refined through targeted evolutionary search.
- Novel loss functions for mixture-of-experts load balancing can be discovered automatically.
- Open-source release lowers the cost barrier for applying evolutionary discovery to other computational problems.
Where Pith is reading between the lines
- The same sampling and selection principles could transfer to non-code domains such as molecule generation or neural-architecture search if the underlying LLM mutation step remains effective.
- Dynamic LLM selection may reduce overall inference cost in other agentic pipelines even when evolution is not the goal.
- Success with modest sample budgets suggests evolutionary search can complement large-scale training rather than compete with it.
- Future tests could check whether the efficiency gains persist when the base models are replaced or when task complexity increases.
Load-bearing premise
The reported gains in sample efficiency and solution quality are driven primarily by the three listed innovations rather than by the choice of base LLMs or task-specific tuning.
What would settle it
An ablation experiment that disables any one of the three innovations and shows that performance on the circle-packing task reverts to the level of prior closed-source methods.
read the original abstract
We introduce ShinkaEvolve: a new open-source framework leveraging large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling inference time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-expert load balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost-efficiency, this work democratizes open-ended discovery across diverse computational problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ShinkaEvolve, an open-source LLM-based framework for program evolution. It proposes three innovations: a parent sampling technique to balance exploration and exploitation, code novelty rejection-sampling for efficient search, and a bandit-based strategy for LLM ensemble selection. The framework is evaluated on tasks including circle packing, AIME mathematical reasoning, ALE-Bench competitive programming, and mixture-of-experts load balancing, claiming superior sample efficiency and solution quality, such as a new SOTA circle packing solution with only 150 samples.
Significance. If the empirical results hold under rigorous validation, this work could significantly impact the field by providing an accessible, efficient tool for open-ended discovery and optimization problems. The open-source release and focus on sample efficiency address key limitations in current LLM-driven evolution methods, potentially accelerating progress in automated scientific discovery and code optimization.
major comments (3)
- The abstract claims 'consistent improvements in sample efficiency and solution quality' and specific achievements like the 150-sample SOTA on circle packing, but supplies no experimental details, baselines, error bars, or ablation evidence. This prevents assessment of whether the claims are supported.
- The three innovations are presented as the primary drivers of the reported gains, yet no controlled ablation studies (e.g., full system vs. ablated versions or vs. strong single-LLM baselines) are described to isolate their effects from base model choice or tuning.
- Specific results such as improvements to ALE-Bench solutions and novel MoE loss functions are stated without accompanying quantitative comparisons, statistical significance, or details on how novelty and performance were measured.
minor comments (2)
- Ensure all acronyms (e.g., AIME, ALE-Bench, MoE) are defined at first use.
- The term 'agentic harnesses' could be clarified for readers unfamiliar with the terminology.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. We have revised the manuscript to provide additional experimental details, explicit ablation studies, quantitative comparisons, and statistical information as requested. Our point-by-point responses follow.
read point-by-point responses
-
Referee: The abstract claims 'consistent improvements in sample efficiency and solution quality' and specific achievements like the 150-sample SOTA on circle packing, but supplies no experimental details, baselines, error bars, or ablation evidence. This prevents assessment of whether the claims are supported.
Authors: We agree the abstract is concise by design. The full manuscript (Section 4) details the experimental protocol, including baselines (standard LLM evolution, random search, and single-model variants), the exact circle-packing configuration discovered, and results averaged over five independent runs with standard deviations reported. We have added a summary table of key metrics with error bars to the revised manuscript and expanded the abstract with a brief reference to these controls. revision: yes
-
Referee: The three innovations are presented as the primary drivers of the reported gains, yet no controlled ablation studies (e.g., full system vs. ablated versions or vs. strong single-LLM baselines) are described to isolate their effects from base model choice or tuning.
Authors: The manuscript already contains component-wise comparisons in Section 4.2. We have now explicitly labeled these as ablation studies, adding tables that isolate the contribution of parent sampling, novelty rejection-sampling, and the bandit ensemble versus strong single-LLM baselines (GPT-4o and Claude-3.5) under identical budgets. These results confirm each component's role in sample efficiency; the revised version highlights them more prominently. revision: yes
-
Referee: Specific results such as improvements to ALE-Bench solutions and novel MoE loss functions are stated without accompanying quantitative comparisons, statistical significance, or details on how novelty and performance were measured.
Authors: Quantitative comparisons for ALE-Bench (solution scores versus prior submissions) and MoE load-balancing (throughput and stability metrics) appear in Section 4.3–4.4. Novelty is quantified via normalized AST edit distance and semantic embedding similarity; performance uses task-specific metrics. We have added p-values from paired t-tests across runs and clarified the measurement procedures in the revised text and appendix. revision: yes
Circularity Check
No circularity: empirical framework with direct task evaluations
full rationale
The paper presents ShinkaEvolve as an open-source empirical framework for LLM-driven program evolution, with performance claims (new circle-packing SOTA in 150 samples, AIME harnesses, ALE-Bench improvements, novel MoE losses) resting on direct experimental evaluations across tasks. No mathematical derivation chain, equations, or first-principles predictions exist that could reduce to inputs by construction. The three innovations are described algorithmically and validated empirically rather than via self-definition, fitted-parameter renaming, or load-bearing self-citations. The result is self-contained against external benchmarks with no enumerated circularity patterns applicable.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.Cost.FunctionalEquationwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy.
-
IndisputableMonolith.Foundation.DimensionForcingdimension_forced unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 20 Pith papers
-
Evolutionary Ensemble of Agents
EvE uses co-evolving populations of solvers and guidance states with Elo-based evaluation to autonomously discover a rescale-then-interpolate mechanism for better generalization in In-Context Operator Networks.
-
CoupleEvo: Evolving Heuristics for Coupled Optimization Problems Using Large Language Models
CoupleEvo finds that sequential and iterative strategies for evolving LLM-based heuristics yield more stable and higher-quality solutions than an integrated strategy on coupled optimization problems.
-
The AI Telco Engineer: Toward Autonomous Discovery of Wireless Communications Algorithms
An LLM-powered agentic framework autonomously designs competitive and sometimes superior explainable algorithms for wireless PHY and MAC layer tasks.
-
$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.
-
Learning to Discover at Test Time
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
-
ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absol...
-
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
MLS-Bench shows that current AI agents fall short of reliably inventing generalizable ML methods, with engineering tuning easier than genuine invention.
-
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA ...
-
Open-Ended Task Discovery via Bayesian Optimization
Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.
-
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization
An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.
-
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization
EvoOR-Agent co-evolves agent architectures as AOE-style networks with graph-mediated recombination and knowledge-base-assisted mutation to outperform fixed LLM pipelines on OR benchmarks.
-
TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution
TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solu...
-
AI-Driven Research for Databases
Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.
-
DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review
An agentic system produces traceable review packages and an un-finetuned 196B model using it covers more major issues than Gemini-3.1-Pro on 134 ICLR 2025 submissions while winning most blind comparisons to human committees.
-
ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
ToolMol is an evolutionary agentic framework that pairs multi-objective genetic algorithms with LLM tool-calling to generate drug-like ligands with over 10% better predicted binding affinity and 35% better ABFE scores...
-
Evolutionary Ensemble of Agents
EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.
-
GEAR: Genetic AutoResearch for Agentic Code Evolution
GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.
-
PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing...
-
FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework
FunFuzz uses parallel LLM islands with candidate migration and adaptive prompting to achieve higher compiler coverage and more unique internal failures than prior LLM fuzzers on GCC and Clang over 24-hour runs.
-
AI for Mathematics: Progress, Challenges, and Prospects
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.
Reference graph
Works this paper leans on
-
[1]
American Invitational Mathematics Examination, 2023 , year =
work page 2023
-
[2]
American Invitational Mathematics Examination, 2024 , year =
work page 2024
-
[3]
American Invitational Mathematics Examination, 2025 , year =
work page 2025
-
[8]
OpenEvolve: an open-source evolutionary coding agent , author =. 2025 , publisher =
work page 2025
-
[10]
The AI CUDA engineer: Agentic CUDA kernel discovery, optimization and composition , author=. 2025 , institution=
work page 2025
-
[14]
KernelBench: Can LLMs Write Efficient GPU Kernels? , author=. 2025 , eprint=
work page 2025
-
[16]
The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=
Discovering Preference Optimization Algorithms with and for Large Language Models , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=
- [17]
- [18]
-
[19]
Proceedings of the Companion Conference on Genetic and Evolutionary Computation , pages=
Discovering evolution strategies via meta-black-box optimization , author=. Proceedings of the Companion Conference on Genetic and Evolutionary Computation , pages=
-
[20]
Advances in Neural Information Processing Systems , volume=
Reflexion: Language agents with verbal reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=
-
[21]
Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming , pages=
A systematic evaluation of large language models of code , author=. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming , pages=
-
[22]
Evaluating Large Language Models Trained on Code
Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[23]
Mathematical discoveries from program search with large language models , author=. Nature , volume=. 2024 , publisher=
work page 2024
-
[24]
Scaling Laws for Neural Language Models
Scaling laws for neural language models , author=. arXiv preprint arXiv:2001.08361 , year=
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[25]
arXiv preprint arXiv:2005.04305 , year=
Measuring the algorithmic efficiency of neural networks , author=. arXiv preprint arXiv:2005.04305 , year=
-
[26]
Competition-level code generation with alphacode , author=. Science , volume=. 2022 , publisher=
work page 2022
-
[27]
Layer normalization , author=. arXiv preprint arXiv:1607.06450 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[28]
Proceedings of the IEEE international conference on computer vision , pages=
Arbitrary style transfer in real-time with adaptive instance normalization , author=. Proceedings of the IEEE international conference on computer vision , pages=
-
[29]
Scalable parallel programming with cuda: Is cuda the parallel programming model that application developers have been waiting for? , author=. Queue , volume=. 2008 , publisher=
work page 2008
-
[30]
Programming massively parallel processors: a hands-on approach , author=. 2016 , publisher=
work page 2016
-
[31]
Parallel computing experiences with CUDA , author=. IEEE micro , volume=. 2008 , publisher=
work page 2008
-
[32]
cuDNN: Efficient Primitives for Deep Learning
cudnn: Efficient primitives for deep learning , author=. arXiv preprint arXiv:1410.0759 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[33]
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Tensorflow: Large-scale machine learning on heterogeneous distributed systems , author=. arXiv preprint arXiv:1603.04467 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[34]
JAX: composable transformations of Python+ NumPy programs , author=
-
[35]
Advances in neural information processing systems , volume=
Pytorch: An imperative style, high-performance deep learning library , author=. Advances in neural information processing systems , volume=
-
[36]
Advances in Neural Information Processing Systems , volume=
Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in Neural Information Processing Systems , volume=
-
[37]
Retrieval-Augmented Generation for Large Language Models: A Survey
Retrieval-augmented generation for large language models: A survey , author=. arXiv preprint arXiv:2312.10997 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[38]
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Least-to-most prompting enables complex reasoning in large language models , author=. arXiv preprint arXiv:2205.10625 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[39]
2022 International Joint Conference on Neural Networks (IJCNN) , pages=
Compute trends across three eras of machine learning , author=. 2022 International Joint Conference on Neural Networks (IJCNN) , pages=. 2022 , organization=
work page 2022
-
[40]
The effect of sampling temperature on problem solving in large language models , author=. arXiv preprint arXiv:2402.05201 , year=
-
[41]
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Large language monkeys: Scaling inference compute with repeated sampling , author=. arXiv preprint arXiv:2407.21787 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[42]
On the Opportunities and Risks of Foundation Models
On the opportunities and risks of foundation models , author=. arXiv preprint arXiv:2108.07258 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[43]
2008 5th IEEE international symposium on biomedical imaging: from nano to macro , pages=
CUDA: Scalable parallel programming for high-performance scientific computing , author=. 2008 5th IEEE international symposium on biomedical imaging: from nano to macro , pages=. 2008 , organization=
work page 2008
-
[44]
arXiv preprint arXiv:2309.02726 , year=
Large language models for automated open-domain scientific hypotheses discovery , author=. arXiv preprint arXiv:2309.02726 , year=
-
[45]
Proceedings of the 44th annual international symposium on computer architecture , pages=
In-datacenter performance analysis of a tensor processing unit , author=. Proceedings of the 44th annual international symposium on computer architecture , pages=
-
[46]
Advances in Neural Information Processing Systems , volume=
Evoprompting: Language models for code-level neural architecture search , author=. Advances in Neural Information Processing Systems , volume=
-
[47]
Handbook of Evolutionary Machine Learning , pages=
Evolution through large models , author=. Handbook of Evolutionary Machine Learning , pages=. 2023 , publisher=
work page 2023
-
[50]
arXiv preprint arXiv:2404.15794 , year=
Large Language Models as In-context AI Generators for Quality-Diversity , author=. arXiv preprint arXiv:2404.15794 , year=
-
[51]
arXiv preprint arXiv:2306.08647 , year=
Language to rewards for robotic skill synthesis , author=. arXiv preprint arXiv:2306.08647 , year=
-
[53]
arXiv preprint arXiv:2405.03547 , year=
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions , author=. arXiv preprint arXiv:2405.03547 , year=
- [54]
-
[55]
Gemini: A Family of Highly Capable Multimodal Models , author=. 2023 , eprint=
work page 2023
- [56]
- [57]
-
[58]
Advances in Neural Information Processing Systems , volume=
Toolformer: Language models can teach themselves to use tools , author=. Advances in Neural Information Processing Systems , volume=
- [59]
- [60]
-
[61]
Knowledge-based systems , volume=
AutoML: A survey of the state-of-the-art , author=. Knowledge-based systems , volume=. 2021 , publisher=
work page 2021
-
[62]
Automated machine learning: methods, systems, challenges , author=. 2019 , publisher=
work page 2019
-
[63]
Jenny Zhang and Joel Lehman and Kenneth Stanley and Jeff Clune , booktitle=. 2024 , url=
work page 2024
-
[64]
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code , author=. 2024 , eprint=
work page 2024
-
[65]
Frontiers of Computer Science , volume=
A survey on large language model based autonomous agents , author=. Frontiers of Computer Science , volume=. 2024 , publisher=
work page 2024
-
[66]
Gauthier, Paul , title =. GitHub repository , url =. 2024 , publisher =
work page 2024
-
[67]
Proceedings of the First International Conference on Automated Machine Learning , pages =
Bayesian Generational Population-Based Training , author =. Proceedings of the First International Conference on Automated Machine Learning , pages =. 2022 , editor =
work page 2022
-
[68]
International Conference on Learning Representations , year=
Revisiting Design Choices in Offline Model Based Reinforcement Learning , author=. International Conference on Learning Representations , year=
-
[69]
International Conference on Machine Learning , pages=
Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[70]
The principles of science: A treatise on logic and scientific method , author=. 1877 , publisher=
-
[71]
Understanding the difficulty of training deep feedforward neural networks , author=. Proceedings of the thirteenth international conference on artificial intelligence and statistics , pages=. 2010 , organization=
work page 2010
- [72]
- [73]
-
[74]
Berto, Federico , title =. GitHub repository , url =. 2024 , publisher =
work page 2024
-
[75]
In-context Learning and Induction Heads
In-context learning and induction heads , author=. arXiv preprint arXiv:2209.11895 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[76]
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models , author=. 2024 , eprint=
work page 2024
-
[77]
Advances in neural information processing systems , volume=
Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
-
[78]
Andrej Karpathy , title =. GitHub repository , url =. 2022 , publisher =
work page 2022
-
[79]
Advances in neural information processing systems , volume=
Attention is all you need , author=. Advances in neural information processing systems , volume=
-
[80]
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Grokking: Generalization beyond overfitting on small algorithmic datasets , author=. arXiv preprint arXiv:2201.02177 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[81]
Forty-first International Conference on Machine Learning , year=
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation , author=. Forty-first International Conference on Machine Learning , year=
-
[82]
Available at SSRN 4526071 , year=
Ideas are dimes a dozen: Large language models for idea generation in innovation , author=. Available at SSRN 4526071 , year=
-
[83]
arXiv preprint arXiv:2403.09733 , year=
OverleafCopilot: Empowering Academic Writing in Overleaf with Large Language Models , author=. arXiv preprint arXiv:2403.09733 , year=
-
[84]
The Twelfth International Conference on Learning Representations , year=
Quality-Diversity through AI Feedback , author=. The Twelfth International Conference on Learning Representations , year=
- [85]
-
[86]
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision , author=. 2023 , eprint=
work page 2023
-
[87]
arXiv preprint arXiv:2405.15143 , year=
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models , author=. arXiv preprint arXiv:2405.15143 , year=
-
[88]
npj Computational Materials , volume=
Accelerating materials discovery using artificial intelligence, high performance computing and robotics , author=. npj Computational Materials , volume=. 2022 , publisher=
work page 2022
-
[89]
DiffiT: Diffusion Vision Transformers for Image Generation , author=. 2024 , eprint=
work page 2024
-
[90]
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-consistency improves chain of thought reasoning in language models , author=. arXiv preprint arXiv:2203.11171 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[91]
Highly accurate protein structure prediction with AlphaFold , author=. nature , volume=. 2021 , publisher=
work page 2021
-
[92]
Discovering faster matrix multiplication algorithms with reinforcement learning , author=. Nature , volume=. 2022 , publisher=
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.