arxiv: 2408.00724 · v3 · submitted 2024-08-01 · 💻 cs.AI

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang

Pith reviewed 2026-05-18 06:33 UTC · model grok-4.3

classification 💻 cs.AI

keywords inference scaling laws · compute-optimal inference · test-time compute · tree search · language models · MATH benchmark · model size trade-offs

The pith

Scaling inference compute with advanced strategies can outperform scaling model size for language models on math problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the trade-off between using a larger language model and spending more compute on inference strategies such as voting or tree search. It measures accuracy on the MATH benchmark while tracking total compute cost for models from 7B to 34B parameters. The central result is that, at the same compute budget, a smaller model paired with a sophisticated inference algorithm often reaches higher accuracy than a larger model with simpler decoding. The suggested explanation is that generating additional tokens through search can resolve errors that extra parameters alone do not fix. If the pattern generalizes, future performance gains may come more from inference design than from ever-larger training runs.

Core claim

Scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance. For example, the Llemma-7B model, when paired with our novel tree search algorithm, consistently outperforms the Llemma-34B model across all tested inference strategies on the MATH benchmark.

What carries the argument

Empirical cost-performance curves comparing inference strategies (greedy search, majority voting, best-of-n, weighted voting, and two tree search algorithms) across model sizes and total token budgets on the MATH benchmark.
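To make the sampling-based strategies named above concrete, here is a minimal Python sketch of how majority voting, best-of-n, and weighted voting can each pick a final answer from the same pool of sampled solutions. It is an illustration under assumptions, not the paper's implementation: the function names, the `answers` list, and the `scores` list (standing in for whatever scorer ranks the samples, for example a reward model) are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, scores):
    """Return the answer of the single highest-scoring sample."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_vote(answers, scores):
    """Sum the scores per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Hypothetical final answers from six sampled solutions, with made-up scores.
answers = ["12", "15", "12", "15", "15", "7"]
scores = [0.9, 0.3, 0.8, 0.2, 0.3, 0.95]

print(majority_vote(answers))          # "15": appears three times
print(best_of_n(answers, scores))      # "7": single highest score (0.95)
print(weighted_vote(answers, scores))  # "12": largest summed score (1.7)
```

The toy data is chosen so that the three rules disagree, which shows why they can sit at different points on a cost-performance curve even when they consume the same samples.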

If this is right

For a fixed compute budget, allocating more operations to inference steps on a smaller model produces higher accuracy than using those operations to run a larger model.
Tree search algorithms create better cost-performance frontiers than voting or greedy methods across the tested range.
There exist model-plus-strategy pairs that dominate others in the accuracy-versus-compute plane on MATH.
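The notion of one model-plus-strategy pair dominating another in the accuracy-versus-compute plane can be sketched as a small Pareto-frontier filter. This is a hedged illustration: the function name `pareto_frontier`, the configuration labels, and every number below are invented for the example and are not results from the paper.

```python
def pareto_frontier(points):
    """Return the configurations that no other configuration dominates.

    Each point is (label, cost, accuracy). A point is dominated when some
    other point has cost <= and accuracy >=, with at least one strict.
    """
    frontier = []
    for label, cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            frontier.append((label, cost, acc))
    # Sort by cost so the frontier reads left to right on a cost axis.
    return sorted(frontier, key=lambda p: p[1])


# Hypothetical (label, relative compute cost, accuracy) triples.
configs = [
    ("small-greedy", 1.0, 0.20),
    ("small-search", 4.0, 0.35),
    ("large-greedy", 5.0, 0.30),
    ("large-voting", 20.0, 0.40),
]

frontier_labels = [label for label, _, _ in pareto_frontier(configs)]
print(frontier_labels)  # ['small-greedy', 'small-search', 'large-voting']
```

In this made-up data, "large-greedy" drops off the frontier because "small-search" is both cheaper and more accurate, which is the shape of dominance the bullet describes.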

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Model developers might gain more by designing architectures that support efficient long-horizon search than by maximizing parameter count alone.
The same inference-scaling pattern could appear on other reasoning benchmarks if the underlying error-correction mechanism is not MATH-specific.
Hardware systems optimized for variable-length tree search rather than fixed batch inference could unlock further efficiency.

Load-bearing premise

The measured cost and accuracy differences arise mainly from model size and inference strategy rather than from unmeasured details of prompts, formatting, or benchmark artifacts.

What would settle it

The claim would be undermined if re-running the same model sizes and strategies on MATH, with total floating-point operations equalized, showed the 34B model with basic inference matching or exceeding the 7B model with tree search.
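A rough worked estimate shows what equalizing floating-point operations implies for token budgets. It assumes the common approximation that a dense transformer with N parameters spends about 2N FLOPs per token in a forward pass, ignores attention-over-context costs, and uses the nominal 7B and 34B parameter counts; the symbols C (total inference FLOPs) and T (tokens processed) are introduced here for illustration only.

```latex
% Assumption: about 2N FLOPs per token for a dense model with N parameters.
C \approx 2 N T
\quad\Longrightarrow\quad
\frac{T_{7\mathrm{B}}}{T_{34\mathrm{B}}}
  = \frac{N_{34\mathrm{B}}}{N_{7\mathrm{B}}}
  \approx \frac{34}{7} \approx 4.9
```

Under these assumptions, a 7B model can process roughly five times as many tokens as a 34B model for the same FLOPs, so a matched-compute test gives the smaller model about that much extra room for sampling or search.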

Original abstract

While the scaling laws of large language models (LLMs) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. As a first step towards understanding and designing compute-optimal inference methods, we studied cost-performance trade-offs for inference strategies such as greedy search, majority voting, best-of-$n$, weighted voting, and two different tree search algorithms, using different model sizes and compute budgets. Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Additionally, smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance. For example, the Llemma-7B model, when paired with our novel tree search algorithm, consistently outperforms the Llemma-34B model across all tested inference strategies on the MATH benchmark. We hope these insights contribute to a deeper understanding of inference scaling laws (test-time scaling laws) for LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper empirically studies inference scaling laws for LLMs on mathematical problem-solving, comparing inference strategies (greedy search, majority voting, best-of-n, weighted voting, and two tree search algorithms) across model sizes and compute budgets on the MATH benchmark. It claims that scaling inference compute via advanced strategies is more efficient than scaling model parameters, with smaller models like Llemma-7B plus a novel tree search algorithm offering Pareto-superior cost-performance trade-offs over larger models like Llemma-34B.

Significance. If the empirical comparisons hold under fair compute accounting, the results would indicate that inference-time optimization can substitute for larger model sizes in some settings, providing practical guidance for efficient LLM deployment and highlighting the value of test-time scaling laws as a complement to training scaling laws.

major comments (2)

The central claim that Llemma-7B with the novel tree search outperforms Llemma-34B (and offers better efficiency) depends on equivalent total compute across conditions. Tree search requires multiple forward passes, branching, and backtracking; if cost is measured only in tokens or wall-clock time without explicit FLOPs or model-call normalization that holds the budget constant, the reported Pareto dominance may be an artifact of unequal effective compute rather than strategy superiority.
The abstract and reported comparisons do not specify an explicit FLOPs or model-call budget held constant across model sizes and strategies. Without this, it is unclear whether the measured performance differences arise from inference strategy efficiency or from unaccounted differences in total computation.

minor comments (2)

Add details on statistical controls, variance across runs, and exact compute accounting (including how tree search calls are tallied) to strengthen the support for the efficiency claims.
Clarify the precise definition and implementation of the novel tree search algorithm, including any hyperparameters that affect compute usage.
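One way to report the run-to-run uncertainty requested above is a bootstrap confidence interval over per-problem correctness. The sketch below is an assumption about how such a control could look, not a description of what the paper did: the function name `bootstrap_accuracy_ci`, its parameters, and the toy correctness vector are hypothetical.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap over problems.

    `correct` is a list of 0/1 flags, one per benchmark problem.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Resample problems with replacement and record the resampled accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper


# Hypothetical outcome: 60 of 100 problems solved.
correct = [1] * 60 + [0] * 40
accuracy, lower, upper = bootstrap_accuracy_ci(correct)
print(f"accuracy={accuracy:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

If the intervals for two configurations at the same compute budget overlap heavily, a reported gap between them is weaker evidence of a real efficiency difference.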

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback emphasizing the need for transparent and equivalent compute accounting across model sizes and inference strategies. We agree that this is critical for interpreting the efficiency claims. Below we respond to each major comment and outline the revisions we will make to strengthen the presentation.

Point-by-point responses

Referee: The central claim that Llemma-7B with the novel tree search outperforms Llemma-34B (and offers better efficiency) depends on equivalent total compute across conditions. Tree search requires multiple forward passes, branching, and backtracking; if cost is measured only in tokens or wall-clock time without explicit FLOPs or model-call normalization that holds the budget constant, the reported Pareto dominance may be an artifact of unequal effective compute rather than strategy superiority.

Authors: We agree that fair and explicit compute normalization is necessary to support the efficiency comparisons. In the experiments, we held the inference compute budget constant by fixing the total number of tokens generated (or equivalently the number of model forward passes) for each strategy under each budget level, with tree search explicitly counting all tokens from branching and backtracking. Because the primary comparisons for Pareto dominance are performed within the same model size before contrasting across sizes, the token-based budget provides a consistent measure. That said, we acknowledge that an explicit statement of this normalization (including its relation to FLOPs) would remove any ambiguity. We will add a dedicated paragraph in the methods section and update the figure captions to detail the exact model-call counting procedure. revision: yes
Referee: The abstract and reported comparisons do not specify an explicit FLOPs or model-call budget held constant across model sizes and strategies. Without this, it is unclear whether the measured performance differences arise from inference strategy efficiency or from unaccounted differences in total computation.

Authors: The manuscript states that experiments were conducted across different model sizes and compute budgets, but we accept that neither the abstract nor the main text currently provides an explicit definition of the budget in FLOPs or normalized model calls. We will revise the abstract to include a concise statement that all comparisons are performed under matched total inference compute (measured in tokens generated / model calls) and add a short subsection describing the normalization, confirming that tree-search costs are fully included and that cross-model comparisons respect the differing per-token FLOPs of each model size. revision: yes

Circularity Check

0 steps flagged

Purely empirical study with no derivation chain or self-referential reductions

Full rationale

The paper conducts an empirical comparison of inference strategies (greedy search, majority voting, best-of-n, weighted voting, and tree search) across model sizes and compute budgets on the MATH benchmark. No mathematical derivations, first-principles predictions, or equations are presented that could reduce to fitted inputs or self-citations. Claims rest on observed performance and measured costs rather than any self-definitional or load-bearing self-referential steps. The analysis is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking study with no theoretical free parameters, axioms, or invented entities; all claims rest on observed performance numbers from standard models and a public benchmark.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.HierarchyEmergence hierarchy_emergence_forces_phi unclear

Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Additionally, smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
cs.LG 2026-05 unverdicted novelty 7.0

Agentic program search over frozen embedding APIs yields a parameter-free inference algebra—a softmax-weighted centroid of top-K documents interpolated with the query—that lifts nDCG@10 across seven model families on ...
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
cs.LG 2026-05 unverdicted novelty 7.0

A softmax-weighted centroid of the local top-K documents interpolated with the query improves nDCG@10 for frozen embedding models across seven families on held-out BEIR data.
Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
cs.IR 2026-05 conditional novelty 7.0

BLADE uses Bayesian list-wise alignment with dynamic estimation to create a self-evolving target that overcomes limitations of static references in LLM-based recommendation, yielding sustained gains in ranking and com...
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
cs.SE 2026-05 unverdicted novelty 7.0

POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
cs.CL 2026-04 unverdicted novelty 7.0

CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
cs.AI 2025-10 unverdicted novelty 7.0

ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
cs.CL 2025-03 unverdicted novelty 7.0

LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
cs.CL 2024-12 unverdicted novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation
cs.AI 2026-05 unverdicted novelty 6.0

OpenDeepThink improves LLM reasoning by ranking parallel candidate traces via Bradley-Terry aggregation of LLM pairwise judgments, achieving a +405 Codeforces Elo gain on Gemini 3.1 Pro after eight rounds.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
cs.LG 2026-05 unverdicted novelty 6.0

DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
cs.LG 2026-05 conditional novelty 6.0

DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
When Less is Enough: Efficient Inference via Collaborative Reasoning
cs.LG 2026-05 conditional novelty 6.0

A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.
Evaluation-driven Scaling for Scientific Discovery
cs.LG 2026-04 unverdicted novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster ...
LACE: Lattice Attention for Cross-thread Exploration
cs.AI 2026-04 unverdicted novelty 6.0

LACE enables parallel reasoning paths in LLMs to communicate via lattice attention and error-correct using synthetic training data, improving accuracy by over 7 points over standard parallel search.
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
cs.AI 2025-10 unverdicted novelty 6.0

UCAS refines RLVR advantage signals with a logit-space self-confidence proxy for response-level modulation and asymmetric token-level penalties based on raw logit certainty to boost exploration and reduce entropy collapse.
Entropy After </Think> for reasoning model early exiting
cs.LG 2025-09 unverdicted novelty 6.0

Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
cs.AI 2025-09 unverdicted novelty 6.0

DeepSearch embeds MCTS into RLVR training with global frontier selection, entropy guidance, and adaptive replay to achieve 62.95% average accuracy on math reasoning benchmarks while using 5.7x fewer GPU hours than ext...
Muon is Scalable for LLM Training
cs.LG 2025-02 unverdicted novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling
cs.AI 2026-05 unverdicted novelty 5.0

Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while sel...
Physical Foundation Models: Fixed hardware implementations of large-scale neural networks
cs.LG 2026-04 unverdicted novelty 5.0

Physical Foundation Models are fixed physical hardware realizations of foundation-scale neural networks that compute via inherent material dynamics, potentially delivering orders-of-magnitude gains in energy efficienc...
Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification
cs.AR 2026-04 unverdicted novelty 5.0

Domain-specialized LLM agents for hardware verification close 95-99% coverage using 4-13x fewer tokens and 2-4x faster convergence than general-purpose agents by reallocating tokens toward coverage-directed reasoning.
LACE: Lattice Attention for Cross-thread Exploration
cs.AI 2026-04 unverdicted novelty 5.0

LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.
LACE: Lattice Attention for Cross-thread Exploration
cs.AI 2026-04 unverdicted novelty 5.0

LACE adds lattice attention to let parallel LLM reasoning threads interact and correct errors, raising accuracy over 7 points versus standard independent sampling.
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
cs.LG 2025-10 unverdicted novelty 5.0

GenCluster scales test-time compute via large-scale generation, behavioral clustering, ranking, and round-robin submission to achieve IOI gold medal performance with the open-weight gpt-oss-120b model.
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
cs.CV 2025-09 unverdicted novelty 5.0

Video Parallel Scaling improves VideoLLM performance by aggregating outputs from parallel inferences on complementary disjoint frame subsets, effectively contracting the Chinchilla scaling law via uncorrelated visual ...
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
cs.AI 2026-05 unverdicted novelty 3.0

Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.

Reference graph

Works this paper leans on

299 extracted references · 299 canonical work pages · cited by 22 Pith papers · 71 internal anchors

[1]

Making Language Models Better Reasoners with Step-Aware Verifier

Li, Yifei and Lin, Zeqi and Zhang, Shizhuo and Fu, Qiang and Chen, Bei and Lou, Jian-Guang and Chen, Weizhu. Making Language Models Better Reasoners with Step-Aware Verifier. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.291

work page doi:10.18653/v1/2023.acl-long.291 2023
[2]

Scaling Learning Algorithms Towards

Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards

work page
[3]

and Osindero, Simon and Teh, Yee Whye , journal =

Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =

work page
[4]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

work page 2016
[5]

Principle-driven self-alignment of language models from scratch with minimal human supervision

Principle-driven self-alignment of language models from scratch with minimal human supervision , author=. arXiv preprint arXiv:2305.03047 , year=

work page arXiv
[6]

Hashimoto , title =

Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , publisher =

work page 2023
[7]

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , journal=

work page
[8]

Liu, Yinhan and Ott, Myle and Goyal, Naman and Du, Jingfei and Joshi, Mandar and Chen, Danqi and Levy, Omer and Lewis, Mike and Zettlemoyer, Luke and Stoyanov, Veselin , journal=

work page
[9]

Advances in Neural Information Processing Systems , editor=

Chain of Thought Prompting Elicits Reasoning in Large Language Models , author=. Advances in Neural Information Processing Systems , editor=. 2022 , url=

work page 2022
[10]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Exploring the limits of transfer learning with a unified text-to-text transformer , author=. arXiv preprint arXiv:1910.10683 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1910
[11]

TMLR , year=

Emergent abilities of large language models , author=. TMLR , year=

work page
[12]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models , author=. arXiv preprint arXiv:2206.04615 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[13]

arXiv preprint arXiv:2304.01196 , year=

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data , author=. arXiv preprint arXiv:2304.01196 , year=

work page arXiv
[14]

Training language models to follow instructions with human feedback

Training language models to follow instructions with human feedback , author=. arXiv preprint arXiv:2203.02155 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu an...

work page 2019
[16]

and Salakhutdinov, Ruslan , journal=

Dai, Zihang and Yang, Zhilin and Yang, Yiming and Carbonell, Jaime and Le, Quoc V. and Salakhutdinov, Ruslan , journal=. Transformer-

work page
[17]

Adam: A Method for Stochastic Optimization

Adam: A method for stochastic optimization , author=. arXiv preprint arXiv:1412.6980 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[18]

The Twelfth International Conference on Learning Representations , year=

Think before you speak: Training Language Models With Pause Tokens , author=. The Twelfth International Conference on Learning Representations , year=

work page
[19]

Bowman , booktitle=

Jacob Pfau and William Merrill and Samuel R. Bowman , booktitle=. Let. 2024 , url=

work page 2024
[20]

Neural networks: Tricks of the trade , pages=

Efficient backprop , author=. Neural networks: Tricks of the trade , pages=. 2012 , publisher=

work page 2012
[21]

Layer Normalization

Layer normalization , author=. arXiv preprint arXiv:1607.06450 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Proceedings of the national academy of sciences , volume=

Overcoming catastrophic forgetting in neural networks , author=. Proceedings of the national academy of sciences , volume=. 2017 , publisher=

work page 2017
[23]

NeurIPS , year=

Attention is all you need , author=. NeurIPS , year=

work page
[24]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Deep residual learning for image recognition , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[25]

Principles and procedures of statistics

Principles and procedures of statistics , author=. Principles and procedures of statistics. , year=

work page
[26]

arXiv preprint arXiv:2202.08137 , year=

A data-driven approach for learning to control computers , author=. arXiv preprint arXiv:2202.08137 , year=

work page arXiv
[27]

Advances in Neural Information Processing Systems , volume=

Generative adversarial imitation learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[28]

2018 IEEE international conference on robotics and automation (ICRA) , pages=

End-to-end driving via conditional imitation learning , author=. 2018 IEEE international conference on robotics and automation (ICRA) , pages=. 2018 , organization=

work page 2018
[29]

End to End Learning for Self-Driving Cars

End to end learning for self-driving cars , author=. arXiv preprint arXiv:1604.07316 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Computing “

Coulom, R. Computing “. ICGA journal , volume=. 2007 , publisher=

work page 2007
[31]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Deep q-learning from demonstrations , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[32]

Control of memory, active perception, and action in

Oh, Junhyuk and Chockalingam, Valliappa and Lee, Honglak and others , booktitle=. Control of memory, active perception, and action in. 2016 , organization=

work page 2016
[33]

Multi-task curriculum learning in a complex, visual, hard-exploration domain:

Kanitscheider, Ingmar and Huizinga, Joost and Farhi, David and Guss, William Hebgen and Houghton, Brandon and Sampedro, Raul and Zhokhov, Peter and Baker, Bowen and Ecoffet, Adrien and Tang, Jie and others , journal=. Multi-task curriculum learning in a complex, visual, hard-exploration domain:

work page
[34]

Sample efficient reinforcement learning through learning from demonstrations in

Scheller, Christian and Schraner, Yanick and Vogel, Manfred , booktitle=. Sample efficient reinforcement learning through learning from demonstrations in. 2020 , organization=

work page 2020
[35]

Guss, William H and Houghton, Brandon and Topin, Nicholay and Wang, Phillip and Codel, Cayden and Veloso, Manuela and Salakhutdinov, Ruslan , journal=. Mine

work page
[36]

A deep hierarchical approach to lifelong learning in

Tessler, Chen and Givony, Shahar and Zahavy, Tom and Mankowitz, Daniel and Mannor, Shie , booktitle=. A deep hierarchical approach to lifelong learning in

work page
[37]

Most Played Games in 2021, Ranked by Peak Concurrent Players , journal =

Twinfinite Staff , date =. Most Played Games in 2021, Ranked by Peak Concurrent Players , journal =

work page 2021
[38]

Exploration by Random Network Distillation

Exploration by random network distillation , author=. arXiv preprint arXiv:1810.12894 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[39]

Advances in Neural Information Processing Systems , volume=

Unifying count-based exploration and intrinsic motivation , author=. Advances in Neural Information Processing Systems , volume=

work page
[40]

Nature , volume=

First return, then explore , author=. Nature , volume=. 2021 , publisher=

work page 2021
[41]

2018 , publisher=

Reinforcement learning: An introduction , author=. 2018 , publisher=

work page 2018
[42]

Human-level performance in 3

Jaderberg, Max and Czarnecki, Wojciech M and Dunning, Iain and Marris, Luke and Lever, Guy and Castaneda, Antonio Garcia and Beattie, Charles and Rabinowitz, Neil C and Morcos, Ari S and Ruderman, Avraham and others , journal=. Human-level performance in 3. 2019 , publisher=

work page 2019
[43]

Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages=

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? , author=. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages=

work page 2021
[44]

Advances in Neural Information Processing Systems , volume=

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[45]

Advances in Neural Information Processing Systems , volume=

How transferable are features in deep neural networks? , author=. Advances in Neural Information Processing Systems , volume=

work page
[46]

arXiv preprint arXiv:1909.07528 , year=

Emergent tool use from multi-agent autocurricula , author=. arXiv preprint arXiv:1909.07528 , year=

work page arXiv 1909
[47]

Dota 2 with Large Scale Deep Reinforcement Learning

Dota 2 with large scale deep reinforcement learning , author=. arXiv preprint arXiv:1912.06680 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1912
[48]

arXiv preprint arXiv:2107.12808 , year=

Open-ended learning leads to generally capable agents , author=. arXiv preprint arXiv:2107.12808 , year=

work page arXiv
[49]

International Conference on Machine Learning , pages=

Learning transferable visual models from natural language supervision , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[50]

Hierarchical text-conditional image generation with

Ramesh, Aditya and Dhariwal, Prafulla and Nichol, Alex and Chu, Casey and Chen, Mark , journal=. Hierarchical text-conditional image generation with

work page
[51]

IEEE Robotics and Automation Letters , volume=

A machine learning approach to visual perception of forest trails for mobile robots , author=. IEEE Robotics and Automation Letters , volume=. 2015 , publisher=

work page 2015
[52]

Machine Learning Proceedings 1992 , pages=

Learning to fly , author=. Machine Learning Proceedings 1992 , pages=. 1992 , publisher=

work page 1992
[53]

ACM Computing Surveys (CSUR) , volume=

Imitation learning: A survey of learning methods , author=. ACM Computing Surveys (CSUR) , volume=. 2017 , publisher=

work page 2017
[54]

2019 International Conference on Robotics and Automation (ICRA) , pages=

Learning from demonstration in the wild , author=. 2019 International Conference on Robotics and Automation (ICRA) , pages=. 2019 , organization=

work page 2019
[55] Imitating latent policies from observation. International Conference on Machine Learning, 2019.
[56] Imitation from observation: Learning to imitate behaviors from raw video via context translation. 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018.
[57] Silver, David and Huang, Aja and Maddison, Chris J and Guez, Arthur and Sifre, Laurent and Van Den Driessche, George and Schrittwieser, Julian and Antonoglou, Ioannis and Panneershelvam, Veda and Lanctot, Marc and others. Mastering the game of. 2016.
[58] Xiaohua Zhai and Alexander Kolesnikov and Neil Houlsby and Lucas Beyer. CoRR, 2021. arXiv:2106.04560.
[59] Biological underpinnings for lifelong learning machines. Nature Machine Intelligence, 2022.
[60] Exploring the limits of weakly supervised pretraining. Proceedings of the European Conference on Computer Vision (ECCV).
[61] Vinyals, Oriol and Babuschkin, Igor and Czarnecki, Wojciech M and Mathieu, Micha. Grandmaster level in. Nature, 2019.
[62] Language models are few-shot learners. Advances in Neural Information Processing Systems.
[63] On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv:2108.07258.
[64] A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009.
[65] Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 1999.
[66] Algorithms for inverse reinforcement learning. ICML.
[67] ALVINN: An autonomous land vehicle in a neural network. Advances in Neural Information Processing Systems.
[68] Time-contrastive networks: Self-supervised learning from video. 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018.
[69] Learning inverse dynamics models with contacts. 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.
[70] Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model. arXiv preprint arXiv:1610.03518.
[71] Learning inverse dynamics: a comparison. European Symposium on Artificial Neural Networks.
[72] Peng, Xue Bin and Kanazawa, Angjoo and Malik, Jitendra and Abbeel, Pieter and Levine, Sergey. 2018.
[73] Recent Advances in Imitation Learning from Observation. arXiv preprint arXiv:1905.13566.
[74] Behavioral Cloning from Observation. arXiv preprint arXiv:1805.01954.
[75] Aytar, Yusuf and Pfaff, Tobias and Budden, David and Paine, Thomas and Wang, Ziyu and De Freitas, Nando. Playing hard exploration games by watching
[76] Agent57: Outperforming the Atari human benchmark. International Conference on Machine Learning, 2020.
[77] Text and Code Embeddings by Contrastive Pre-Training. arXiv preprint arXiv:2201.10005.
[78] Automatic speech recognition. 2016.
[79] Hindsight experience replay. Advances in Neural Information Processing Systems.
[80] Universal value function approximators. International Conference on Machine Learning, 2015.
