Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson

Madaan, L · 2024 · arXiv 2406.10229

5 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

The Art of Scaling Reinforcement Learning Compute for LLMs

cs.LG · 2025-10-15 · unverdicted · novelty 7.0

A 400k+ GPU-hour study shows RL scaling in LLMs follows predictable sigmoidal trajectories, with most design choices affecting efficiency rather than the performance asymptote, enabling accurate large-scale predictions via the ScaleRL recipe.

A Study of LLMs' Preferences for Libraries and Programming Languages

cs.SE · 2025-03-21 · unverdicted · novelty 6.0

Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

cs.LG · 2025-02-17 · unverdicted · novelty 6.0

Pretraining data determines loss-to-loss scaling laws in LLMs, while model size, optimization, tokenizer, and architecture have limited impact.

A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

cs.AI · 2026-05-07

citing papers explorer

Showing 5 of 5 citing papers.

The Art of Scaling Reinforcement Learning Compute for LLMs · cs.LG · 2025-10-15 · unverdicted · none · ref 11
A 400k+ GPU-hour study shows RL scaling in LLMs follows predictable sigmoidal trajectories, with most design choices affecting efficiency rather than the performance asymptote, enabling accurate large-scale predictions via the ScaleRL recipe.
A Study of LLMs' Preferences for Libraries and Programming Languages · cs.SE · 2025-03-21 · unverdicted · none · ref 51
Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws · cs.LG · 2025-02-17 · unverdicted · none · ref 29
Pretraining data determines loss-to-loss scaling laws in LLMs, while model size, optimization, tokenizer, and architecture have limited impact.
A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning · cs.LG · 2026-04-25 · unverdicted · none · ref 12
Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models · cs.AI · 2026-05-07 · unreviewed · ref 13
