Optimizing generative ai by backpropagating language model feedback.Nature, 639(8055):609–616, 2025

Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Pan Lu, Zhi Huang, Carlos Guestrin, James Zou · 2025

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution

cs.SE · 2026-02-27 · unverdicted · novelty 7.0

IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.

Learning to Discover at Test Time

cs.LG · 2026-01-22 · unverdicted · novelty 7.0

TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

citing papers explorer

Showing 3 of 3 citing papers.

Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution cs.SE · 2026-02-27 · unverdicted · none · ref 35
IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.
Learning to Discover at Test Time cs.LG · 2026-01-22 · unverdicted · none · ref 84
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents cs.AI · 2026-05-19 · unverdicted · none · ref 49
PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

Optimizing generative ai by backpropagating language model feedback.Nature, 639(8055):609–616, 2025

fields

years

verdicts

representative citing papers

citing papers explorer