vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
citing papers explorer
-
Variational Sequential Optimal Experimental Design using Reinforcement Learning
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
-
Position: LLM Inference Should Be Evaluated as Energy-to-Token Production
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.