RL on binary rewards boosts LLM factual recall by ~27% relative across models by redistributing probability mass to latent correct answers rather than acquiring new knowledge.
Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.
Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.
citing papers explorer
-
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
RL on binary rewards boosts LLM factual recall by ~27% relative across models by redistributing probability mass to latent correct answers rather than acquiring new knowledge.
-
Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems
Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.
-
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.