OUP Oxford, 1998
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
other 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
roles
other 1
polarities
unclear 1
representative citing papers
A novel Bayesian copula-based model for joint multi-type spatio-temporal epidemic dynamics, with MCMC inference and validation on simulated data plus European meningococcal incidence records.
citing papers explorer
- Theoretical Limits of Language Model Alignment
The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.
- Bayesian copula-based modelling for multi-type spatio-temporal epidemic data
A novel Bayesian copula-based model for joint multi-type spatio-temporal epidemic dynamics, with MCMC inference and validation on simulated data plus European meningococcal incidence records.