The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- Theoretical Limits of Language Model Alignment
  The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.
- VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
  VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
- Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data
  A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.