A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.
(2026), A Statistical Framework for Alignment with Biased AI Feedback
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
background 1polarities
unclear 1representative citing papers
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.
A statistical survey of RLHF for LLM alignment that connects preference learning and policy optimization to models like Bradley-Terry-Luce while reviewing methods, extensions, and open challenges.
citing papers explorer
-
Learning Perturbations to Extrapolate Your LLM
A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.
-
Perturbation is All You Need for Extrapolating Language Models
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
-
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.
-
Reinforcement Learning from Human Feedback: A Statistical Perspective
A statistical survey of RLHF for LLM alignment that connects preference learning and policy optimization to models like Bradley-Terry-Luce while reviewing methods, extensions, and open challenges.