Risk-sensitive preference games using convex risk measures produce policies that are robust across data strata and match or exceed standard Nash learning performance without added cost.
Journal of machine learning research , volume=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
The survey unifies extensions of PAC-Bayesian theory to data-dependent sets, geometric and topological complexity measures of optimization trajectories, and stability replacements for information terms into one template inequality with comparative evaluation.
citing papers explorer
-
Structure from Strategic Interaction & Uncertainty: Risk Sensitive Games for Robust Preference Learning
Risk-sensitive preference games using convex risk measures produce policies that are robust across data strata and match or exceed standard Nash learning performance without added cost.
-
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
-
A Survey on Data-Dependent Worst-Case Generalization Bounds
The survey unifies extensions of PAC-Bayesian theory to data-dependent sets, geometric and topological complexity measures of optimization trajectories, and stability replacements for information terms into one template inequality with comparative evaluation.
- A Unified Theory of Conditional Coverage in Conformal Prediction with Applications
- Statistical Consistency and Generalization of Contrastive Representation Learning