Cross-Entropy Games and Frost Training

Arthur Renard; Clément Hongler; Franck Gabriel; Valentin Hartmann

arxiv: 2605.27701 · v1 · pith:HSYGU7KMnew · submitted 2026-05-26 · 💻 cs.AI



keywords: training, frost, cross-entropy, games, gradient, method, model, used



We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called Cross-Entropy Games. The key idea is to exploit the gradient of the reward function in embedding space. This signal is used in the Greedy Coordinate Gradient (GCG) jailbreaking technique; we demonstrate for the first time that it can also be used to boost model training. We validate our method using GRPO training for maximum-likelihood infilling. Frost Training improves the model's ability to generate high-scoring outputs, reaching higher maximum scores in a best-of-k setting, and does so at an increased speed.
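The abstract refers to the gradient of the reward in embedding space, the signal that GCG uses to shortlist token substitutions. The following is a minimal sketch of that shortlisting step only, not of Frost Training itself, whose details the abstract does not give. Everything in it is a hypothetical stand-in: the quadratic `reward` replaces an LLM judge's score, and `EMBEDDINGS`, `TARGET`, `rank_swaps`, and `best_swap` are illustrative names. GCG is usually stated as minimizing a loss; here the sign is flipped so that a reward is maximized.

```python
# Toy illustration only: GCG-style first-order ranking of token swaps.
# The reward, embedding table, and all names are hypothetical stand-ins.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reward(e, target):
    """Toy differentiable reward: negative squared distance to a target vector."""
    return -sum((a - b) ** 2 for a, b in zip(e, target))

def reward_grad(e, target):
    """Analytic gradient of the toy reward with respect to the embedding e."""
    return [-2.0 * (a - b) for a, b in zip(e, target)]

def rank_swaps(embeddings, current, target, top_k):
    """Rank replacement tokens by the first-order estimate of the reward gain."""
    e_cur = embeddings[current]
    g = reward_grad(e_cur, target)
    gains = {
        tok: dot([a - b for a, b in zip(e_new, e_cur)], g)
        for tok, e_new in enumerate(embeddings)
        if tok != current
    }
    return sorted(gains, key=gains.get, reverse=True)[:top_k]

def best_swap(embeddings, current, target, top_k=2):
    """Re-score the shortlisted tokens with the true reward and keep the best."""
    shortlist = rank_swaps(embeddings, current, target, top_k)
    return max(shortlist + [current], key=lambda t: reward(embeddings[t], target))

# Four toy tokens with 2-dimensional embeddings.
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
TARGET = [1.0, 1.0]

if __name__ == "__main__":
    print(rank_swaps(EMBEDDINGS, 0, TARGET, top_k=2))
    print(best_swap(EMBEDDINGS, 0, TARGET))
```

In this toy setup the linear estimate ranks token 3 first, yet its true reward is no better than the current token's, so the re-scoring step selects token 1 instead. This is why the gradient serves only to propose candidates, which are then evaluated exactly.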
