As rθ tends to ±∞, the gradient will tend to zero, since either (1 − σ(βz)) or σ(βz) will tend to zero.

This gradient is simple to interpret: if y is desirable, then d(y) is negative and we push up the probability of πθ(y|x) to minimize the loss; if y is undesirable, then d(y) is positive and we push down the probability of πθ(y|x) to minimize the loss. · 2023
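The saturation and sign behaviour described in the quoted passage can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes the gradient magnitude is scaled by the product σ(βz)(1 − σ(βz)), which is what the quote's "either (1 − σ(βz)) or σ(βz)" suggests, and that d(y) enters only through its sign. The function names `saturation_factor` and `log_prob_step_sign`, and the value β = 0.1, are hypothetical and do not come from the source.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def saturation_factor(z: float, beta: float = 0.1) -> float:
    """Assumed gradient scale sigma(beta*z) * (1 - sigma(beta*z)).

    One of the two factors goes to zero as z -> +inf or z -> -inf,
    so the product vanishes in both directions.
    """
    s = sigmoid(beta * z)
    return s * (1.0 - s)


def log_prob_step_sign(desirable: bool) -> int:
    """Direction in which gradient descent moves log pi_theta(y|x).

    Assumption: d(y) is negative for desirable y and positive for
    undesirable y, and the loss gradient carries the sign of d(y).
    Descent steps against the gradient, so the move is -sign(d(y)).
    """
    d_sign = -1 if desirable else 1
    return -d_sign


if __name__ == "__main__":
    # The factor peaks at z = 0 and decays as |z| grows.
    for z in (-1000.0, -50.0, 0.0, 50.0, 1000.0):
        print(f"z = {z:8.1f}  factor = {saturation_factor(z):.3e}")
    print("desirable   ->", log_prob_step_sign(True))
    print("undesirable ->", log_prob_step_sign(False))
```

A step sign of +1 means the probability of y is pushed up, and −1 means it is pushed down, matching the interpretation given in the quote.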

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

KTO: Model Alignment as Prospect Theoretic Optimization

cs.LG · 2024-02-02 · conditional · novelty 7.0

KTO aligns LLMs by directly maximizing prospect-theoretic utility on binary signals and matches or exceeds preference-based methods like DPO from 1B to 30B parameters.

citing papers explorer

Showing 1 of 1 citing paper.

KTO: Model Alignment as Prospect Theoretic Optimization cs.LG · 2024-02-02 · conditional · none · ref 30
