Towards robust alignment of language models: Distributionally robustifying direct preference optimization. arXiv preprint arXiv:2407.07880

Wu, J · arXiv 2407.07880

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

cs.LG · 2026-02-07 · unverdicted · novelty 5.0

ShaPO improves LLM safety robustness over standard preference optimization by enforcing worst-case objectives via selective geometry control at token and reward levels.

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

cs.LG · 2026-05-11

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04

citing papers explorer

Showing 3 of 3 citing papers.

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control
cs.LG · 2026-02-07 · unverdicted · none · ref 25
ShaPO improves LLM safety robustness over standard preference optimization by enforcing worst-case objectives via selective geometry control at token and reward levels.

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
cs.LG · 2026-05-11 · unreviewed · ref 45

Efficient Preference Poisoning Attack on Offline RLHF
cs.LG · 2026-05-04 · unreviewed · ref 78
