Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.
arXiv preprint arXiv:2402.01920 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
An off-Earth autonomy pathway can reduce AGI confrontation incentives by making early cooperation preferable to power-seeking on Earth.
citing papers explorer
-
Efficient Preference Poisoning Attack on Offline RLHF
Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.