The method of paired comparisons
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Efficient Preference Poisoning Attack on Offline RLHF
  Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.
- Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
  LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.