Under review

10 Preprint · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

TOMPA performs black-box adversarial optimization in token space to discover non-linguistic patterns that nearly double the reward scores of GPT-5 answers on Skywork-Reward-V2 while producing gibberish text.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models cs.LG · 2026-04-03 · unverdicted · none · ref 11
TOMPA performs black-box adversarial optimization in token space to discover non-linguistic patterns that nearly double the reward scores of GPT-5 answers on Skywork-Reward-V2 while producing gibberish text.

Under review

fields

years

verdicts

representative citing papers

citing papers explorer