org/abs/1703.06748

· 2017 · arXiv 1703.06748

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.

RoAd-RL: A Unified Library and Benchmark for Robust Adversarial Reinforcement Learning

cs.LG · 2026-06-29 · conditional · novelty 6.0

RoAd-RL is a new benchmarking library for adversarial reinforcement learning that evaluates DQN, PPO, and SAC agents across 192 attack-defense configurations and finds substantial robustness variations plus cases where defenses harm performance more than attacks.

TRAP: Tail-aware Ranking Attack for World-Model Planning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

How Adversarial Environments Mislead Agentic AI?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

Scaling Laws for Reward Model Overoptimization

cs.LG · 2022-10-19 · unverdicted · novelty 6.0

Synthetic measurements show that gold-standard performance degrades according to distinct functional forms when optimizing proxy reward models via RL or best-of-n, with coefficients scaling smoothly by reward model parameter count.

Safe-RULE: Safe Reinforcement UnLEarning

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

Safe-RULE introduces a reinforcement unlearning defense for offline safe RL that counters data poisoning by removing malicious data influence while preserving task performance and safety.

citing papers explorer

Showing 6 of 6 citing papers.

Efficient Preference Poisoning Attack on Offline RLHF
cs.LG · 2026-05-04 · unverdicted · none · ref 126

RoAd-RL: A Unified Library and Benchmark for Robust Adversarial Reinforcement Learning
cs.LG · 2026-06-29 · conditional · none · ref 20

TRAP: Tail-aware Ranking Attack for World-Model Planning
cs.LG · 2026-05-03 · unverdicted · none · ref 32

How Adversarial Environments Mislead Agentic AI?
cs.AI · 2026-04-20 · unverdicted · none · ref 40

Scaling Laws for Reward Model Overoptimization
cs.LG · 2022-10-19 · unverdicted · none · ref 19

Safe-RULE: Safe Reinforcement UnLEarning
cs.LG · 2026-06-08 · unverdicted · none · ref 31
