pith. sign in

Universal Jailbreak Backdoors from Poisoned Human Feedback

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 6 2025 2

roles

background 2

polarities

background 2

clear filters

representative citing papers

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.

A Security Analysis of the OpenClaw AI Agent Framework

cs.CR · 2026-03-29 · conditional · novelty 6.0 · 2 refs

Security analysis of OpenClaw reveals composable RCE paths from LLM tool calls, invalid closed-world assumptions in exec allowlists, and plugin-based attacks that bypass runtime policy.

ACE: A Security Architecture for LLM-Integrated App Systems

cs.CR · 2025-04-29 · unverdicted · novelty 6.0

ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.

citing papers explorer

Showing 6 of 6 citing papers after filters.