Open the Cabinet and break the Window,

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

On Safety Risks in Experience-Driven Self-Evolving Agents

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

Benign-task experience in self-evolving agents degrades safety in high-risk scenarios by reinforcing execution over refusal, while mixed benign-harmful experience creates a safety-utility trade-off via over-refusal.

citing papers explorer

Showing 1 of 1 citing paper.

  • On Safety Risks in Experience-Driven Self-Evolving Agents cs.CL · 2026-04-18 · unverdicted · none · ref 7
