Watermarking makes language models radioactive

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields: cs.CR 1 · cs.LG 1

years: 2026 1 · 2025 1

representative citing papers

Lossless Anti-Distillation Sampling

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

citing papers explorer

Showing 2 of 2 citing papers.

  • RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks cs.CR · 2025-09-25 · conditional · none · ref 29

    RLCracker is a reinforcement learning attack that erases LLM watermarks with a 98.5% success rate using minimal data, and it generalizes across ten schemes and multiple model sizes.

  • Lossless Anti-Distillation Sampling cs.LG · 2026-05-12 · unverdicted · none · ref 115

    LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.