On the reliability of watermarks for large language models

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary


representative citing papers

Dataset Watermarking for Closed LLMs with Provable Detection

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A new watermarking method for closed LLMs boosts random word-pair co-occurrences via rephrasing and detects the signal statistically in outputs, working reliably even when the watermarked data is only 1% of fine-tuning tokens while preserving utility.

Watermarking Should Be Treated as a Monitoring Primitive

cs.CR · 2026-05-13 · conditional · novelty 6.0 · 2 refs

Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.

Whispers in the Machine: Confidentiality in Agentic Systems

cs.CR · 2024-02-10 · unverdicted · novelty 6.0

Systematic testing of ten LLM agents across 20 tool scenarios and 14 attacks finds universal vulnerability to prompt injection enabling data exfiltration, with tooling amplifying leakage.

citing papers explorer

Showing 12 of 12 citing papers after filters.