A watermark for black-box language models. arXiv preprint arXiv:2410.02099
4 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- R-CoT: A Reasoning-Layer Watermark via Redundant Chain-of-Thought in Large Language Models
  R-CoT embeds watermarks into LLM reasoning paths via redundant CoT and GRPO-based dual optimization, maintaining over 95% true positive rate under fine-tuning and post-training changes.
- Linguistics-Aware Non-Distortionary LLM Watermarking
  LUNA achieves AUROC 0.9959 and median perplexity shift 0.045 across 12 multilingual settings while outperforming eight baselines on quality metrics.
- Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents
  The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains while creating tail risks.
- Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
  LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.