Can watermarking large language models prevent copyrighted text generation and hide training data? CoRR, abs/2407.17417, 2024.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets
Hybrid vector-search plus fingerprinting pipeline for LLM code provenance achieves Winnowing-level MRR on short snippets and up to 5.4% better on longer ones at logarithmic query time.
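The summary above measures the pipeline against Winnowing, a standard document-fingerprinting baseline (Schleimer, Wilkerson and Aiken, SIGMOD 2003). What follows is a minimal Python sketch of that baseline only. It is not the citing paper's hybrid pipeline. The k-gram length `k`, window size `w`, whitespace stripping, CRC32 hashing and Jaccard scoring are illustrative assumptions.

```python
import zlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text after stripping whitespace (deterministic CRC32)."""
    s = "".join(text.split())
    return [zlib.crc32(s[i:i + k].encode("utf-8")) for i in range(len(s) - k + 1)]


def winnow(text: str, k: int = 5, w: int = 4) -> set[tuple[int, int]]:
    """Return (hash, position) fingerprints: the rightmost minimum of every window of w k-gram hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    fingerprints: set[tuple[int, int]] = set()
    # A text with fewer than w k-grams is treated as one short window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum, as in the original Winnowing paper.
        offset = max(i for i, h in enumerate(window) if h == smallest)
        # The set drops repeats when adjacent windows select the same k-gram.
        fingerprints.add((smallest, start + offset))
    return fingerprints


def fingerprint_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the fingerprint hashes selected from each text (assumed scoring)."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    snippet = "def add(a, b): return a + b"
    source_file = "import os\n\ndef add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
    print(fingerprint_similarity(snippet, snippet))
    print(fingerprint_similarity(snippet, source_file))
```

Any shared run of at least `w + k - 1` non-whitespace characters is guaranteed to yield at least one common fingerprint. This is why Winnowing still matches a snippet embedded in a larger file. Ranking candidate source files by a score like this produces the ranked lists over which a metric such as MRR can be computed.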