Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CR 2years
2025 2verdicts
UNVERDICTED 2representative citing papers
AP-Test identifies deployed guardrails in LLMs via adversarial prompt testing and a match score metric, reporting perfect accuracy on four open-source guardrails.
citing papers explorer
-
Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!
Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.
-
Peering Behind the Shield: Guardrail Identification in Large Language Models
AP-Test identifies deployed guardrails in LLMs via adversarial prompt testing and a match score metric, reporting perfect accuracy on four open-source guardrails.