AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

2026 · cs.CR · arXiv 2604.05502

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B to 14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., > 0.98 vs. < 0.22 with M = 60 probes). With 5 to 60 multi-domain probes, it supports practical provenance verification and accountability.
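
The pipeline named in the abstract (differential attention, spectral descriptors, CKA comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the descriptor, so the sketch assumes, purely for illustration, that each probe pair is summarized by the normalized top-k singular values of the difference between its two attention maps. The names `spectral_descriptor`, `fingerprint`, and `linear_cka`, and the parameter `k`, are hypothetical. The comparison step uses the standard linear form of centered kernel alignment, which may differ from the CKA variant used in the paper.

```python
import numpy as np


def spectral_descriptor(attn_base, attn_edit, k=4):
    """Hypothetical descriptor (an assumption, not the paper's definition):
    normalized top-k singular values of the attention difference."""
    diff = attn_edit - attn_base
    # Singular values come back sorted in descending order.
    s = np.linalg.svd(diff, compute_uv=False)[:k]
    total = s.sum()
    return s / total if total > 0 else s


def fingerprint(attn_pairs, k=4):
    """Stack one descriptor per probe pair into an (M, k) matrix."""
    return np.stack([spectral_descriptor(a, b, k) for a, b in attn_pairs])


def linear_cka(X, Y):
    """Standard linear CKA between two (M, d) feature matrices.

    Rows are probes, columns are features. The score is invariant to
    orthogonal transforms and isotropic scaling of either matrix.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denom = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    # Guard against constant features, where the score is undefined.
    return float(cross / denom) if denom > 0 else 0.0
```

Under these assumptions, each model would be run on the same M probe pairs, `fingerprint` would turn its attention maps into an (M, k) matrix, and `linear_cka` would score two such matrices. A value near 1 suggests a shared lineage. Any decision threshold would need to be calibrated empirically rather than taken from this sketch.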

representative citing papers

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

cs.CR · 2026-05-28 · unverdicted · novelty 7.0

KBF uses stable numerical recall near the knowledge boundary to fingerprint and audit black-box LLM APIs, successfully detecting all tested substitutions and some real-world inconsistencies across production endpoints.

citing papers explorer


KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

cs.CR · 2026-05-28 · unverdicted · none · ref 28 · internal anchor
