pith. sign in

arxiv: 2605.29524 · v1 · pith:SB4OJHZNnew · submitted 2026-05-28 · 💻 cs.CR · cs.AI

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

Pith reviewed 2026-06-29 06:42 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords LLM auditingblack-box fingerprintingknowledge boundaryAPI verificationmodel substitutionmixed-routing detectionClaude endpoints
0
0 comments X

The pith

KBF fingerprints LLM APIs by measuring stable numerical recall near the knowledge boundary to detect substitutions and routing inconsistencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KBF as a black-box protocol that treats recall rates close to a model's knowledge cutoff as a unique fingerprint. This approach lets users check whether an API endpoint actually delivers the model it claims to serve. Tests across 16 production endpoints show it identifies every substitution of economic interest while leaving same-model instances untouched. The method also flags partial routing attacks even when only a small fraction of queries are diverted and reveals inconsistencies in live platform audits.

Core claim

The numerical recall rate near the knowledge boundary functions as a stable, model-specific fingerprint that enables reliable auditing of claimed LLM endpoints without white-box access. When applied to 16 production APIs, the fingerprint correctly identifies all 155 tested substitutions, produces no false rejections on control cases, remains consistent under deployment changes, detects mixed-routing attacks at 5-10 percent substitution rates, and uncovers statistical mismatches in 7 of 27 audited platform cells, concentrated on premium Claude endpoints.

What carries the argument

KBF protocol, which extracts a fingerprint from the numerical recall rate of facts positioned near a model's knowledge boundary and compares it across endpoints.

If this is right

  • All tested economically relevant substitutions are detected with zero false rejections on same-model controls.
  • The fingerprint remains usable under deployment variation and prompt changes.
  • Mixed-routing attacks become detectable when as little as 5-10 percent of traffic is substituted.
  • Seven of twenty-seven platform model cells show inconsistencies with reference endpoints in a six-platform audit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Users could run periodic checks on commercial APIs to verify they receive the advertised model version.
  • The same boundary-recall idea might extend to auditing other black-box services that claim specific capabilities.
  • Platforms might adopt boundary-recall benchmarks as part of transparency reporting.

Load-bearing premise

Recall rates near the knowledge boundary stay stable enough and distinct enough across different deployments and minor updates to serve as a reliable unique identifier.

What would settle it

A same-model endpoint that produces recall rates statistically different from its reference fingerprint under normal operation, or a substituted endpoint whose rates match the claimed model within the audit threshold.

Figures

Figures reproduced from arXiv: 2605.29524 by Bingyu Li, Mingxun Zhou, Yijia Fang, Yiqing Feng.

Figure 1
Figure 1. Figure 1: The auditor interacts with both the official reference [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 1
Figure 1. Figure 1: System model for black-box relay auditing. The [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Knowledge-boundary intuition and example probes. KBF retains regimes where the reference endpoint commits [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Detection heatmap over all 16×16 model pairs. Cell color indicates cross-model mismatch rate on the reference probe set. Colored cells are economically relevant substitutions, and all 155 such pairs are detected at p < 0.05. Diagonal cells are same-reference endpoint controls. Blank cells correspond to upgrade directions outside our threat model. it genuinely unstable, accounts for 2 of the 80 trials, for … view at source ↗
Figure 4
Figure 4. Figure 4: Online audit cost. Each bar is the cost of one query [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Adaptive routing results under the two-round [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: the same advertised model identifier is served under [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 7
Figure 7. Figure 7: Tier-dependent inconsistency across 7 shadow API [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
read the original abstract

Relay and reseller APIs increasingly intermediate access to large language models (LLMs), but users have no direct way to verify that a claimed endpoint is actually serving the advertised model. We introduce KBF, a low-cost black-box auditing protocol that fingerprints model APIs using stable numerical recall near the knowledge boundary. Across 16 production LLM endpoints, KBF flags all 155 economically relevant substitutions without rejecting any same-model controls, remains stable under deployment variation, detects high-separation mixed-routing attacks when only 5-10% of traffic is substituted, and finds that 7 of 27 platform model cells in a six-platform shadow API audit are statistically inconsistent with their reference endpoints, with inconsistencies concentrated on premium Claude endpoints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces KBF, a black-box auditing protocol that fingerprints LLM APIs via numerical recall rates on fixed probes near the knowledge boundary. It reports that across 16 production endpoints the method flags all 155 economically relevant substitutions with zero false positives on same-model controls, remains stable under deployment variation, detects mixed-routing attacks at 5-10% substitution levels, and identifies statistical inconsistencies in 7 of 27 platform-model cells in a six-platform shadow audit (concentrated on premium Claude endpoints).

Significance. If the empirical stability and model-specificity claims hold, KBF supplies a low-cost, practical tool for verifying claimed LLM endpoints in the presence of relays and resellers—an increasingly relevant security and trust issue. The real-world audit results on 27 platform cells constitute concrete, falsifiable evidence of the method’s utility; the absence of free parameters or fitted models is a strength of the protocol as described.

major comments (3)
  1. [Abstract / stability results] Abstract and results on stability: the central claim that recall rates near the knowledge boundary function as a reliable, unique fingerprint requires that within-model variance (across deployments, prompts, or minor updates) remains smaller than inter-model separation. The manuscript states stability under deployment variation but supplies no explicit quantification or experiments on version updates or fine-tunes; this assumption is load-bearing for the auditing application and for the zero-false-positive claim on same-model controls.
  2. [Abstract] Abstract: the reported performance figures (155/155 detections, 0 false positives, 5-10% traffic detection) are presented without accompanying error bars, dataset definitions, probe-set construction details, or statistical tests. This prevents independent verification that the data support the stated claims and is load-bearing for the empirical validation of the method.
  3. [Audit results] Shadow-audit results (7/27 inconsistent cells): the claim of statistical inconsistency with reference endpoints requires a clear definition of the inconsistency test, threshold, and multiple-testing correction. Without these, it is unclear whether the concentration on premium Claude endpoints reflects a genuine finding or an artifact of the chosen metric.
minor comments (2)
  1. [Abstract] The abstract supplies numerical success rates but no methodological details; moving a concise methods paragraph or reference to the relevant section into the abstract would improve readability.
  2. [Method] Notation for the recall metric and probe-set construction should be defined once at first use and used consistently; current presentation leaves the exact numerical recall computation implicit.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for your thorough review and valuable feedback on our manuscript. We appreciate the recognition of KBF's potential utility for LLM API auditing. We address each of the major comments below and will incorporate revisions to improve the manuscript's clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract / stability results] Abstract and results on stability: the central claim that recall rates near the knowledge boundary function as a reliable, unique fingerprint requires that within-model variance (across deployments, prompts, or minor updates) remains smaller than inter-model separation. The manuscript states stability under deployment variation but supplies no explicit quantification or experiments on version updates or fine-tunes; this assumption is load-bearing for the auditing application and for the zero-false-positive claim on same-model controls.

    Authors: We concur that quantifying within-model variance is crucial for validating the fingerprint reliability. The manuscript provides evidence through same-model controls showing no false positives across varied deployments and prompts. To address the gap on version updates and fine-tunes, the revised manuscript will include new experiments or analysis using available model version data to demonstrate that variance remains smaller than inter-model separation, thereby supporting the auditing claims. revision: yes

  2. Referee: [Abstract] Abstract: the reported performance figures (155/155 detections, 0 false positives, 5-10% traffic detection) are presented without accompanying error bars, dataset definitions, probe-set construction details, or statistical tests. This prevents independent verification that the data support the stated claims and is load-bearing for the empirical validation of the method.

    Authors: The full paper contains the probe-set construction details and dataset information in the methodology section. However, to enhance the abstract and results presentation, we will revise to include error bars (e.g., via bootstrap or binomial confidence intervals), explicit dataset sizes, and results of statistical tests (such as chi-square for detection rates) in the revised abstract or a supplementary results table. revision: yes

  3. Referee: [Audit results] Shadow-audit results (7/27 inconsistent cells): the claim of statistical inconsistency with reference endpoints requires a clear definition of the inconsistency test, threshold, and multiple-testing correction. Without these, it is unclear whether the concentration on premium Claude endpoints reflects a genuine finding or an artifact of the chosen metric.

    Authors: We agree that precise definition of the statistical procedure is necessary. The current manuscript implies inconsistency based on recall rate deviation from reference, but the revision will explicitly state the test (e.g., z-test against control mean and variance), the threshold used, and apply a multiple-testing correction such as Bonferroni to the 27 cells. This will confirm the robustness of the finding regarding Claude endpoints. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical protocol with no derivations or self-referential fits.

full rationale

The paper describes KBF as a black-box auditing protocol relying on empirical measurement of numerical recall rates near the knowledge boundary across production endpoints. No equations, parameter fittings, uniqueness theorems, or derivation chains appear in the abstract or described claims. Results are presented as direct experimental outcomes (zero false positives on controls, detection at 5-10% substitution) rather than quantities forced by construction from inputs. The approach is self-contained against external benchmarks via reported measurements on 16 endpoints and 27 platform cells.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unelaborated premise that knowledge-boundary recall is stable and distinctive.

pith-pipeline@v0.9.1-grok · 5654 in / 1167 out tokens · 22365 ms · 2026-06-29T06:42:45.082972+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 17 canonical work pages · 4 internal anchors

  1. [1]

    Real money, fake models: Deceptive model claims in shadow apis,

    Y . Zhang, Y . Jiang, Z. Chen, M. Backes, X. Shen, and Y . Zhang, “Real money, fake models: Deceptive model claims in shadow apis,” arXiv preprint arXiv:2603.01919, 2026

  2. [2]

    I’m spartacus, no, i’m spartacus: Measuring and under- standing llm identity confusion,

    K. Li, S. Zhuang, Y . Zhang, M. Xu, R. Wang, K. Xu, X. Fu, and X. Cheng, “I’m spartacus, no, i’m spartacus: Measuring and under- standing llm identity confusion,”arXiv preprint arXiv:2411.10683, 2024

  3. [3]

    Turning your weakness into a strength: Watermarking deep neural networks by backdooring,

    Y . Adi, C. Baum, M. Cisse, B. Pinkas, and J. Keshet, “Turning your weakness into a strength: Watermarking deep neural networks by backdooring,” in27th USENIX security symposium (USENIX Security 18), 2018, pp. 1615–1631

  4. [4]

    Hey, that’s my model! introduc- ing chain & hash, an llm fingerprinting technique,

    M. Russinovich and A. Salem, “Hey, that’s my model! introduc- ing chain & hash, an llm fingerprinting technique,”arXiv preprint arXiv:2407.10887, 2024

  5. [5]

    Instruc- tional fingerprinting of large language models,

    J. Xu, F. Wang, M. Ma, P. W. Koh, C. Xiao, and M. Chen, “Instruc- tional fingerprinting of large language models,” inProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024, pp. 3277–3306

  6. [6]

    zkllm: Zero knowledge proofs for large language models,

    H. Sun, J. Li, and H. Zhang, “zkllm: Zero knowledge proofs for large language models,” inProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2024, pp. 4405–4419

  7. [7]

    Telesparse: Practical privacy-preserving verification of deep neural networks,

    M. M. Maheri, H. Haddadi, and A. Davidson, “Telesparse: Practical privacy-preserving verification of deep neural networks,”arXiv preprint arXiv:2504.19274, 2025

  8. [8]

    Immaculate: A practical llm auditing frame- work via verifiable computation,

    Y . Guo, W. Qu, L. Wu, S. Zhai, L. Z. Wang, M. Xu, Y . Liu, B. Yuan, D. Song, and J. Zhang, “Immaculate: A practical llm auditing frame- work via verifiable computation,”arXiv preprint arXiv:2602.22700, 2026

  9. [9]

    Are robust llm fingerprints adversarially robust?

    A. Nasery, E. Contente, A. Kaz, P. Viswanath, and S. Oh, “Are robust llm fingerprints adversarially robust?”arXiv preprint arXiv:2509.26598, 2025. [Online]. Available: https://arxiv.org/abs/ 2509.26598

  10. [10]

    Model equality testing: Which model is this api serving?

    I. Gao, P. Liang, and C. Guestrin, “Model equality testing: Which model is this api serving?”arXiv preprint arXiv:2410.20247, 2024

  11. [11]

    Reading between the lines: Towards reliable black-box llm fingerprinting via zeroth-order gradient estimation,

    S. Shao, Y . Li, H. Yao, Y . Chen, Y . Yang, and Z. Qin, “Reading between the lines: Towards reliable black-box llm fingerprinting via zeroth-order gradient estimation,” inProceedings of the ACM Web Conference 2026, 2026, pp. 2637–2648

  12. [12]

    {LLMmap}: Fingerprinting for large language models,

    D. Pasquini, E. M. Kornaropoulos, and G. Ateniese, “ {LLMmap}: Fingerprinting for large language models,” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 299–318

  13. [13]

    Fingerprinting LLMs via Prompt Injection

    Y . Hu, Z. Jiang, M. Li, O. Ahmed, Z. Huang, C. Hong, and N. Gong, “Fingerprinting llms via prompt injection,”arXiv preprint arXiv:2509.25448, 2025

  14. [14]

    The daunting dilemma with sentence encoders: Success on standard benchmarks, failure in capturing basic semantic properties,

    Y . Mahajan, N. Bansal, and S. K. Karmaker, “The daunting dilemma with sentence encoders: Success on standard benchmarks, failure in capturing basic semantic properties,”arXiv preprint arXiv:2309.03747, 2023

  15. [15]

    Understanding zero-shot adversarial robustness for large-scale models,

    C. Mao, S. Geng, J. Yang, X. Wang, and C. V ondrick, “Understanding zero-shot adversarial robustness for large-scale models,” inThe Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=P4bXCawRi5J

  16. [16]

    Trusted execution environment: What it is, and what it is not,

    M. Sabt, M. Achemlal, and A. Bouabdallah, “Trusted execution environment: What it is, and what it is not,” in2015 IEEE Trust- com/BigDataSE/Ispa, vol. 1. IEEE, 2015, pp. 57–64

  17. [17]

    Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

    F. Tramer and D. Boneh, “Slalom: Fast, verifiable and private execution of neural networks in trusted hardware,”arXiv preprint arXiv:1806.03287, 2018

  18. [18]

    Rofl: Robust fingerprinting of language models,

    Y .-Y . Tsai, C. Guo, J. Yang, and L. van der Maaten, “Rofl: Robust fingerprinting of language models,”arXiv preprint arXiv:2505.12682, 2025

  19. [19]

    Trap: Targeted random adversarial prompt honeypot for black-box identification,

    M. Gubri, D. Ulmer, H. Lee, S. Yun, and S. J. Oh, “Trap: Targeted random adversarial prompt honeypot for black-box identification,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 11 496–11 517

  20. [20]

    Fdllm: A dedicated detector for black-box llms fingerprinting,

    Z. Fu, J. Chen, L. Zhang, T. Yang, J. Niu, H. Sun, R. Li, P. Liu, J. Wang, F. Heet al., “Fdllm: A dedicated detector for black-box llms fingerprinting,” in2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 2025, pp. 1374–1379

  21. [21]

    JPL Small-Body Database Lookup,

    NASA Jet Propulsion Laboratory, “JPL Small-Body Database Lookup,” https://ssd.jpl.nasa.gov/tools/sbdb lookup.html, accessed May 9, 2026

  22. [22]

    The On-Line Encyclopedia of Integer Sequences,

    The OEIS Foundation Inc., “The On-Line Encyclopedia of Integer Sequences,” https://oeis.org/, accessed May 11, 2026

  23. [23]

    Binomial test,

    M. M. Wagner-Menghin, “Binomial test,”Encyclopedia of statistics in behavioral science, 2005

  24. [24]

    Note on the sampling error of the difference between correlated proportions or percentages,

    Q. McNemar, “Note on the sampling error of the difference between correlated proportions or percentages,”Psychometrika, vol. 12, no. 2, pp. 153–157, 1947

  25. [25]

    A fingerprint for large language models,

    Z. Yang and H. Wu, “A fingerprint for large language models,”arXiv preprint arXiv:2407.01235, 2024

  26. [26]

    Robust llm fingerprinting via domain-specific watermarks,

    T. Gloaguen, R. Staab, N. Jovanovi ´c, and M. Vechev, “Robust llm fingerprinting via domain-specific watermarks,” inICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025

  27. [27]

    Evertracer: Hunting stolen large language models via stealthy and robust probabilistic fingerprint,

    Z. Xu, M. Han, and W. Xing, “Evertracer: Hunting stolen large language models via stealthy and robust probabilistic fingerprint,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 7019–7042

  28. [28]

    AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

    H. Zhang, Z. Xu, J. Li, S. Sheng, D. Kong, and M. Han, “Attndiff: Attention-based differential fingerprinting for large language models,” arXiv preprint arXiv:2604.05502, 2026

  29. [29]

    Gradient-based model fingerprinting for llm similarity detection and family classification,

    Z. Wu, Y . Zhao, and H. Wang, “Gradient-based model fingerprinting for llm similarity detection and family classification,”arXiv preprint arXiv:2506.01631, 2025

  30. [30]

    Every language model has a forgery-resistant signature,

    M. Finlayson, X. Ren, and S. Swayamdipta, “Every language model has a forgery-resistant signature,”arXiv preprint arXiv:2510.14086, 2025

  31. [31]

    A watermark for large language models,

    J. Kirchenbauer, J. Geiping, Y . Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” inInternational conference on machine learning. PMLR, 2023, pp. 17 061–17 084

  32. [32]

    Scalable watermarking for identifying large language model outputs,

    S. Dathathri, A. See, S. Ghaisas, P.-S. Huang, R. McAdam, J. Welbl, V . Bachani, A. Kaskasoli, R. Stanforth, T. Matejovicovaet al., “Scalable watermarking for identifying large language model outputs,” Nature, vol. 634, no. 8035, pp. 818–823, 2024. 18

  33. [33]

    Semstamp: A semantic watermark with paraphrastic robustness for text generation,

    A. Hou, J. Zhang, T. He, Y . Wang, Y .-S. Chuang, H. Wang, L. Shen, B. Van Durme, D. Khashabi, and Y . Tsvetkov, “Semstamp: A semantic watermark with paraphrastic robustness for text generation,” inProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long ...

  34. [34]

    Pmark: Towards robust and distortion-free semantic-level watermark- ing with channel constraints,

    J. Huo, S. Liu, B. Wang, J. Zhang, Y . Yan, A. Liu, X. Hu, and M. Zhou, “Pmark: Towards robust and distortion-free semantic-level watermark- ing with channel constraints,”arXiv preprint arXiv:2509.21057, 2025

  35. [35]

    Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

    H. Liu, C. Shou, H. Wen, Y . Chen, R. J. Fang, and Y . Feng, “Your agent is mine: Measuring malicious intermediary attacks on the llm supply chain,”arXiv preprint arXiv:2604.08407, 2026. Appendix A. Experimental Setup This appendix records the request-side configuration used in the KBF pipeline. We focus on the parameters that affect probe content and aud...

  36. [36]

    Appendix C

    We follow the public implementation and use Euclidean distance for the ranking computation. Appendix C. Shadow API Platform Mapping The box below lists the mapping from the shortened identifiers used in Section 4.7 to their salted SHA-256 digests. Shadow API platform mapping •Platform 1↔SH-87b19c56 •Platform 2↔SH-3c82a49c •Platform 3↔SH-4f754cc3 •Platform...