pith. sign in

arxiv: 2607.01313 · v1 · pith:TG7VU5W3new · submitted 2026-07-01 · 💻 cs.LG · cs.AI· cs.CL· cs.CR

Black-Box Inference of LLM Architectural Properties with Restrictive API Access

Pith reviewed 2026-07-03 21:32 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CLcs.CR
keywords black-box inferenceLLM architecturehidden dimensionspectral analysisAPI restrictionsmodel depthparameter countcommon-set prompting
0
0 comments X

The pith

Restrictive black-box LLM APIs still permit recovery of hidden dimensions, depth, and parameter counts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that even when providers limit API responses to a single logit per token and remove logit bias options, an attacker can still extract several core architectural parameters. NightVision collects log probabilities for identical token sets across varied prompts, applies spectral analysis to those outputs to recover the hidden dimension, and folds in time-to-first-token measurements to estimate depth and total parameter count. A sympathetic reader would care because the results indicate that current restrictions leave model internals exposed to black-box probing. The evaluation covers 32 open models and reports concrete error rates that improve on mixture-of-expert architectures and on models larger than three billion parameters.

Core claim

NightVision recovers hidden dimension to 23 percent average relative error across all tested models and 9 percent on MoE models by spectral analysis of log probabilities obtained through common-set prompting; it then uses the dimension estimate together with end-to-end time-to-first-token measurements to recover depth and parameter count to within 53 percent for models exceeding three billion parameters.

What carries the argument

Common-set prompting that forces multiple prompts to surface log probabilities for the same output tokens, enabling spectral analysis to isolate the hidden dimension.

If this is right

  • Current API restrictions are insufficient to fully obfuscate LLM architectural details.
  • Hidden-dimension recovery reaches 9 percent average relative error on mixture-of-experts models.
  • Depth and parameter-count estimates reach 53 percent accuracy for models larger than three billion parameters.
  • Inference accuracy improves with larger token budgets and varies systematically with model size and architecture type.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prompting-plus-timing pipeline could be applied directly to closed commercial endpoints to measure leakage in production settings.
  • Small additive noise on output probabilities might disrupt the spectral signatures that NightVision relies upon.
  • Response-time side channels could be hardened by server-side padding or rate limits that mask true compute depth.

Load-bearing premise

The spectral properties of the collected log probabilities encode the hidden dimension without substantial confounding from other architectural choices or training details.

What would settle it

Running the common-set prompting procedure on open-source models whose hidden dimensions are publicly known and observing that the spectral estimates deviate from the true values by more than the reported average relative error.

Figures

Figures reproduced from arXiv: 2607.01313 by Christopher Ellis, Giulia Fanti, Jos\'e M. F. Moura, Leighton Barnes, Mei-Yu Wang, Shreyas Chaudhari.

Figure 1
Figure 1. Figure 1: NightVision overview We first give an overview of NightVision, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of our common-set prompting attack to Top-1 binary logit-bias attack of [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Relative error vs. token budget. Errors for [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Mean relative error of the spectral hidden [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Cost overhead ratio as a function of hidden dimension. Cost overhead ratio = (Our cost)/(Cost of Carlini et al. [2024]). We report the cost to reach relative error ≤ 30%. Marker color encodes the number of samples used to reach that error at each point. Overall, the cost ratio varies with hidden dimen￾sion d but NightVision spends about 1 to 2.3 orders of magnitude (median ∼1.7) more to￾kens than the logit… view at source ↗
Figure 6
Figure 6. Figure 6: Required samples per prompt N to reach a common token set of size d, for D ∈ {5000, 10000, 20000} prompts. Token distributions are from the smallest HuggingFace SmolLM model (135M parameters; vocabulary V = 49,152, temperature T = 1). The two theory lines, expected N (blue) and the 95% upper bound (orange, Corollary 1), analyze the strict requirement that a token appear in all D prompts. The empirical 95% … view at source ↗
Figure 7
Figure 7. Figure 7: Mean relative error of the number of layers and total model parameter count estimates on the validation set vs. total prompt-token bud￾get for timing method. Here we assume perfect knowledge of hidden dimension. Each curve averages over the n validation models in the cor￾responding size and architecture group. Lℓ2 d [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: Next-token distributions for a high-entropy prompt (top) and a low-entropy prompt (bottom), [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Spectra used to recover the hidden dimension [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Timing-based architecture inference pipeline. The procedure fits prefill-dominated scaling [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Measured runtime versus predicted runtime with scaling relation as function of number of layers, hidden dimension, and input sequence length. B.3.3 Inferring Depth L under Hardware Mismatch In practice, the target model is rarely benchmarked on the same platform and configuration used to establish the reference scaling relation. We therefore introduce a cross-platform calibration procedure that transfers … view at source ↗
Figure 13
Figure 13. Figure 13: Relative error for the estimate of the number of layers (top panel), total number of parameters (middle panel), and hidden dimension (bottom panel) for various models. Here we assume the reference and target platform are the same (both with NVIDIA H200 GPU). Dark bars show the raw estimate; light bars show the estimate after rounding to the nearest integer. Some of the largest errors are driven by small m… view at source ↗
Figure 14
Figure 14. Figure 14: Cross-platform relative error for the estimated layer count (top) and total parameter count (bottom) across models, where the platform transfer is between NVIDIA H200 and A100 GPUs. Dark bars denote H200 as the reference platform and A100 as the target; light bars denote the reverse configuration. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Windowed slope d arctan(λ)/di of the ordered eigenvalues for Llama-3.1-8B at the maximum sample budget (N=106 per prompt). The true hidden dimension d ∗=4,096 is marked with a vertical line. The change-point detector (Section B.1.3) localizes the elbow inside the area of interest (AOI) bracketed by the saddle (1655) and peak (3077). Both bracket landmarks fall well below the true d ∗ , so the estimate ˆd … view at source ↗
Figure 16
Figure 16. Figure 16: Relative error in estimating the number of layers and total number of parameters as a function of API query budget (measured in number of tokens). Some of the largest errors are driven by small models with ≤ 4B parameters [PITH_FULL_IMAGE:figures/full_fig_p028_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Estimation relative error versus number of API calls (left panel) and estimated cost in USD (right panel). Relative errors for d, L, and N are shown as a function of total token budget, with bold curves denoting averages across target models and faint curves denoting individual models. Larger budgets generally reduce error, with L improving fastest and N remaining harder to estimate. Results include ≥3B-p… view at source ↗
read the original abstract

In practice, most commercial LLM providers do not publicly release details of underlying LLM architectures. However, prior work has shown that given limited API access to an LLM (namely, top-$k$ logits and/or a logit bias function), one can recover certain architectural details of an LLM, such as the hidden dimension of the feed-forward network. Perhaps in response to these results, most commercial LLM providers have restricted their APIs to expose only the single logit for each decoded token, and they no longer give users the ability to bias logits. We show that even under current restrictive APIs, several architectural parameters are still recoverable. We present NightVision, an attack that uses restrictive black-box API access to estimate the hidden dimension, depth, and parameter count of an LLM. Algorithmically, NightVision relies on a novel common set prompting technique in which multiple prompts expose log probabilities for the same set of output tokens; a spectral analysis of these results is used to infer hidden dimension. NightVision additionally uses end-to-end time to first token (TTFT) measurements and the estimated hidden dimension to estimate depth and parameter count. We empirically evaluate NightVision on 32 open-source LLMs, recovering hidden dimension to within 23% average relative error across all models (9% on MoE models), and depth and parameter count to within 53% for models exceeding three billion parameters. We run extensive ablations to demonstrate how these accuracies scale with token budget and model properties. Overall, our results suggest that current LLM APIs are not sufficiently restricted to fully obfuscate the architectural details of their underlying models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that restrictive LLM APIs exposing only single logits per token still allow recovery of hidden dimension, depth, and parameter count via NightVision. The method uses common-set prompting to collect log probabilities over identical output tokens across prompts, followed by spectral analysis to infer hidden dimension; TTFT measurements combined with the dimension estimate then recover depth and parameter count. On 32 open-source models under simulated restrictive access, it reports 23% average relative error for hidden dimension (9% on MoE models) and 53% error for depth and parameter count on models exceeding 3B parameters, with ablations on token budget.

Significance. If the spectral features prove to be driven primarily by hidden dimension rather than correlated architectural or training choices, the result would be significant: it demonstrates that current single-logit API restrictions are insufficient to fully obscure model architecture, with direct implications for model-provenance and security analyses. The work earns credit for its scale (32 models), concrete error metrics, and explicit ablations on token budget and model properties.

major comments (2)
  1. [NightVision spectral analysis (as described in the method and abstract)] The central recovery of hidden dimension rests on the premise that spectral properties of the collected log-probability vectors encode hidden dimension with limited interference from other factors. The evaluation reports 23% average relative error across 32 models but provides no controls or ablations that isolate hidden dimension from correlated properties such as attention configuration, activation function, or vocabulary-induced logit distributions; without such isolation the reported accuracy may not transfer to unseen closed models.
  2. [TTFT-based depth and parameter estimation] The TTFT-to-depth and parameter-count mapping is load-bearing for the second half of the claims, yet the manuscript does not detail the precise functional form, any hardware or batch-size assumptions, or sensitivity analysis; the 53% error figure for models >3B therefore cannot be assessed for robustness when the hidden-dimension estimate itself carries 23% error.
minor comments (1)
  1. [Evaluation and ablations] The abstract states that 'extensive ablations' demonstrate scaling with token budget, but the main text should include a dedicated table or figure quantifying error versus number of prompts/tokens for each recovered quantity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We respond point-by-point to the major comments below.

read point-by-point responses
  1. Referee: The central recovery of hidden dimension rests on the premise that spectral properties of the collected log-probability vectors encode hidden dimension with limited interference from other factors. The evaluation reports 23% average relative error across 32 models but provides no controls or ablations that isolate hidden dimension from correlated properties such as attention configuration, activation function, or vocabulary-induced logit distributions; without such isolation the reported accuracy may not transfer to unseen closed models.

    Authors: We acknowledge that the evaluation does not include controlled experiments in which only hidden dimension is varied while fixing all other architectural choices, as this would require training many models from scratch. The 32-model testbed does, however, span substantial diversity in attention configurations, activation functions, and vocabulary sizes, and the method achieves lower error on the MoE subset. We will add a discussion section that explicitly addresses potential confounding factors and the limits this places on claims about generalization to unseen closed models. revision: partial

  2. Referee: The TTFT-to-depth and parameter-count mapping is load-bearing for the second half of the claims, yet the manuscript does not detail the precise functional form, any hardware or batch-size assumptions, or sensitivity analysis; the 53% error figure for models >3B therefore cannot be assessed for robustness when the hidden-dimension estimate itself carries 23% error.

    Authors: We agree that the current manuscript lacks sufficient detail on the TTFT-based estimator. In revision we will specify the exact functional form relating TTFT and the hidden-dimension estimate to depth and parameter count, state the hardware and batch-size conditions under which TTFT was measured, and include a sensitivity analysis that propagates the 23% hidden-dimension error into the final depth and parameter-count predictions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical inference method validated on open models

full rationale

The paper describes an empirical attack (NightVision) that collects log-probabilities via common-set prompting, applies spectral analysis to recover hidden dimension, and uses TTFT measurements plus the dimension estimate to recover depth and parameter count. All performance claims (23% relative error on hidden dimension, 53% on depth/params for large models) are measured directly against ground-truth values on 32 open-source LLMs under simulated restrictive access. No derivation step reduces a claimed prediction to a fitted parameter by the paper's own equations, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in; the method is therefore self-contained as an experimental procedure rather than a closed definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method is presented as an empirical attack relying on observed correlations between output statistics, timing, and architecture rather than new theoretical postulates.

pith-pipeline@v0.9.1-grok · 5842 in / 1381 out tokens · 35166 ms · 2026-07-03T21:32:13.701543+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 9 canonical work pages · 2 internal anchors

  1. [1]

    2020 , eprint=

    Scaling Laws for Neural Language Models , author=. 2020 , eprint=

  2. [2]

    The Thirteenth International Conference on Learning Representations , year=

    Gated Delta Networks: Improving Mamba2 with Delta Rule , author=. The Thirteenth International Conference on Learning Representations , year=

  3. [3]

    2020 , eprint=

    GLU Variants Improve Transformer , author=. 2020 , eprint=

  4. [4]

    2023 , url=

    Joshua Ainslie and James Lee-Thorp and Michiel de Jong and Yury Zemlyanskiy and Federico Lebron and Sumit Sanghai , booktitle=. 2023 , url=

  5. [5]

    FlashAttention: Fast and Memory-Efficient Exact Attention with

    Tri Dao and Daniel Y Fu and Stefano Ermon and Atri Rudra and Christopher Re , booktitle=. FlashAttention: Fast and Memory-Efficient Exact Attention with. 2022 , url=

  6. [6]

    Stealing Neural Networks via Timing Side Channels

    Stealing neural networks via timing side channels , author=. arXiv preprint arXiv:1812.11720 , year=

  7. [7]

    Deepsniffer: A

    Hu, Xing and Liang, Ling and Li, Shuangchen and Deng, Lei and Zuo, Pengfei and Ji, Yu and Xie, Xinfeng and Ding, Yufei and Liu, Chang and Sherwood, Timothy and others , booktitle=. Deepsniffer: A

  8. [8]

    Inputsnatch: Stealing input in

    Zheng, Xinyao and Han, Husheng and Shi, Shangyi and Fang, Qiyan and Du, Zidong and Hu, Xing and Guo, Qi , journal=. Inputsnatch: Stealing input in

  9. [9]

    Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

    Instructional fingerprinting of large language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

  10. [10]

    Negative Association of Random Variables with Applications , urldate =

    Kumar Joag-Dev and Frank Proschan , journal =. Negative Association of Random Variables with Applications , urldate =

  11. [11]

    Journal of Theoretical Probability , year =

    Shao, Qi-Man , title =. Journal of Theoretical Probability , year =. doi:10.1023/A:1007849609234 , url =

  12. [12]

    BRICS Report Series , volume=

    Balls and bins: A study in negative dependence , author=. BRICS Report Series , volume=

  13. [13]

    Logits of

    Finlayson, Matthew and Ren, Xiang and Swayamdipta, Swabha , booktitle=. Logits of. 2024 , url=

  14. [14]

    Carlini et al

    Stealing part of a production language model , author=. arXiv preprint arXiv:2403.06634 , year=

  15. [15]

    Fast estimation of approximate matrix ranks using spectral densities

    Fast estimation of approximate matrix ranks using spectral densities , author=. arXiv preprint arXiv:1608.05754 , year=

  16. [16]

    Computational Statistics & Data Analysis , volume=

    Automatic dimensionality selection from the scree plot via the use of profile likelihood , author=. Computational Statistics & Data Analysis , volume=. 2006 , publisher=

  17. [17]

    Statistics and Computing , volume=

    A tutorial on spectral clustering , author=. Statistics and Computing , volume=. 2007 , publisher=

  18. [18]

    Reading Between the Lines: Towards Reliable Black-box

    Shao, Shuo and Li, Yiming and Yao, Hongwei and Chen, Yifei and Yang, Yuchen and Qin, Zhan , journal=. Reading Between the Lines: Towards Reliable Black-box

  19. [19]

    arXiv preprint arXiv:2508.19843 , year=

    Sok: Large language model copyright auditing via fingerprinting , author=. arXiv preprint arXiv:2508.19843 , year=

  20. [20]

    34th USENIX Security Symposium (USENIX Security 25) , pages=

    \ LLMmap \ : Fingerprinting for large language models , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=

  21. [21]

    arXiv preprint arXiv:2410.14273 , year=

    Reef: Representation encoding fingerprints for large language models , author=. arXiv preprint arXiv:2410.14273 , year=

  22. [22]

    Advances in Neural Information Processing Systems , volume=

    Huref: Human-readable fingerprint for large language models , author=. Advances in Neural Information Processing Systems , volume=

  23. [23]

    Findings of the Association for Computational Linguistics: ACL 2024 , pages=

    Trap: Targeted random adversarial prompt honeypot for black-box identification , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

  24. [24]

    Operationalizing a threat model for red-teaming large language models (

    Verma, Apurv and Krishna, Satyapriya and Gehrmann, Sebastian and Seshadri, Madhavan and Pradhan, Anu and Ault, Tom and Barrett, Leslie and Rabinowitz, David and Doucette, John and Phan, NhatHai , journal=. Operationalizing a threat model for red-teaming large language models (

  25. [25]

    A survey of

    Pan, James and Li, Guoliang , journal=. A survey of

  26. [26]

    Large language model inference acceleration: A comprehensive hardware perspective.arXiv preprint arXiv:2410.04466, 2024

    Large language model inference acceleration: A comprehensive hardware perspective , author=. arXiv preprint arXiv:2410.04466 , year=

  27. [27]

    Incompressible Knowledge Probes: Estimating Black-Box

    Li, Bojie , journal=. Incompressible Knowledge Probes: Estimating Black-Box

  28. [28]

    2025 , publisher=

    Alhazbi, Saeif and Hussain, Ahmed and Oligeri, Gabriele and Papadimitratos, Panos , journal=. 2025 , publisher=

  29. [29]

    Proceedings of the 2025 14th International Conference on Software and Computer Applications , pages=

    Bridging the Security Gap: An Empirical Analysis of LLM-API Integration Vulnerabilities and Mitigation Strategies , author=. Proceedings of the 2025 14th International Conference on Software and Computer Applications , pages=

  30. [30]

    Proceedings of the ACM Web Conference 2026 , pages=

    Exploring and Exploiting Security Vulnerabilities in Self-Hosted LLM Services , author=. Proceedings of the ACM Web Conference 2026 , pages=

  31. [31]

    Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages=

    AuditLLM: A tool for auditing large language models using multiprobe approach , author=. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages=

  32. [32]

    AI and Ethics , volume=

    Auditing large language models: a three-layered approach , author=. AI and Ethics , volume=. 2024 , publisher=

  33. [33]

    Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society , pages=

    Supporting human-ai collaboration in auditing llms with llms , author=. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society , pages=

  34. [34]

    25th USENIX security symposium (USENIX Security 16) , pages=

    Stealing machine learning models via prediction \ APIs \ , author=. 25th USENIX security symposium (USENIX Security 16) , pages=

  35. [35]

    Wei, Junyi and Zhang, Yicheng and Zhou, Zhe and Li, Zhou and Al Faruque, Mohammad Abdullah , booktitle=. Leaky. 2020 , organization=

  36. [36]

    arXiv preprint arXiv:2311.13647 , year=

    Language model inversion , author=. arXiv preprint arXiv:2311.13647 , year=

  37. [37]

    Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

    A survey on model extraction attacks and defenses for large language models , author=. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2 , pages=

  38. [38]

    arXiv preprint arXiv:2512.03816 , year=

    Log Probability Tracking of LLM APIs , author=. arXiv preprint arXiv:2512.03816 , year=

  39. [39]

    Advances in Neural Information Processing Systems , volume=

    Attention is all you need , author=. Advances in Neural Information Processing Systems , volume=