arxiv: 2606.06755 · v1 · pith:USIYYDTInew · submitted 2026-06-04 · 💻 cs.CL · cs.ET

PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

Pith reviewed 2026-06-28 00:57 UTC · model grok-4.3

classification 💻 cs.CL cs.ET
keywords: behavioral biometrics · prompt-based identity · authorship attribution · LLM user modeling · lexical analysis · stylometric features

The pith

Short prompts to LLMs carry stable lexical signals that identify individual users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether brief, task-driven prompts to large language models contain enough consistent author-specific patterns to serve as a behavioral biometric. Using data from over a thousand users, it finds that surface word choices outperform deeper meaning representations for identification. It also identifies a trade-off where users' styles are unique across people but vary within the same person depending on the prompt context, and shows that the signal holds against small changes but not against full rephrasing. If correct, this opens a new way to model users in LLM systems with consequences for security and privacy.

Core claim

Prompt-based identity, defined as the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns in short LLM prompts form a learnable behavioral biometric, is established through strong identification performance on a dataset of 20,680 real prompts from 1,034 users, with lexical features proving most effective.

What carries the argument

The lexical stability hypothesis: the claim that identity is primarily encoded in surface-level word choice rather than abstract intent. The paper supports it with the finding that lexical representations outperform semantic encoders.
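To make "surface-level word choice" concrete, here is a minimal sketch of lexical user identification. It is not the paper's pipeline: the figure captions mention a TF-IDF + logistic regression model, whereas this sketch swaps in a simpler nearest-profile rule (cosine similarity to a per-user TF-IDF centroid) so it runs on the standard library alone. The toy prompts, the user ids, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Whitespace tokenization keeps punctuation attached, a deliberately crude choice.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency over the training prompts.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(text, idf):
    # L2-normalized sparse TF-IDF vector; unseen tokens are dropped.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    # Both inputs are unit-length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def fit_profiles(train, idf):
    # One normalized centroid per user (assumption: stands in for a trained classifier).
    sums = defaultdict(lambda: defaultdict(float))
    for user, text in train:
        for t, v in tfidf(text, idf).items():
            sums[user][t] += v
    profiles = {}
    for user, vec in sums.items():
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        profiles[user] = {t: v / norm for t, v in vec.items()}
    return profiles

def rank_users(text, idf, profiles):
    # Candidate users sorted from most to least similar.
    query = tfidf(text, idf)
    return sorted(profiles, key=lambda u: cosine(query, profiles[u]), reverse=True)

# Hypothetical toy data: two users with different habitual phrasing.
train = [
    ("u1", "pls write me a python script that sorts a list"),
    ("u1", "pls fix this python script, it crashes"),
    ("u2", "Could you kindly summarise the following article?"),
    ("u2", "Could you kindly translate the following paragraph?"),
]
idf = fit_idf([text for _, text in train])
profiles = fit_profiles(train, idf)
print(rank_users("pls explain this python error", idf, profiles)[0])
```

In the toy data the new prompt shares habitual tokens ("pls", "this", "python") only with the first user, which is the kind of surface signal the hypothesis refers to. Whether such signals survive at the scale of 1,034 users is what the paper's experiments address, not this sketch.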

If this is right

  • Lexical representations significantly outperform semantic encoders for user identification.
  • Stylometric features show users are highly distinctive across the population yet inconsistent across contexts.
  • Identity signals are robust to minor lexical perturbations but degrade under semantic paraphrasing.
  • Prompt-based identity enables strong identification performance at scale.
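The robustness claim in the third bullet can be illustrated with a small sketch of a token substitution perturbation applied at a chosen rate. The synonym table, function names, and the Jaccard overlap measure are assumptions for illustration; the summary above does not say how the paper generates its perturbations or paraphrases.

```python
import random

def substitute_tokens(text, rate, synonyms, seed=0):
    # Replace each token that has a listed synonym with probability `rate`.
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        options = synonyms.get(tok.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return " ".join(out)

def token_overlap(a, b):
    # Jaccard overlap between the two token sets (1.0 means identical vocabularies).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

# Hypothetical synonym table and prompt.
SYNONYMS = {"pls": ["please"], "fix": ["repair"], "script": ["program"]}
prompt = "pls fix this python script"

for rate in (0.0, 0.5, 1.0):
    perturbed = substitute_tokens(prompt, rate, SYNONYMS)
    print(rate, perturbed, round(token_overlap(prompt, perturbed), 2))
```

At a low rate most of the habitual vocabulary survives, so a lexical model still has tokens to match. A full paraphrase behaves more like the rate 1.0 case extended to every word, which is consistent with the reported degradation under paraphrasing.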

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could enable new forms of user authentication or tracking in LLM platforms based on prompt history.
  • Privacy concerns arise if prompt data can be used to link anonymous interactions to individuals.
  • Future work might explore whether task-specific prompts reduce the biometric signal compared to free-form ones.

Load-bearing premise

The collected prompts represent natural, habitual user behavior from distinct individuals whose lexical patterns remain stable enough to serve as an identifiable biometric signal independent of task context or data collection artifacts.

What would settle it

Demonstrating that identification accuracy drops to chance levels when the same users provide prompts in new contexts or after paraphrasing would falsify the viability of prompt-based identity as a biometric.
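For reference, "chance levels" can be pinned down with simple arithmetic. The sketch below assumes closed-set identification with a uniform random guess over all 1,034 users; the function name is hypothetical, and the per-user figure is a mean derived from the two totals quoted above, not a statement that every user contributed the same number of prompts.

```python
NUM_USERS = 1034
NUM_PROMPTS = 20680

def chance_top_k(num_users, k=1):
    # Probability that the true user is among k distinct uniform random guesses.
    return min(k, num_users) / num_users

print(NUM_PROMPTS / NUM_USERS)          # mean prompts per user
print(chance_top_k(NUM_USERS, k=1))     # chance Top-1
print(chance_top_k(NUM_USERS, k=5))     # chance Top-5
```

Under these assumptions chance Top-1 is about 0.1%, so any falsification test would compare post-shift accuracy against that floor.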

Figures

Figures reproduced from arXiv: 2606.06755 by Kartik Narayan, Shaiv Patel, Vishal Patel.

Figure 1: LLM prompts as a soft biometric for authentication overview. A user’s prompt behavior can capture how they think, ask, and …
Figure 2: Sample prompts from the WildChat-1M dataset.
Figure 3: Left: Uniqueness-consistency trade-off. Each point represents one method (mean ± std, 5 folds). Stylo-NN (red) isolates the paradox: highest d′ yet lowest Top-1. TF-IDF+LR (orange) achieves optimal verification (d′ = 0.671); Ensemble (blue) achieves optimal identification (Top-1 = 64.2%). Middle: Top-1/3/5 accuracy vs. ensemble weight α. Plateau between α = 0.5–0.9 confirms stability; standardized at α = 0.7. …
Figure 4: Adversarial robustness modeling. Left: Top-1 accuracy vs. token substitution rates. Right: Impact of complete threat vectors against the unattacked Ensemble baseline (0.642). Full paraphrase executes the most severe degradation (Top-1 = 0.429, EER = 0.703), proving the fingerprint is bound to the lexical surface rather than semantic intent. semantic mapping, though TF-IDF retains partial resilience via adjace…
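The Figure 3 caption refers to an ensemble weight α and to Top-1/3/5 accuracy. The following sketch shows one common way such a weighted score fusion and Top-k metric can be computed. It is an assumption-laden illustration: the caption does not say which component α multiplies or how scores are normalized, so the z-score normalization, the function names, and the toy scores are all hypothetical.

```python
def zscore(scores):
    # Put one scorer's per-user scores on a comparable scale.
    vals = list(scores.values())
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    std = var ** 0.5 or 1.0
    return {u: (v - mean) / std for u, v in scores.items()}

def fuse(scores_a, scores_b, alpha=0.7):
    # Weighted sum: alpha on scorer A, (1 - alpha) on scorer B (assumed convention).
    a, b = zscore(scores_a), zscore(scores_b)
    return {u: alpha * a[u] + (1 - alpha) * b[u] for u in a}

def top_k_hit(fused, true_user, k):
    # True if the correct user is among the k highest-scoring candidates.
    ranked = sorted(fused, key=fused.get, reverse=True)
    return true_user in ranked[:k]

def top_k_accuracy(trials, k):
    # trials: list of (fused_scores, true_user) pairs.
    return sum(top_k_hit(f, y, k) for f, y in trials) / len(trials)

# Hypothetical scores for one query prompt written by "u1".
scores_a = {"u1": 0.9, "u2": 0.2, "u3": 0.1}
scores_b = {"u1": 0.1, "u2": 0.8, "u3": 0.3}

for alpha in (0.0, 0.5, 0.7, 1.0):
    fused = fuse(scores_a, scores_b, alpha)
    print(alpha, top_k_hit(fused, "u1", 1), top_k_hit(fused, "u1", 3))
```

Sweeping α as in the loop is the kind of experiment the middle panel describes; a plateau would show up as Top-k accuracy staying roughly flat over a range of α values.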
Original abstract

Authorship attribution research has traditionally focused on long-form, expressive texts; however, interactions with large language models (LLMs) are typically brief and task-driven prompts. This raises a fundamental question: do such prompts contain a stable, author-identifiable, and distinctive signal? We introduce PromptPrint, a systematic study of prompt-based identity, the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns form a learnable behavioral biometric. Using 20,680 real prompts from 1,034 users, we establish three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis": identity is primarily encoded in surface-level word choice rather than abstract intent. Second, stylometric features exhibit a "uniqueness-consistency paradox": users are highly distinctive across the population, yet behaviorally inconsistent across contexts. Third, adversarial analysis reveals a clear vulnerability spectrum: identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. Overall, our results demonstrate strong identification performance at scale, establishing prompt-based identity as a viable behavioral biometric. This work introduces a new perspective on user modeling in LLM interactions, with important implications for security and privacy. Data and code will be released upon the acceptance of our work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces PromptPrint as a study of prompt-based identity, hypothesizing that users' brief, task-driven LLM prompts encode stable, author-identifiable lexical and stylometric signals usable as behavioral biometrics. It reports results from 20,680 real prompts by 1,034 users supporting three findings: lexical representations outperform semantic encoders (lexical stability hypothesis); stylometric features show a uniqueness-consistency paradox (high distinctiveness but low cross-context consistency); and identity signals are robust to minor perturbations but vulnerable to semantic paraphrasing. The work concludes that these results establish strong identification performance at scale and prompt-based identity as a viable biometric, with implications for security and privacy.

Significance. If the empirical claims are supported by properly documented cross-context evaluation and statistical validation, the work would be significant as the first large-scale demonstration of behavioral biometrics in short, task-driven LLM prompts, extending authorship attribution beyond long-form text and highlighting both opportunities and risks for user modeling in LLM platforms. The planned release of data and code is a positive contribution to reproducibility.

major comments (3)
  1. [Abstract] The three key findings are asserted without any reported performance metrics (e.g., accuracy, F1, AUC), baselines, statistical tests, or evaluation protocol details. This absence is load-bearing because the central claim of 'strong identification performance at scale' cannot be assessed or reproduced from the provided information.
  2. [Abstract, data and findings paragraph] No description is given of how the 20,680 prompts were collected, whether they are balanced across tasks/contexts, how user identities were verified as distinct, or whether identification was evaluated with cross-context splits versus within-context splits. This directly bears on the stress-test concern that task/context dependence may inflate accuracy beyond stable user biometrics, especially given the paper's own report of the uniqueness-consistency paradox.
  3. [Abstract] The uniqueness-consistency paradox is presented as a finding yet the conclusion asserts prompt-based identity as a 'viable behavioral biometric.' The inconsistency across contexts appears to undermine the stability required for biometric use; the manuscript must clarify how this paradox is reconciled with the viability claim, including any quantitative consistency metrics.
minor comments (1)
  1. [Abstract] The abstract states 'Data and code will be released upon the acceptance of our work' but provides no link or repository placeholder for reviewers to inspect during review.
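The distinction between cross-context and within-context evaluation raised in major comment 2 can be sketched as a data split. The context labels, record layout, and function name below are assumptions; the material reviewed here does not describe how the paper defines or assigns contexts.

```python
def cross_context_split(records, held_out_context):
    # records: (user, context, text) triples.
    # Train on every context except the held-out one.
    train = [r for r in records if r[1] != held_out_context]
    train_users = {r[0] for r in train}
    # Test only on users the model has seen, and only in the unseen context.
    test = [r for r in records
            if r[1] == held_out_context and r[0] in train_users]
    return train, test

# Hypothetical records with hand-assigned context labels.
records = [
    ("u1", "coding", "pls fix this python script"),
    ("u1", "writing", "pls rewrite this email"),
    ("u2", "coding", "Could you kindly debug this function?"),
    ("u2", "writing", "Could you kindly proofread this essay?"),
    ("u3", "writing", "summarize article"),
]

train, test = cross_context_split(records, "writing")
print(len(train), len(test))
```

A within-context split would instead mix prompts from the same context into both sets, which lets a model exploit topic vocabulary. Comparing accuracy under the two protocols is one way to separate user style from task content.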

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which highlight important areas for improving the clarity and completeness of our abstract. We agree that several details are missing from the abstract and will make revisions to address these concerns while preserving the core contributions of the work.

Point-by-point responses
  1. Referee: [Abstract] The three key findings are asserted without any reported performance metrics (e.g., accuracy, F1, AUC), baselines, statistical tests, or evaluation protocol details. This absence is load-bearing because the central claim of 'strong identification performance at scale' cannot be assessed or reproduced from the provided information.

    Authors: We agree with the referee that the abstract should include key performance metrics to substantiate the claims. The full manuscript reports detailed results including accuracy, F1, and AUC from our identification experiments, along with baselines and statistical validation. We will revise the abstract to include representative metrics and a brief mention of the evaluation protocol to make the claims assessable.

    Revision: yes

  2. Referee: [Abstract, data and findings paragraph] No description is given of how the 20,680 prompts were collected, whether they are balanced across tasks/contexts, how user identities were verified as distinct, or whether identification was evaluated with cross-context splits versus within-context splits. This directly bears on the stress-test concern that task/context dependence may inflate accuracy beyond stable user biometrics, especially given the paper's own report of the uniqueness-consistency paradox.

    Authors: The referee correctly identifies that the abstract omits these important details. The manuscript describes the dataset as real prompts from users, with collection details in the Data section, and evaluations include both within and cross-context analyses to address the paradox. We will expand the abstract to briefly describe the data source, note the use of cross-context splits, and mention user identity verification to mitigate concerns about inflation of performance.

    Revision: yes

  3. Referee: [Abstract] The uniqueness-consistency paradox is presented as a finding yet the conclusion asserts prompt-based identity as a 'viable behavioral biometric.' The inconsistency across contexts appears to undermine the stability required for biometric use; the manuscript must clarify how this paradox is reconciled with the viability claim, including any quantitative consistency metrics.

    Authors: We appreciate this point and will clarify the reconciliation in the revised abstract. The paradox highlights that while users are distinctive, consistency varies by context; however, our results show sufficient stability in many scenarios for biometric viability, supported by quantitative metrics on consistency (e.g., intra-user similarity scores across contexts). We will add these metrics and explain that viability holds particularly for applications with contextual controls or when combined with other signals.

    Revision: yes
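One way to quantify the consistency discussed in response 3 is the decidability index d′, which also appears in the Figure 3 caption. The sketch below uses the definition that is standard in biometrics (separation between genuine and impostor score distributions in pooled standard deviation units). Whether the paper uses exactly this formula is an assumption, and the similarity scores are invented for illustration.

```python
from statistics import mean, pvariance

def d_prime(genuine, impostor):
    # genuine: similarity scores between prompts of the same user.
    # impostor: similarity scores between prompts of different users.
    pooled_std = ((pvariance(genuine) + pvariance(impostor)) / 2) ** 0.5
    return abs(mean(genuine) - mean(impostor)) / pooled_std

# Hypothetical similarity scores.
genuine = [0.8, 0.6, 0.7]
impostor = [0.3, 0.1, 0.2]
print(round(d_prime(genuine, impostor), 3))
```

A larger d′ means the same-user and different-user score distributions overlap less. Computing it separately for within-context and cross-context genuine pairs would expose how much of the separation depends on context.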

Circularity Check

0 steps flagged

Empirical study with no circular derivations or load-bearing self-citations

Full rationale

This is a purely empirical paper reporting identification results from analysis of 20,680 collected user prompts. No equations, theoretical derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or description. The central claims rest on direct data analysis outcomes rather than any reduction to inputs by construction, satisfying the criteria for a self-contained empirical study with no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review yields no explicit free parameters, mathematical axioms, or independently evidenced invented entities; the central claims rest on the unelaborated premise that real-user prompts encode stable identity signals.

invented entities (1)
  • prompt-based identity (no independent evidence)
    purpose: Conceptual framing of user prompts as a learnable behavioral biometric
    Introduced as the core hypothesis of the work.

pith-pipeline@v0.9.1-grok · 5753 in / 1223 out tokens · 38155 ms · 2026-06-28T00:57:40.783691+00:00 · methodology
