PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

Kartik Narayan; Shaiv Patel; Vishal Patel

arxiv: 2606.06755 · v1 · pith:USIYYDTInew · submitted 2026-06-04 · 💻 cs.CL · cs.ET

Pith reviewed 2026-06-28 00:57 UTC · model grok-4.3

classification 💻 cs.CL cs.ET

keywords: behavioral biometrics · prompt-based identity · authorship attribution · LLM user modeling · lexical analysis · stylometric features

The pith

Short prompts to LLMs carry stable lexical signals that identify individual users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether brief, task-driven prompts to large language models contain enough consistent author-specific patterns to serve as a behavioral biometric. Using data from over a thousand users, it finds that surface word choices outperform deeper meaning representations for identification. It also identifies a trade-off where users' styles are unique across people but vary within the same person depending on the prompt context, and shows that the signal holds against small changes but not against full rephrasing. If correct, this opens a new way to model users in LLM systems with consequences for security and privacy.

Core claim

Prompt-based identity, defined as the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns in short LLM prompts form a learnable behavioral biometric, is established through strong identification performance on a dataset of 20,680 real prompts from 1,034 users, with lexical features proving most effective.

What carries the argument

The lexical stability hypothesis: the claim that identity is primarily encoded in surface-level word choice rather than abstract intent. The paper supports it by showing that lexical representations outperform semantic encoders.
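A minimal sketch of what lexical identification can look like, using only the Python standard library. The figure captions name a TF-IDF+LR method; this sketch keeps the TF-IDF part but swaps the logistic regression for a nearest-profile cosine match, so it is not the authors' pipeline. The toy prompts, the user labels, and every function name here are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased whitespace tokens; a crude stand-in for a real tokenizer.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency over the training prompts.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    # L2-normalised TF-IDF vector as a sparse dict; unseen tokens are dropped.
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def fit_profiles(prompts, users, idf):
    # One profile per user: the sum of that user's prompt vectors.
    profiles = defaultdict(Counter)
    for text, user in zip(prompts, users):
        profiles[user].update(tfidf(text, idf))
    return profiles

def rank_users(text, profiles, idf):
    # Rank users by cosine similarity between the query and each profile.
    query = tfidf(text, idf)

    def cosine(profile):
        dot = sum(w * profile.get(tok, 0.0) for tok, w in query.items())
        norm = math.sqrt(sum(v * v for v in profile.values())) or 1.0
        return dot / norm

    return sorted(profiles, key=lambda u: cosine(profiles[u]), reverse=True)

# Hypothetical toy data: two users with different habitual phrasing.
train_prompts = [
    "pls fix this python bug asap",
    "pls write python script for csv",
    "Could you kindly summarise the following article",
    "Could you kindly translate the following paragraph",
]
train_users = ["u1", "u1", "u2", "u2"]

idf = fit_idf(train_prompts)
profiles = fit_profiles(train_prompts, train_users, idf)
ranking = rank_users("pls explain this python error", profiles, idf)
print(ranking[0])
```

The first element of the ranking corresponds to a Top-1 prediction; taking the first three or five elements gives the Top-3 and Top-5 variants mentioned in the figure captions.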

If this is right

Lexical representations significantly outperform semantic encoders for user identification.
Stylometric features show users are highly distinctive across the population yet inconsistent across contexts.
Identity signals are robust to minor lexical perturbations but degrade under semantic paraphrasing.
Prompt-based identity enables strong identification performance at scale.
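The robustness claim in the list above contrasts minor lexical perturbations with full paraphrasing. A minimal sketch of the first kind of attack, token substitution at a chosen rate, might look like the following. The synonym table, the function name, and the seeding scheme are all hypothetical; the source does not say how the authors generated their substitutions.

```python
import random

def substitute_tokens(text, rate, synonyms, seed=0):
    """Replace each token that has a known synonym with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    out = []
    for tok in text.split():
        options = synonyms.get(tok.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return " ".join(out)

# Hypothetical synonym table; a real study would need a much larger resource.
toy_synonyms = {"fix": ["repair"], "bug": ["defect"]}

original = "please fix this bug"
untouched = substitute_tokens(original, rate=0.0, synonyms=toy_synonyms)
fully_swapped = substitute_tokens(original, rate=1.0, synonyms=toy_synonyms)
print(untouched)
print(fully_swapped)
```

Sweeping `rate` from 0 to 1 and re-running an identifier on the perturbed prompts would give an accuracy-versus-substitution-rate curve of the kind the Figure 4 caption describes. Paraphrasing is a different attack, since it rewrites structure as well as individual words.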

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This approach could enable new forms of user authentication or tracking in LLM platforms based on prompt history.
Privacy concerns arise if prompt data can be used to link anonymous interactions to individuals.
Future work might explore whether task-specific prompts reduce the biometric signal compared to free-form ones.

Load-bearing premise

The collected prompts represent natural, habitual user behavior from distinct individuals whose lexical patterns remain stable enough to serve as an identifiable biometric signal independent of task context or data collection artifacts.

What would settle it

Demonstrating that identification accuracy drops to chance levels when the same users provide prompts in new contexts or after paraphrasing would falsify the viability of prompt-based identity as a biometric.
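To make "chance levels" concrete, here is a small arithmetic sketch using the counts reported in the abstract (20,680 prompts, 1,034 users). It assumes closed-set identification in which every user is a candidate and a random guesser ranks users uniformly; the function name is an illustrative assumption.

```python
n_users = 1034
n_prompts = 20680

def chance_top_k(k, n_classes):
    # Probability that a uniformly random ranking puts the true user in the top k.
    return min(k, n_classes) / n_classes

# Mean number of prompts per user implied by the reported totals.
prompts_per_user = n_prompts / n_users
print(f"prompts per user (mean): {prompts_per_user}")

for k in (1, 3, 5):
    print(f"Top-{k} chance: {chance_top_k(k, n_users):.4%}")
```

Under these assumptions Top-1 chance is 1/1034, roughly 0.1%, so a collapse "to chance" would mean accuracy falling by orders of magnitude from the Top-1 values quoted in the figure captions.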

Figures

Figures reproduced from arXiv: 2606.06755 by Kartik Narayan, Shaiv Patel, Vishal Patel.

**Figure 1.** LLM prompts as a soft biometric for authentication overview. A user’s prompt behavior can capture how they think, ask, and …

**Figure 2.** Sample prompts from the WildChat-1M dataset.

**Figure 3.** Left: Uniqueness-consistency trade-off. Each point represents one method (mean ± std, 5 folds). Stylo-NN (red) isolates the paradox: highest d′ yet lowest Top-1. TF-IDF+LR (orange) achieves optimal verification (d′=0.671); Ensemble (blue) achieves optimal identification (Top-1=64.2%). Middle: Top-1/3/5 accuracy vs. ensemble weight α. Plateau between α=0.5–0.9 confirms stability; standardized at α=0.7. …
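The Figure 3 caption reports d′ values without defining the metric on this page. A common definition in biometrics is the sensitivity index: the gap between the mean genuine-pair score and the mean impostor-pair score, divided by the root of the averaged variances. The sketch below assumes that definition; the paper may compute d′ differently, and the scores are made-up toy values.

```python
import math
from statistics import mean, pvariance

def d_prime(genuine, impostor):
    """Sensitivity index between genuine and impostor similarity scores.

    Assumed definition: |mu_g - mu_i| / sqrt((var_g + var_i) / 2).
    """
    spread = math.sqrt(0.5 * (pvariance(genuine) + pvariance(impostor)))
    if spread == 0:
        raise ValueError("both score sets are constant; d' is undefined")
    return abs(mean(genuine) - mean(impostor)) / spread

# Toy similarity scores (hypothetical): same-user pairs vs. different-user pairs.
genuine_scores = [0.62, 0.48, 0.71, 0.35, 0.55]
impostor_scores = [0.41, 0.30, 0.52, 0.25, 0.38]
print(round(d_prime(genuine_scores, impostor_scores), 3))
```

A larger d′ means the two score distributions overlap less. It is a verification-style measure, which is why a method can score well on d′ and still rank poorly on Top-1 identification, as the caption notes for Stylo-NN.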

**Figure 4.** Adversarial robustness modeling. Left: Top-1 accuracy vs. token substitution rates. Right: Impact of complete threat vectors against the unattacked Ensemble baseline (0.642). Full paraphrase executes the most severe degradation (Top-1=0.429, EER=0.703), proving the fingerprint is bound to the lexical surface rather than semantic intent. semantic mapping, though TF-IDF retains partial resilience via adjace…
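The Figure 4 caption quotes an equal error rate (EER). As a reminder of what that number measures, the sketch below sweeps an accept threshold over the observed scores and reports the point where the false accept rate and false reject rate are closest. The accept rule (score at or above the threshold), the threshold grid, and the toy scores are assumptions; the source does not describe the authors' EER procedure.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as an accept threshold."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # False accept: an impostor pair scores at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False reject: a genuine pair scores below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy verification scores (hypothetical).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With finitely many scores the two error rates rarely cross exactly, so this version returns their midpoint at the closest threshold; interpolating between neighbouring thresholds is a common refinement.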

Original abstract

Authorship attribution research has traditionally focused on long-form, expressive texts; however, interactions with large language models (LLMs) are typically brief and task-driven prompts. This raises a fundamental question: do such prompts contain a stable, author-identifiable, and distinctive signal? We introduce PromptPrint, a systematic study of prompt-based identity, the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns form a learnable behavioral biometric. Using 20,680 real prompts from 1,034 users, we establish three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis": identity is primarily encoded in surface-level word choice rather than abstract intent. Second, stylometric features exhibit a "uniqueness-consistency paradox": users are highly distinctive across the population, yet behaviorally inconsistent across contexts. Third, adversarial analysis reveals a clear vulnerability spectrum: identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. Overall, our results demonstrate strong identification performance at scale, establishing prompt-based identity as a viable behavioral biometric. This work introduces a new perspective on user modeling in LLM interactions, with important implications for security and privacy. Data and code will be released upon the acceptance of our work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Lexical signals in LLM prompts enable identification but the reported inconsistency questions its use as a stable biometric.

The paper applies authorship attribution methods to short LLM prompts and reports three concrete findings from 20k real prompts across 1k users: lexical features beat semantic encoders, a uniqueness-consistency paradox appears, and identity signals show a clear vulnerability spectrum under different attacks. That is the actual new material—extending stylometry to this constrained, task-driven domain and documenting those specific contrasts.

The work is straightforward in laying out the findings and in flagging the paradox itself, which is an honest note on a potential limit. The dataset size is respectable for an empirical study in this area.

The soft spot is the stability premise required for the biometric interpretation. The paradox they report—high distinctiveness but low consistency across contexts—directly suggests that performance may track task vocabulary or collection patterns rather than fixed user habits. Without details on whether evaluation used cross-context splits or balanced tasks, it is difficult to separate those effects. The abstract also states strong identification performance without supplying metrics, baselines, or tests, so the strength of the evidence stays hard to judge from what is shown.

This is for researchers working on user modeling, privacy, or behavioral signals in LLM platforms. A reader looking for new empirical angles on prompt data would find value in the reported contrasts and the paradox. It deserves peer review because the idea is fresh and the dataset is real, even if the methods and numbers need to be examined to assess how far the claims hold.

Referee Report

3 major / 1 minor

Summary. The paper introduces PromptPrint as a study of prompt-based identity, hypothesizing that users' brief, task-driven LLM prompts encode stable, author-identifiable lexical and stylometric signals usable as behavioral biometrics. It reports results from 20,680 real prompts by 1,034 users supporting three findings: lexical representations outperform semantic encoders (lexical stability hypothesis); stylometric features show a uniqueness-consistency paradox (high distinctiveness but low cross-context consistency); and identity signals are robust to minor perturbations but vulnerable to semantic paraphrasing. The work concludes that these results establish strong identification performance at scale and prompt-based identity as a viable biometric, with implications for security and privacy.

Significance. If the empirical claims are supported by properly documented cross-context evaluation and statistical validation, the work would be significant as the first large-scale demonstration of behavioral biometrics in short, task-driven LLM prompts, extending authorship attribution beyond long-form text and highlighting both opportunities and risks for user modeling in LLM platforms. The planned release of data and code is a positive contribution to reproducibility.

major comments (3)

[Abstract] Abstract: The three key findings are asserted without any reported performance metrics (e.g., accuracy, F1, AUC), baselines, statistical tests, or evaluation protocol details. This absence is load-bearing because the central claim of 'strong identification performance at scale' cannot be assessed or reproduced from the provided information.
[Abstract] Abstract (data and findings paragraph): No description is given of how the 20,680 prompts were collected, whether they are balanced across tasks/contexts, how user identities were verified as distinct, or whether identification was evaluated with cross-context splits versus within-context splits. This directly bears on the stress-test concern that task/context dependence may inflate accuracy beyond stable user biometrics, especially given the paper's own report of the uniqueness-consistency paradox.
[Abstract] Abstract: The uniqueness-consistency paradox is presented as a finding yet the conclusion asserts prompt-based identity as a 'viable behavioral biometric.' The inconsistency across contexts appears to undermine the stability required for biometric use; the manuscript must clarify how this paradox is reconciled with the viability claim, including any quantitative consistency metrics.
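The second major comment turns on the difference between within-context and cross-context evaluation. A minimal sketch of a cross-context split follows: for each user, all prompts from one context are held out, so the test set never shares a (user, context) pair with training. The context labels, the record layout, and the rule for choosing the held-out context are hypothetical; the page does not say whether the dataset carries such labels.

```python
from collections import defaultdict

def cross_context_split(records):
    """Split (user, context, text) records so each user's test context is unseen in training."""
    contexts_by_user = defaultdict(set)
    for user, context, _ in records:
        contexts_by_user[user].add(context)

    # Hold out one context per user (here: the alphabetically last one).
    # Users seen in a single context stay entirely in the training set.
    held_out = {
        user: sorted(ctxs)[-1]
        for user, ctxs in contexts_by_user.items()
        if len(ctxs) > 1
    }
    train = [r for r in records if held_out.get(r[0]) != r[1]]
    test = [r for r in records if held_out.get(r[0]) == r[1]]
    return train, test

# Toy records with hypothetical context labels.
records = [
    ("u1", "coding", "fix my loop"),
    ("u1", "coding", "debug this"),
    ("u1", "writing", "draft an email"),
    ("u2", "coding", "write a parser"),
    ("u2", "translation", "translate to french"),
    ("u3", "writing", "polish this essay"),
]
train, test = cross_context_split(records)
print(len(train), len(test))
```

If identification accuracy on such a split stays close to the within-context figure, the signal is more plausibly a property of the user than of the task vocabulary; a large drop would point the other way.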

minor comments (1)

[Abstract] The abstract states 'Data and code will be released upon the acceptance of our work' but provides no link or repository placeholder for reviewers to inspect during review.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which highlight important areas for improving the clarity and completeness of our abstract. We agree that several details are missing from the abstract and will make revisions to address these concerns while preserving the core contributions of the work.

Referee: [Abstract] Abstract: The three key findings are asserted without any reported performance metrics (e.g., accuracy, F1, AUC), baselines, statistical tests, or evaluation protocol details. This absence is load-bearing because the central claim of 'strong identification performance at scale' cannot be assessed or reproduced from the provided information.

Authors: We agree with the referee that the abstract should include key performance metrics to substantiate the claims. The full manuscript reports detailed results including accuracy, F1, and AUC from our identification experiments, along with baselines and statistical validation. We will revise the abstract to include representative metrics and a brief mention of the evaluation protocol to make the claims assessable.
revision: yes

Referee: [Abstract] Abstract (data and findings paragraph): No description is given of how the 20,680 prompts were collected, whether they are balanced across tasks/contexts, how user identities were verified as distinct, or whether identification was evaluated with cross-context splits versus within-context splits. This directly bears on the stress-test concern that task/context dependence may inflate accuracy beyond stable user biometrics, especially given the paper's own report of the uniqueness-consistency paradox.

Authors: The referee correctly identifies that the abstract omits these important details. The manuscript describes the dataset as real prompts from users, with collection details in the Data section, and evaluations include both within and cross-context analyses to address the paradox. We will expand the abstract to briefly describe the data source, note the use of cross-context splits, and mention user identity verification to mitigate concerns about inflation of performance.
revision: yes

Referee: [Abstract] Abstract: The uniqueness-consistency paradox is presented as a finding yet the conclusion asserts prompt-based identity as a 'viable behavioral biometric.' The inconsistency across contexts appears to undermine the stability required for biometric use; the manuscript must clarify how this paradox is reconciled with the viability claim, including any quantitative consistency metrics.

Authors: We appreciate this point and will clarify the reconciliation in the revised abstract. The paradox highlights that while users are distinctive, consistency varies by context; however, our results show sufficient stability in many scenarios for biometric viability, supported by quantitative metrics on consistency (e.g., intra-user similarity scores across contexts). We will add these metrics and explain that viability holds particularly for applications with contextual controls or when combined with other signals.
revision: yes

Circularity Check

0 steps flagged

Empirical study with no circular derivations or load-bearing self-citations

This is a purely empirical paper reporting identification results from analysis of 20,680 collected user prompts. No equations, theoretical derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or description. The central claims rest on direct data analysis outcomes rather than any reduction to inputs by construction, satisfying the criteria for a self-contained empirical study with no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review yields no explicit free parameters, mathematical axioms, or independently evidenced invented entities; the central claims rest on the unelaborated premise that real-user prompts encode stable identity signals.

invented entities (1)

prompt-based identity · no independent evidence
Purpose: Conceptual framing of user prompts as a learnable behavioral biometric.
Note: Introduced as the core hypothesis of the work.

Reference graph

Works this paper leans on

35 extracted references · 5 linked inside Pith

[1] E. Stamatatos, “A survey of modern authorship attribution methods,” Journal of the American Society for Information Science and Technology, vol. 60, no. 3, pp. 538–556, 2009.
[2] M. Koppel, J. Schler, and S. Argamon, “Authorship attribution in the wild,” Language Resources and Evaluation, vol. 45, no. 1, pp. 83–94, 2011.
[3] S. Raghavan, A. Kovashka, and R. Mooney, “Authorship attribution using probabilistic context-free grammars,” in Proceedings of the ACL 2010 Conference Short Papers, pp. 38–42, 2010.
[4] M. Fabien, E. Villatoro-Tello, P. Motlicek, and S. Parida, “BertAA: BERT fine-tuning for authorship attribution,” in Proceedings of the 17th International Conference on Natural Language Processing (ICON), pp. 127–137, 2020.
[5] B. Huang, C. Chen, and K. Shu, “Authorship attribution in the era of LLMs: Problems, methodologies, and challenges,” 2025.
[6] R. A. Rivera-Soto, O. E. Miano, J. Ordonez, B. Y. Chen, A. Khan, M. Bishop, and N. Andrews, “Learning universal authorship representations,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 913–919, 2021.
[7] F. Monrose and A. Rubin, “Authentication via keystroke dynamics,” in Proceedings of the 4th ACM Conference on Computer and Communications Security, pp. 48–56, 1997.
[8] J. Roth, X. Liu, and D. Metaxas, “On continuous user authentication via typing behavior,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4611–4624, 2014.
[9] A. A. E. Ahmed and I. Traore, “A new biometric technology based on mouse dynamics,” IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 3, pp. 165–179, 2007.
[10] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song, “Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 1, pp. 136–148, 2012.
[11] N. V. Boulgouris, K. N. Plataniotis, and D. Hatzinakos, “Gait recognition using linear time normalization,” Pattern Recognition, vol. 39, no. 5, pp. 969–979, 2006.
[12] A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,” arXiv preprint arXiv:2307.15043, 2023. (Pith/arXiv)
[13] F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv preprint arXiv:2211.09527, 2022. (Pith/arXiv)
[14] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al., “Extracting training data from large language models,” in 30th USENIX Security Symposium (USENIX Security 21), pp. 2633–2650, 2021.
[15] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in 2008 IEEE Symposium on Security and Privacy (sp 2008), pp. 111–125, IEEE, 2008.
[16] P. Eckersley, “How unique is your web browser?,” in International Symposium on Privacy Enhancing Technologies Symposium, pp. 1–18, Springer, 2010.
[17] R. Staab, M. Vero, M. Balunović, and M. Vechev, “Beyond memorization: Violating privacy via inference with large language models,” arXiv preprint arXiv:2310.07298, 2023.
[18] J. Tyo, B. Dhingra, and Z. C. Lipton, “On the state of the art in authorship attribution and authorship verification,” arXiv preprint arXiv:2209.06869, 2022.
[19] A. Wegmann, M. Schraagen, and D. Nguyen, “Same author or just same topic? Towards content-independent style representations,” in Proceedings of the 7th Workshop on Representation Learning for NLP, pp. 249–268, 2022.
[20] W. Zhao, X. Ren, J. Hessel, C. Cardie, Y. Choi, and Y. Deng, “WildChat: 1M ChatGPT interaction logs in the wild,” arXiv preprint arXiv:2405.01470, 2024. (Pith/arXiv)
[21] K. Sparck Jones, “A statistical interpretation of term specificity and its application in retrieval,” Journal of Documentation, vol. 28, no. 1, pp. 11–21, 1972.
[22] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, and F. Wei, “Text embeddings by weakly-supervised contrastive pre-training,” arXiv preprint arXiv:2212.03533, 2022. (Pith/arXiv)
[23] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media, Inc., 2009.
[24] M. McCloskey and N. J. Cohen, “Catastrophic interference in connectionist networks: The sequential learning problem,” in Psychology of Learning and Motivation, vol. 24, pp. 109–165, Elsevier, 1989.
[25] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017. (Pith/arXiv)
[26] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[27] C. Fellbaum, WordNet: An Electronic Lexical Database. MIT Press, 1998.
[28] J. Zhang, Y. Zhao, M. Saleh, and P. Liu, “PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization,” in International Conference on Machine Learning, pp. 11328–11339, PMLR, 2020.
[29] S. Argamon, M. Koppel, J. W. Pennebaker, and J. Schler, “Mining the blogosphere: Age, gender and the varieties of self-expression,” First Monday, 2007.
[30] K. Luyckx and W. Daelemans, “The effect of author set size and data size in authorship attribution,” Literary and Linguistic Computing, vol. 26, no. 1, pp. 35–55, 2011.
[31] M. Tschuggnall, B. Murauer, and G. Specht, “Reduce & attribute: Two-step authorship attribution for large-scale problems,” in Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pp. 951–960, 2019.
[32] A. Dantcheva, P. Elia, and A. Ross, “What else does your biometric data reveal? A survey on soft biometrics,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 441–467, 2015.
[33] S. Sprager and M. B. Juric, “Inertial sensor-based gait recognition: A review,” Sensors, vol. 15, no. 9, pp. 22089–22127, 2015.
[34] L. Fridman, S. Weber, R. Greenstadt, and M. Kam, “Active authentication on mobile devices via stylometry, application usage, web browsing, and GPS location,” IEEE Systems Journal, vol. 11, no. 2, pp. 513–521, 2016.
[35] J. Huertas-Tato, A. Girón-Jiménez, A. Martín, and D. Camacho, “Isolating authorship from content with semantic embeddings and contrastive learning,” arXiv preprint arXiv:2411.18472, 2024.
