
arxiv: 2403.08462 · v3 · submitted 2024-03-13 · 💻 cs.CL · cs.LG

Grammar as a Behavioral Biometric: Using Cognitively Motivated Grammar Models for Authorship Verification

Pith reviewed 2026-05-24 02:56 UTC · model grok-4.3

keywords authorship verification · cognitive linguistics · grammar models · behavioral biometrics · likelihood ratio · text forensics · digital forensics · explainable methods

The pith

A cognitively motivated grammar model verifies authorship more accurately than neural networks by computing a likelihood ratio called LambdaG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that modeling an author's grammar according to Cognitive Linguistics principles and computing LambdaG—the ratio of how likely a text is under that author's grammar versus a reference population's grammar—outperforms seven baseline methods, including neural network approaches, on twelve datasets. This method treats grammar as a behavioral biometric unique to individuals. A sympathetic reader would care because it supplies a simpler, more interpretable alternative to complex black-box systems for determining whether two texts share an author in digital forensics contexts. The approach also proves robust to minor changes in the reference population and yields visualizations that support explainability.

Core claim

LambdaG is defined as the ratio of the likelihood of a document given the candidate author's grammar model to the likelihood given a reference population's grammar model. When the grammar models follow Cognitive Linguistics principles, this ratio delivers superior authorship verification performance across twelve datasets relative to seven baselines that include neural network-based methods. The paper states that the performance advantage arises because the method aligns with theories predicting that a person's grammar functions as a behavioral biometric.

What carries the argument

LambdaG, the ratio of likelihoods of a document under a candidate author's grammar model versus a reference population grammar model; it quantifies how distinctively the text fits the candidate's grammar.
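
To make the likelihood-ratio idea concrete, here is a minimal Python sketch. It is not the paper's grammar model: it assumes a toy add-k smoothed bigram model over plain word tokens, and the names train_bigram, log_likelihood and log_likelihood_ratio are hypothetical. The score is computed on the log scale, so a value above 0 means the document is more probable under the candidate author's model than under the reference model.

```python
import math
from collections import Counter


def train_bigram(tokens, vocab, k=1.0):
    """Return a function giving the add-k smoothed log P(word | previous word).

    Assumption: a toy bigram model stands in for the paper's grammar model.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    vocab_size = len(vocab)

    def logprob(prev, word):
        return math.log((bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size))

    return logprob


def log_likelihood(tokens, logprob):
    """Sum the bigram log-probabilities over a tokenised document."""
    return sum(logprob(prev, word) for prev, word in zip(tokens, tokens[1:]))


def log_likelihood_ratio(doc, author_tokens, reference_tokens):
    """log P(doc | author model) - log P(doc | reference model)."""
    vocab = set(doc) | set(author_tokens) | set(reference_tokens)
    author_model = train_bigram(author_tokens, vocab)
    reference_model = train_bigram(reference_tokens, vocab)
    return log_likelihood(doc, author_model) - log_likelihood(doc, reference_model)


if __name__ == "__main__":
    author = "the cat sat on the mat the cat sat".split()
    reference = "a dog ran in a park a dog ran".split()
    # Positive score: the document fits the author sample better.
    print(log_likelihood_ratio("the cat sat".split(), author, reference))
    # Negative score: the document fits the reference sample better.
    print(log_likelihood_ratio("a dog ran".split(), author, reference))
```

Working in log space avoids numerical underflow on long documents and makes the sign of the score directly readable as support for or against the candidate author.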

If this is right

  • Authorship verification in digital text forensics can rely on grammar models rather than high-complexity neural methods.
  • The method remains effective even when the reference population varies slightly in composition.
  • Interpretability improves because the grammar models support visualizations of verification decisions.
  • The technique rests on compatibility with Cognitive Linguistics predictions that grammar acts as a behavioral biometric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Likelihood-ratio methods grounded in cognitive models of language could extend to verifying other stable individual traits in text beyond grammar.
  • The approach might be tested for robustness on very short documents or in languages with different grammatical structures.
  • Hybrid systems could combine LambdaG with non-grammar features while preserving the cognitive grounding.
  • The same modeling strategy might apply to related forensic tasks such as detecting text generated by language models.

Load-bearing premise

That cognitively motivated grammar models can be built to capture stable individual differences in authorship and that the resulting likelihood ratios validly indicate whether two texts share an author.

What would settle it

A new dataset or reference population composition where LambdaG fails to match or exceed the performance of the seven baselines, or where small reference-group changes cause large drops in accuracy.

Figures

Figures reproduced from arXiv: 2403.08462 by Andrea Nini, Lukas Graner, Oren Halvani, Shunichi Ishihara, Sophie Titze, Valerio Gherardi.

Figure 1: Schematic overview of our proposed AV method
Figure 2: 95% Confidence Intervals for Accuracy. uncalibrated. This means that although higher values of λG do correctly correspond to Y-cases, the scale of variation does not reflect the expectations of a perfectly calibrated system, where λG = 0 means an inconclusive result, a positive value suggests a Y-case, and a negative value suggests an N-case. When λG is turned into ΛG by fitting a logistic regression on tr…
Figure 3: Variation in Accuracy depending on the number of repetitions,
Figure 4: The loss in Accuracy (top) and Cllr (bottom) results for cross-corpus comparison, i.e., evaluating on Base Corpus while using reference texts Dref from Reference Corpus. Diagonal bold values denote the original Accuracy and Cllr, respectively. Darker shades denote a greater loss in performance.
Figure 5: The POSNoise algorithm. Details on the notation can be found in [53]. Authorship Verification constitutes a similarity detection problem in which the subject of the similarity determination is the language of the author rather than other document aspects such as the topic [53]. However, a large number of existing AV methods including [1, 23, 25, 29, 83, 84, 86, 95, 102] use features that are directly influ…
Original abstract

Authorship Verification (AV) is a key area of research in digital text forensics, which addresses the fundamental question of whether two texts were written by the same person. Numerous computational approaches have been proposed over the last two decades in an attempt to address this challenge. However, existing AV methods often suffer from high complexity, low explainability and especially from a lack of clear scientific justification. We propose a simpler method based on modeling the grammar of an author following Cognitive Linguistics principles. These models are used to calculate $\lambda_G$ (LambdaG): the ratio of the likelihoods of a document given the candidate's grammar versus given a reference population's grammar. Our empirical evaluation, conducted on twelve datasets and compared against seven baseline methods, demonstrates that LambdaG achieves superior performance, including against several neural network-based AV methods. LambdaG is also robust to small variations in the composition of the reference population and provides interpretable visualizations, enhancing its explainability. We argue that its effectiveness is due to the method's compatibility with Cognitive Linguistics theories predicting that a person's grammar is a behavioral biometric.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes LambdaG, a method for authorship verification that constructs cognitively motivated grammar models for individual authors and computes the likelihood ratio λ_G of a document under the candidate grammar versus a reference population grammar. It reports that this approach outperforms seven baselines (including neural AV methods) across twelve datasets, is robust to small changes in the reference population, and provides interpretable visualizations, attributing effectiveness to compatibility with Cognitive Linguistics theories that treat grammar as a behavioral biometric.

Significance. If the reported empirical results hold under scrutiny, the work supplies a simpler, more explainable alternative to neural methods in digital text forensics while grounding the approach in established cognitive theories. The multi-dataset evaluation and explicit likelihood-ratio formulation are strengths that could support falsifiable follow-up work; the robustness claim to reference-population composition is also a concrete, testable contribution.

major comments (1)
  1. [Experimental Evaluation] Experimental section: the central claim of superior performance is load-bearing, yet the manuscript provides insufficient detail on the precise train/test splits, statistical significance testing (e.g., paired t-tests or McNemar), and controls for genre or length confounds across the twelve datasets; without these the superiority result cannot be fully assessed.
minor comments (2)
  1. [Method] Notation for λ_G and the reference-population grammar should be defined once in a dedicated subsection rather than introduced piecemeal.
  2. [Results] Figure captions for the grammar visualizations should explicitly state the units on each axis and the exact subset of data used.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the experimental evaluation. We address the single major comment below and will revise the manuscript to incorporate the requested details.

read point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental section: the central claim of superior performance is load-bearing, yet the manuscript provides insufficient detail on the precise train/test splits, statistical significance testing (e.g., paired t-tests or McNemar), and controls for genre or length confounds across the twelve datasets; without these the superiority result cannot be fully assessed.

    Authors: We agree that the experimental section would benefit from greater explicitness to support reproducibility and allow full assessment of the performance claims. In the revised manuscript we will add a dedicated subsection that specifies the exact train/test splits (including any cross-validation folds or hold-out ratios) for each of the twelve datasets. We will also report the results of statistical significance tests, including paired t-tests on performance metrics across repeated runs and McNemar’s test for pairwise comparisons against each baseline. Finally, we will include additional analyses that control for text length and genre confounds, such as performance stratified by length bins and by dataset genre where the data permit. These revisions will directly address the concerns raised. revision: yes
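
The McNemar comparison mentioned in the response above can be sketched as follows. This is an illustrative exact (binomial) version written against the Python standard library only; the function name mcnemar_exact and the toy labels are assumptions, not the authors' evaluation code.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only method A gets right
    and c counts cases only method B gets right.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two methods are indistinguishable on this data.
        return b, c, 1.0
    k = min(b, c)
    # Under H0, discordant outcomes follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    y_true = [1] * 10            # toy same-author labels
    method_a = [1] * 10          # hypothetical method A: all correct
    method_b = [0] * 8 + [1] * 2  # hypothetical method B: 8 errors
    print(mcnemar_exact(y_true, method_a, method_b))
```

Only the verification problems on which the two methods disagree enter the test, which is why it suits pairwise comparison of classifiers evaluated on the same test pairs.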

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained and empirically evaluated

full rationale

The paper defines LambdaG explicitly as a likelihood ratio between a candidate grammar model and a reference population grammar, motivated by Cognitive Linguistics principles. Performance is assessed via direct empirical comparison on twelve datasets against seven external baselines (including neural methods), with no reduction of the central result to a fitted parameter renamed as prediction, self-citation chain, or definitional equivalence. The compatibility argument with biometric theories is presented as post-hoc interpretation rather than a load-bearing premise that forces the outcome. No quoted equations or steps exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract provides limited information; the method assumes grammar models can be built and likelihoods computed, but specific free parameters and axioms cannot be fully enumerated without the full text.

free parameters (1)
  • Grammar model parameters
    Likely the grammar models are fitted to author texts, but details unknown from abstract.
axioms (1)
  • domain assumption A person's grammar is unique and can be modeled probabilistically based on Cognitive Linguistics principles.
    Central to building the author grammar and population grammar for the likelihood ratio.




Reference graph

Works this paper leans on

138 extracted references · 138 canonical work pages · 1 internal anchor

  1. [1]

    A. O. Agbeyangi, O. Abegunde, and S. I. Eludiora. Authorship Verification of Yorùbá Blog Posts using Character N-grams. In 2020 International Conference in Mathematics, Computer Engineering and Computer Science (ICMCECS), pages 1–6, March 2020

  2. [2]

    Colin Aitken, Charles E. H. Berger, John S. Buckleton, Christophe Champod, James Curran, A. P. Dawid, Ian W. Evett, Peter Gill, Joaquin Gonzalez-Rodriguez, Graham Jackson, Ate Kloosterman, Tina Lovelock, David Lucy, Pierre Margot, Louise McKenna, Didier Meuwly, Cedric Neumann, Niamh Nic Daeid, Anders Nordgaard, Roberto Puch-Solis, Birgitta Rasmusson, Mike...

  3. [3]

    Statistics and the Evaluation of Evidence for Forensic Scientists

    Colin Aitken, Franco Taroni, and Silvia Bozza. Statistics and the Evaluation of Evidence for Forensic Scientists. Wiley, Chichester, 01 2021. DOI: 10.1002/9781119245438

  4. [4]

    Mahmoud A. Al-Khatib and Juman K. Al-qaoud. Authorship Verification of Opinion Articles in Online Newspapers Using the Idiolect of Author: A Comparative Study. Information, Communication & Society, 24(11):1603–1621, 2021

  5. [5]

    BiBERT-AV: Enhancing Authorship Verification Through Siamese Networks with Pre-trained BERT and Bi-LSTM

    Amirah Almutairi, BooJoong Kang, and Nawfal Al Hashimy. BiBERT-AV: Enhancing Authorship Verification Through Siamese Networks with Pre-trained BERT and Bi-LSTM. In Guojun Wang, Haozhe Wang, Geyong Min, Nektarios Georgalas, and Weizhi Meng, editors, Ubiquitous Security - Third International Conference, UbiSec 2023, Exeter, UK, November 1-3, 2023, Revised ...

  6. [6]

    K. A. Apoorva and S. Sangeetha. Deep Neural Network and Model-based Clustering Technique for Forensic Electronic Mail Author Attribution. SN Applied Sciences, 3(3):348, February 2021

  7. [7]

    The Apricity - A European Cultural Community

    The Apricity. The Apricity - A European Cultural Community. https://theapricity.com, 2018

  8. [8]

    Computational Forensic Authorship Analysis: Promises and Pitfalls

    Shlomo Engelson Argamon. Computational Forensic Authorship Analysis: Promises and Pitfalls. Language and Law / Linguagem e Direito, 5(2):7–37, 2018

  9. [9]

    American Statistical Association Position on Statistical Statements for Forensic Evidence

    American Statistical Association. American Statistical Association Position on Statistical Statements for Forensic Evidence. Technical report, American Statistical Association (ASA), 2019

  10. [10]

    Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification

    Abinew Ali Ayele, Nikolay Babakov, Janek Bevendorff, Xavier Bonet Casals, Berta Chulvi, Daryna Dementieva, Ashaf Elnagar, Dayne Freitag, Maik Fröbe, Damir Korenčić, Maximilian Mayerl, Daniil Moskovskiy, Animesh Mukherjee, Alexander Panchenko, Martin Potthast, Francisco Rangel, Naquee Rizwan, Paolo Rosso, Florian Schneider, Alisa Smirnova, Efstathios Stama...

  11. [11]

    Outside the Cave of Shadows: Using Syntactic Annotation To Enhance Authorship Attribution

    Harald Baayen, Hans Van Halteren, and Fiona Tweedie. Outside the Cave of Shadows: Using Syntactic Annotation To Enhance Authorship Attribution. Literary and Linguistic Computing, 11(3):121–132, 1996

  12. [12]

    Author Identification Using Multi-headed Recurrent Neural Networks

    Douglas Bagnall. Author Identification Using Multi-headed Recurrent Neural Networks. In Cap- pellato et al. [28]

  13. [13]

    Language Is a Complex Adaptive System: Position Paper

    Clay Beckner, Nick C Ellis, Richard Blythe, John Holland, Joan Bybee, Jinyun Ke, Morten H Christiansen, Diane Larsen-Freeman, William Croft, and Tom Schoenemann. Language Is a Complex Adaptive System: Position Paper. Language Learning, 59:1–26, 2009

  14. [14]

    Bias Analysis and Mitigation in the Evaluation of Authorship Verification

    Janek Bevendorff, Matthias Hagen, Benno Stein, and Martin Potthast. Bias Analysis and Mitigation in the Evaluation of Authorship Verification. In Anna Korhonen, David R. Traum, and Lluís Màrquez, editors, Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long...

  15. [15]

    Generalizing Unmasking for Short Texts

    Janek Bevendorff, Benno Stein, Matthias Hagen, and Martin Potthast. Generalizing Unmasking for Short Texts. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 654–659, Minneapolis, Minnesota, June 2019. Association for Co...

  16. [16]

    Janek Bevendorff, Matti Wiegmann, Jussi Karlgren, Luise Dürlich, Evangelia Gogoulou, Aarne Talman, Efstathios Stamatatos, Martin Potthast, and Benno Stein. Overview of the “Voight-Kampff” Generative AI Authorship Verification Task at PAN and ELOQUENT 2024. In Guglielmo Faggioli, Nicola Ferro, Petra Galuščáková, and Alba García Seco Herrera, editors, Worki...

  17. [17]

    José Nilo G. Binongo. Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution. CHANCE, 16(2):9–17, 2003

  18. [18]

    The Importance of Suppressing Domain Style in Authorship Analysis

    Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, Ben Thies, Matthias Hagen, Efstathios Stamatatos, Benno Stein, and Martin Potthast. The Importance of Suppressing Domain Style in Authorship Analysis. CoRR, abs/2005.14714, 2020

  19. [19]

    Benedikt T. Boenninghoff, Steffen Hessler, Dorothea Kolossa, and Robert M. Nickel. Explainable Authorship Verification in Social Media via Attention-based Similarity Learning. In Chaitanya K. Baru, Jun Huan, Latifur Khan, Xiaohua Hu, Ronay Ak, Yuanyuan Tian, Roger S. Barga, Carlo Zaniolo, Kisung Lee, and Yanfang (Fanny) Ye, editors, 2019 IEEE International...

  20. [20]

    Benedikt T. Boenninghoff, Robert M. Nickel, Steffen Zeiler, and Dorothea Kolossa. Similarity Learning for Authorship Verification in Social Media. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019, pages 2457–2461. IEEE, 2019

  21. [21]

    Likelihood Ratios for Categorical Evidence: Comparison of LR Models Applied to Gunshot Residue Data

    Annabel Bolck and Amalia Stamouli. Likelihood Ratios for Categorical Evidence: Comparison of LR Models Applied to Gunshot Residue Data. Law, Probability and Risk, 16(2-3):71–90, 09 2017

  22. [22]

    Different Likelihood Ratio Approaches To Evaluate the Strength of Evidence of MDMA Tablet Comparisons

    Annabel Bolck, Céline Weyermann, Laurence Dujourdy, Pierre Esseiva, and Jorrit Van Den Berg. Different Likelihood Ratio Approaches To Evaluate the Strength of Evidence of MDMA Tablet Comparisons. Forensic Science International, 191(1-3):42–51, 10 2009

  23. [23]

    Authorship Verification for Short Messages Using Stylometry

    Marcelo Luiz Brocardo, Issa Traoré, Sherif Saad, and Isaac Woungang. Authorship Verification for Short Messages Using Stylometry. In International Conference on Computer, Information and Telecommunication Systems, CITS 2013, Athens, Greece, May 7-8, 2013, pages 1–6. IEEE, 2013

  24. [24]

    Authorship Verification of E-Mail and Tweet Messages Applied for Continuous Authentication

    Marcelo Luiz Brocardo, Issa Traore, and Isaac Woungang. Authorship Verification of E-Mail and Tweet Messages Applied for Continuous Authentication. Journal of Computer and System Sciences, 81(8):1429–1440, 2015

  25. [25]

    Marcelo Luiz Brocardo, Issa Traoré, Isaac Woungang, and Mohammad S. Obaidat. Authorship Verification using Deep Belief Network Systems. Int. J. Communication Systems, 30(12), 2017

  26. [26]

    Application-Independent Evaluation of Speaker Detection

    Niko Brümmer and Johan du Preez. Application-Independent Evaluation of Speaker Detection. Computer Speech & Language, 20(2):230–275, 2006. Odyssey 2004: The speaker and Language Recognition Workshop

  27. [27]

    Language, Usage and Cognition

    Joan Bybee. Language, Usage and Cognition. Cambridge University Press, Cambridge, UK, 2010

  28. [28]

    Linda Cappellato, Nicola Ferro, Gareth J. F. Jones, and Eric San Juan, editors. Working Notes for CLEF 2015 Conference, Toulouse, France, September 8–11, 2015, volume 1391 of CEUR Workshop Proceedings. CEUR-WS.org, 2015

  29. [29]

    Authorship Verification, Average Similarity Analysis

    Daniel Castro Castro, Yaritza Adame Arcia, María Pelaez Brioso, and Rafael Muñoz Guillena. Authorship Verification, Average Similarity Analysis. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 84–90. INCOMA Ltd. Shoumen, BULGARIA, 2015

  30. [30]

    D. Catoggio, J. Bunford, D. Taylor, G. Wevers, K. Ballantyne, and R. Morgan. An Introductory Guide to Evaluative Reporting in Forensic Science. Australian Journal of Forensic Sciences, 51(sup1):S247–S251, February 2019

  31. [31]

    C.E. Chaski. Empirical Evaluations of Language-Based Author Identification Techniques. Forensic Linguistics, 8(1):1–65, 2001

  32. [32]

    Stanley F. Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. In Aravind K. Joshi and Martha Palmer, editors, 34th Annual Meeting of the Association for Computational Linguistics, 24-27 June 1996, University of California, Santa Cruz, California, USA, Proceedings, pages 310–318. Morgan Kaufmann Publishers / ACL, 1996

  33. [33]


    Xiaoling Chen, Peng Hao, R. Chandramouli, and K. P. Subbalakshmi. Authorship Similarity Detection from Email Messages. In Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM’11, pages 375–386, Berlin, Heidelberg,

  34. [34]

    Morten H. Christiansen and Nick Chater. The Now-or-Never Bottleneck: A Fundamental Constraint on Language. Behavioral and Brain Sciences, 39:e62, 04 2016. Publisher: Cambridge University Press

  35. [35]

    All the News 2.0 — 2.7 Million News Articles and Essays from 27 American Publications

    Components. All the News 2.0 — 2.7 Million News Articles and Essays from 27 American Publications. https://components.one/datasets/all-the-news-2-news-articles-dataset, 2017

  36. [36]

    Using Subsampling To Estimate the Strength of Handwriting Evidence via Score-Based Likelihood Ratios

    Linda J. Davis, Christopher P. Saunders, Amanda Hepler, and JoAnn Buscaglia. Using Subsampling To Estimate the Strength of Handwriting Evidence via Score-Based Likelihood Ratios. Forensic Science International, 216(1-3):146–157, 03 2012

  37. [37]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018

  38. [38]

    Steven H. H. Ding, Benjamin C. M. Fung, and Mourad Debbabi. A Visualizable Evidence-Driven Approach for Authorship Attribution. ACM Trans. Inf. Syst. Secur., 17(3), March 2015

  39. [39]

    Frequency in Language: Memory, Attention and Learning

    Dagmar Divjak. Frequency in Language: Memory, Attention and Learning. Cambridge University Press, Cambridge, UK, 2019

  40. [40]

    Language as a Phenomenon of the Third Kind

    Ewa Dąbrowska. Language as a Phenomenon of the Third Kind. Cognitive Linguistics, 31(2):213–229, 2020

  41. [41]

    A. Drygajlo, M. Jessen, S. Gfrörer, I. Wagner, J. Vermeulen, T. Niemi, and Verlag für Polizeiwissenschaft. Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition. Verlag für Polizeiwissenschaft, 2015

  42. [42]

    Authorship Verification for Hired Plagiarism Detection

    Daniel Enriquez, Gage Christensen, Hayden Donovan, Jared Lam, Noah Wong, Sergiu Dascalu, David Feil-Seifer, and Emily Hand. Authorship Verification for Hired Plagiarism Detection. In Proceedings of the 9th International Conference on Applied Computing & Information Technology, ACIT ’22, page 19–24, New York, NY, USA, 2023. Association for Computing Machinery

  43. [43]

    ENFSI guideline for evaluative reporting in forensic science, 2015

    European Network of Forensic Science Institutes. ENFSI guideline for evaluative reporting in forensic science, 2015. Version 3.0

  44. [44]

    I. W. Evett, G. Jackson, J.A. Lambert, and S. McCrossan. The Impact of the Principles of Evidence Interpretation on the Structure and Content of Statements. Science & Justice, 40(4):233–239, 10 2000

  45. [45]

    Pamela Forner, Roberto Navigli, Dan Tufis, and Nicola Ferro, editors. Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, volume 1179 of CEUR Workshop Proceedings. CEUR-WS.org, 2014

  46. [46]

    Language Models and Fusion for Authorship Attribution

    Olga Fourkioti, Symeon Symeonidis, and Avi Arampatzis. Language Models and Fusion for Authorship Attribution. Information Processing & Management, 56(6):102061, 11 2019

  47. [47]

    Ingo Frommholz, Haider M. Al-Khateeb, Martin Potthast, Zinnar Ghasem, Mitul Shukla, and Emma Short. On Textual Analysis and Machine Learning for Cyberstalking Detection. Datenbank-Spektrum, 16(2):127–135, 2016

  48. [48]

    Fernand Gobet, Peter C.R. Lane, Steve Croker, Peter C-H. Cheng, Gary Jones, Iain Oliver, and Julian M Pine. Chunking Mechanisms in Human Learning. Trends in Cognitive Sciences, 5(6):236–243, 2001

  49. [49]

    Recent Trends in Digital Text Forensics and its Evaluation

    Tim Gollub, Martin Potthast, Anna Beyer, Matthias Busse, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein. Recent Trends in Digital Text Forensics and its Evaluation. In Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein, editors, Information Access Evaluation. Multilinguality, Multimodality, and Visualiza...

  50. [50]

    Universal Dependencies and Author Attribution of Short Texts with Syntax Alone

    Robert Gorman. Universal Dependencies and Author Attribution of Short Texts with Syntax Alone. Digital Humanities Quarterly, 16(2), 2022

  51. [51]

    Practice-Oriented Authorship Verification

    Oren Halvani. Practice-Oriented Authorship Verification. PhD thesis, Technical University of Darmstadt, Germany, 2021

  52. [52]

    TextUnitLib: A Python Library That Allows Easy Extraction of a Variety of Text Units Within Texts

    Oren Halvani. TextUnitLib: A Python Library That Allows Easy Extraction of a Variety of Text Units Within Texts. https://github.com/Halvani/TextUnitLib, 2024

  53. [53]

    POSNoise: An Effective Countermeasure Against Topic Biases in Authorship Analysis

    Oren Halvani and Lukas Graner. POSNoise: An Effective Countermeasure Against Topic Biases in Authorship Analysis. In Proceedings of the 16th International Conference on Availability, Reliability and Security, ARES ’21, New York, NY, USA, 2021. Association for Computing Machinery

  54. [54]

    Cross-Domain Authorship Verification Based on Topic Agnostic Features

    Oren Halvani, Lukas Graner, and Roey Regev. Cross-Domain Authorship Verification Based on Topic Agnostic Features. In Linda Cappellato, Carsten Eickhoff, Nicola Ferro, and Aurélie Névéol, editors, Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings. C...

  55. [55]

    TAVeer: An Interpretable Topic-Agnostic Authorship Verification Method

    Oren Halvani, Lukas Graner, and Roey Regev. TAVeer: An Interpretable Topic-Agnostic Authorship Verification Method. In Melanie Volkamer and Christian Wressnegger, editors, ARES 2020: The 15th International Conference on Availability, Reliability and Security, Virtual Event, Ireland, August 25-28, 2020, pages 41:1–41:10. ACM, 2020

  56. [56]

    Authorship Verification in the Absence of Explicit Features and Thresholds

    Oren Halvani, Lukas Graner, and Inna Vogel. Authorship Verification in the Absence of Explicit Features and Thresholds. In Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, and Allan Hanbury, editors, Advances in Information Retrieval, pages 454–465. Springer International Publishing, 2018

  57. [57]

    On the Usefulness of Compression Models for Authorship Verification

    Oren Halvani, Christian Winter, and Lukas Graner. On the Usefulness of Compression Models for Authorship Verification. In Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES ’17, pages 54:1–54:10, New York, NY, USA, 2017. ACM

  58. [58]

    Assessing the Applicability of Authorship Verification Methods

    Oren Halvani, Christian Winter, and Lukas Graner. Assessing the Applicability of Authorship Verification Methods. In Proceedings of the 14th International Conference on Availability, Reliability and Security, ARES 2019, Canterbury, UK, August 26-29, 2019, pages 38:1–38:10. ACM, 2019

  59. [59]

    Recognition of Compromised Accounts on Twitter

    Rodrigo Augusto Igawa, Alex Marino Gonçalves de Almeida, Bruno Bogaz Zarpelão, and Sylvio Barbon. Recognition of Compromised Accounts on Twitter. In Sean W. M. Siqueira and Sérgio T. Carvalho, editors, Proceedings of the annual conference on Brazilian Symposium on Information Systems, Information Systems: A Computer Socio-Technical Perspective, SBSI 2015,...

  60. [60]

    Farkhund Iqbal, Liaquat A. Khan, Benjamin C. M. Fung, and Mourad Debbabi. E-mail Authorship Verification for Forensic Investigation. In Sung Y. Shin, Sascha Ossowski, Michael Schumacher, Mathew J. Palakal, and Chih-Cheng Hung, editors, Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010, pages 1591–1598....

  61. [61]

    A Forensic Authorship Classification in SMS Messages: A Likelihood Ratio Based Approach Using N-gram

    Shunichi Ishihara. A Forensic Authorship Classification in SMS Messages: A Likelihood Ratio Based Approach Using N-gram. In Diego Molla and David Martinez, editors, Proceedings of the Australasian Language Technology Association Workshop 2011, pages 47–56, Canberra, Australia, December 2011

  62. [62]

    Score-Based Likelihood Ratios for Linguistic Text Evidence With a Bag-of-Words Model

    Shunichi Ishihara. Score-Based Likelihood Ratios for Linguistic Text Evidence With a Bag-of-Words Model. Forensic Science International, 327:110980, 2021. Publisher: Elsevier

  63. [63]

    Weight of Authorship Evidence With Multiple Categories of Stylometric Features: A Multinomial-Based Discrete Model

    Shunichi Ishihara. Weight of Authorship Evidence With Multiple Categories of Stylometric Features: A Multinomial-Based Discrete Model. Science & Justice, 63(2):181–199, March 2023

  64. [64]

    Likelihood Ratio Estimation for Authorship Text Evidence: An Empirical Comparison of Score- and Feature-Based Methods

    Shunichi Ishihara and Michael Carne. Likelihood Ratio Estimation for Authorship Text Evidence: An Empirical Comparison of Score- and Feature-Based Methods. Forensic Science International, 334:111268, May 2022

  65. [65]

    Validation in Forensic Text Comparison: Issues and Opportunities

    Shunichi Ishihara, Sonia Kulkarni, Michael Carne, Sabine Ehrhardt, and Andrea Nini. Validation in Forensic Text Comparison: Issues and Opportunities. Languages, 9(2):47, February 2024. Number: 2 Publisher: Multidisciplinary Digital Publishing Institute

  66. [66]

    Estimating the Strength of Authorship Evidence With a Deep-Learning-Based Approach

    Shunichi Ishihara, Satoru Tsuge, Mitsuyuki Inaba, and Wataru Zaitsu. Estimating the Strength of Authorship Evidence With a Deep-Learning-Based Approach. In Pradeesh Parameswaran, Jennifer Biggs, and David Powers, editors, Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association, pages 183–187, Adelaide, Australia, Dece...

  67. [67]

    Authorship Verification Applied to Detection of Compromised Accounts on Online Social Networks – A Continuous Approach

    Sylvio Barbon Junior, Rodrigo Augusto Igawa, and Bruno Bogaz Zarpelão. Authorship Verification Applied to Detection of Compromised Accounts on Online Social Networks – A Continuous Approach. Multim. Tools Appl., 76(3):3213–3233, 2017

  68. [68]

    Overview of the Author Identification Task at PAN 2013

    Patrick Juola and Efstathios Stamatatos. Overview of the Author Identification Task at PAN 2013. In Forner et al. [45]

  69. [69]

    Function Words in Authorship Attribution

    Mike Kestemont. Function Words in Authorship Attribution. From Black Magic to Theory? In Anna Feldman, Anna Kazantseva, and Stan Szpakowicz, editors, Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL), pages 59–66, Gothenburg, Sweden, April 2014. Association for Computational Linguistics

  70. [70]

    Overview of the Cross-Domain Authorship Verification Task at PAN 2020

    Mike Kestemont, Enrique Manjavacas, Ilia Markov, Janek Bevendorff, Matti Wiegmann, Efstathios Stamatatos, Martin Potthast, and Benno Stein. Overview of the Cross-Domain Authorship Verification Task at PAN 2020. In Linda Cappellato, Carsten Eickhoff, Nicola Ferro, and Aurélie Névéol, editors, Working Notes of CLEF 2020 - Conference and Labs of the Evaluat...

  71. [71]

    Overview of the Cross-Domain Authorship Verification Task at PAN 2021

    Mike Kestemont, Enrique Manjavacas, Ilia Markov, Janek Bevendorff, Matti Wiegmann, Efstathios Stamatatos, Benno Stein, and Martin Potthast. Overview of the Cross-Domain Authorship Verification Task at PAN 2021. In Guglielmo Faggioli, Nicola Ferro, Alexis Joly, Maria Maistro, and Florina Piroi, editors, Working Notes Papers of the CLEF 2021 Evaluation Lab...

  72. [72]

    Authenticating the Writings of Julius Caesar

    Mike Kestemont, Justin Anthony Stover, Moshe Koppel, Folgert Karsdorp, and Walter Daelemans. Authenticating the Writings of Julius Caesar. Expert Syst. Appl., 63:86–96, 2016

  73. [73]

    A Slightly-Modified GI-Based Author-Verifier with Lots of Features (ASGALF)

    Mahmoud Khonji and Youssef Iraqi. A Slightly-Modified GI-Based Author-Verifier with Lots of Features (ASGALF). In Linda Cappellato, Nicola Ferro, Martin Halvey, and Wessel Kraaij, editors, Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014, volume 1180 of CEUR Workshop Proceedings, pages 977–983. CEUR-WS.org, 2014

  74. [74]

    Improved Score Aggregation for Authorship Verification

    Mahmoud Khonji, Youssef Iraqi, and Loubna Mekouar. Improved Score Aggregation for Authorship Verification. Knowledge and Information Systems, December 2022

  75. [75]

    The Enron Corpus: A New Dataset for Email Classification Research

    Bryan Klimt and Yiming Yang. The Enron Corpus: A New Dataset for Email Classification Research. In Jean-François Boulicaut, Floriana Esposito, Fosca Giannotti, and Dino Pedreschi, editors, Machine Learning: ECML 2004, pages 217–226, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg

  76. [76]

    Improved Backing-off for M-gram Language Modeling

    R. Kneser and H. Ney. Improved Backing-off for M-gram Language Modeling. In 1995 International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 181–184 vol.1, 1995

  77. [77]

    UniNE at CLEF 2015 Author Identification: Notebook for PAN at CLEF 2015

    Mirco Kocher and Jacques Savoy. UniNE at CLEF 2015 Author Identification: Notebook for PAN at CLEF 2015. In CLEF (Working Notes), volume 1391 of CEUR Workshop Proceedings. CEUR-WS.org, 2015

  78. [78]

    A Simple and Efficient Algorithm for Authorship Verification

    Mirco Kocher and Jacques Savoy. A Simple and Efficient Algorithm for Authorship Verification. Journal of the Association for Information Science and Technology, 68(1):259–269, 2017

  79. [79]

    Authorship Verification as a One-Class Classification Problem

    Moshe Koppel and Jonathan Schler. Authorship Verification as a One-Class Classification Problem. In Carla E. Brodley, editor, Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004, volume 69 of ACM International Conference Proceeding Series. ACM, 2004

  80. [80]

    Authorship Attribution in the Wild

    Moshe Koppel, Jonathan Schler, and Shlomo Argamon. Authorship Attribution in the Wild. Language Resources and Evaluation, 45(1):83–94, 2011

Showing first 80 references.