arxiv: 2605.23651 · v1 · pith:444QW6XYnew · submitted 2026-05-22 · 💻 cs.CL

How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

Pith reviewed 2026-05-25 04:18 UTC · model grok-4.3

classification 💻 cs.CL
keywords large language models · linguistic evaluation · register variation · Biber features · maximum mean discrepancy · human-likeness · corpus linguistics · text generation

The pith

In every tested setup, large language models deviate from human linguistic patterns, and which model comes closest depends on the register rather than on model size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a context-aware framework that measures human-likeness of LLM texts by comparing their distributions of linguistic features against human reference corpora for specific registers. It applies a two-sample test using 67 lexico-grammatical features to seven instruction-tuned models across five English datasets. All LLMs show measurable differences from the human baseline in every register examined. Rankings of which model comes closest shift depending on the register and are not explained by differences in model size. This matters because texts can be factually accurate yet still feel unnatural if they violate the expected frequencies and patterns for a given communicative context.

Core claim

LLMs deviate from the human baseline in every tested setup when their texts are compared on lexico-grammatical feature distributions. The model that produces the distribution closest to human writing changes with the register, and this ordering is not dictated by model size.

What carries the argument

A two-sample Maximum Mean Discrepancy comparison between human and LLM corpora, performed separately for each register using the 67 Biber lexico-grammatical features.
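
To make the comparison concrete, here is a minimal sketch of a squared-MMD estimate between two samples of feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, the fixed bandwidth, the biased estimator, and the synthetic 5-dimensional data are all choices made for this example, since the page does not state which kernel or estimator the paper uses.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y."""
    def mean_kernel(A, B):
        total = sum(rbf_kernel(a, b, bandwidth) for a in A for b in B)
        return total / (len(A) * len(B))

    return mean_kernel(X, X) + mean_kernel(Y, Y) - 2.0 * mean_kernel(X, Y)


# Synthetic stand-ins for per-text feature vectors (assumption: 5 features
# instead of 67, drawn from Gaussians purely for illustration).
rng = random.Random(0)
human = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
human_2 = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
model = [[rng.gauss(0.8, 1.0) for _ in range(5)] for _ in range(60)]

same = mmd2_biased(human, human_2, bandwidth=2.0)
shifted = mmd2_biased(human, model, bandwidth=2.0)
print(f"human vs human: {same:.4f}")
print(f"human vs model: {shifted:.4f}")
```

The human-vs-human value plays the role of the baseline against which a human-vs-model distance is read; in practice the bandwidth would be chosen from the data rather than fixed by hand.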

If this is right

  • Evaluation of LLM output must be performed register by register rather than with a single aggregate score.
  • Larger models are not guaranteed to produce more human-like language distributions than smaller ones.
  • Different communicative contexts expose different strengths among current open-source models.
  • The framework supplies a quantitative basis for selecting models according to the intended register of use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Fine-tuning on register-specific human data may close the observed gaps more effectively than further scaling.
  • The same method could be applied to measure how well models handle register shifts within a single conversation.
  • Training data that under-represents certain registers likely contributes to the systematic deviations found here.

Load-bearing premise

The 67 Biber features together with the MMD statistic capture the aspects of language production that determine whether a text feels human-like in a given register.

What would settle it

An experiment that finds one model size ranking first across every register would show that closeness is dictated by size after all.
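
The check described above can be sketched as a small lookup over a table of human-model distances. The register names, model labels, and numbers below are placeholders invented for this example, not values reported in the paper, and the helper names are hypothetical.

```python
def closest_model_per_register(distances):
    """Map each register to the model with the smallest human-model distance."""
    return {
        register: min(scores, key=scores.get)
        for register, scores in distances.items()
    }


def single_winner(distances):
    """Return the model that is closest in every register, or None if it varies."""
    winners = set(closest_model_per_register(distances).values())
    return winners.pop() if len(winners) == 1 else None


# Placeholder values only (assumption), not numbers from the paper.
distances = {
    "register_a": {"small": 0.12, "medium": 0.09, "large": 0.15},
    "register_b": {"small": 0.07, "medium": 0.11, "large": 0.10},
    "register_c": {"small": 0.20, "medium": 0.18, "large": 0.14},
}

print(closest_model_per_register(distances))
print(single_winner(distances))
```

A return value other than `None` from `single_winner` would correspond to the settling outcome: one model coming out closest in every register.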

Figures

Figures reproduced from arXiv: 2605.23651 by Bjoern Eskofier, Björn Nieth, Emmanuelle Salin, Marianna Gracheva, and Michaela Mahlberg. Listed affiliations: (1) Department Artificial Intelligence in Biomedical Engineering (AIBE), FAU Erlangen-Nürnberg, Germany; (2) Department of Digital Humanities and Social Studies (DHSS), FAU Erlangen-Nürnberg, Germany; (3) University of Birmingham, United Kingdom; (4) Chair of AI-supported Therapy Decisions, LMU München, Munich, Germany; (5) Munich Center for Machine Learning (MCML), Munich, Germany; (6) Institute of AI for Health, Helmholtz Zentrum München, Neuherberg, Germany.

Figure 1. Overview of the proposed evaluation frame…
Figure 2. MMD² with a resampled confidence interval for different sample sizes on the XSum dataset.
Figure 3. MMD² for all datasets and models to the respective human corpus, where the points indicate the observed MMD² and the whiskers show the 95% CI resampled on coupled samples from the human and model corpus. The orange line in each plot gives the respective Human-Human MMD² for the respective datasets with the resampled CI. The models on the y-axis are sorted by their observed MMD² distance. Because the distan…
Figure 5. MMD² for the prompt stability experiments to the human reference sample of the BNC2014Spoken. Dots indicate the mean value over all prompts, while the band shows the minimum and maximum observed distance for the respective model under all prompt variations.
Figure 4. Overview of the proposed evaluation frame…
Figure 6. Violinplot of Biber dimension 1 on BNC2014Spoken for human and models in the Zero-Shot setting.
Figure 7. MMD² with bootstrapped confidence interval for different sample sizes on all datasets. For BNC2014Spoken the error increases because the dataset has only 1200 samples, so a sample size larger than 600 leads to one smaller and one larger subset.
Figure 8. Correlation heatmap of the human-AI MMD² on BNC2014Spoken across different prompt variants in the Zero-Shot setting.
Figure 9. Human and model distributions for Biber dimensions in the Zero-Shot setting (BNC2014Spoken).
Figure 10. Human and model distributions for Biber dimensions in the Zero-Shot setting (S2ORC_ACL).
Figure 11. Human and model distributions for Biber dimensions in the Zero-Shot setting (wikiHow).
Figure 12. Human and model distributions for Biber dimensions in the Zero-Shot setting (WritingPrompts).
Figure 13. Human and model distributions for Biber dimensions in the Zero-Shot setting (XSum).
Figure 14. Mean of the normalized linguistic features without standardization to the full human dataset, with…
Figure 15. Mean of the normalized linguistic features without standardization to the full human dataset, with the…
Figure 16. Mean of the normalized linguistic features without standardization to the full human dataset, with the…
Figure 17. Mean of the normalized linguistic features without standardization to the full human dataset, with the…
Figure 18. Mean of the normalized linguistic features without standardization to the full human dataset, with the…
Figure 19. Wasserstein distance for marginal feature distributions between model and human for BNC2014Spoken…
Figure 20. Wasserstein distance for marginal feature distributions between model and human for S2ORC_ACL in…
Figure 21. Wasserstein distance for marginal feature distributions between model and human for wikiHow in the…
Figure 22. Wasserstein distance for marginal feature distributions between model and human for WritingPrompts in…
Figure 23. Wasserstein distance for marginal feature distributions between model and human for XSum in the…
Figure 24. Observed MMD distance between different models for BNC2014Spoken in the Zero-Shot setting. The…
Figure 25. Observed MMD distance between different models for S2ORC_ACL in the Zero-Shot setting. The…
Figure 26. Observed MMD distance between different models for wikiHow in the Zero-Shot setting. The MMD…
Figure 27. Observed MMD distance between different models for WritingPrompts in the Zero-Shot setting. The…
Figure 28. Observed MMD distance between different models for XSum in the Zero-Shot setting. The MMD…
Figure 29. Sum of the variances of the 67 linguistic features after normalization on the corresponding full human…
Original abstract

While factual correctness and task-performance have been in focus of Large Language Model (LLM) research for a long time, the fundamental question of how human-like generated texts are on a linguistic level has been underexplored. From a corpus-linguistic perspective, language production is inherently context-dependent, with distinct communicative contexts giving rise to differences in frequencies and co-occurrence patterns of linguistic features. A text failing to adhere to these patterns can be content-wise correct, but still be unfavorable to human readers. In this work, we propose a context-aware evaluation framework in which human-likeness is assessed using a two-sample problem between the linguistic feature distribution of a human reference corpus for a given register and a corresponding LLM-generated corpus. We implement this framework using the Maximum Mean Discrepancy (MMD) and the 67 lexico-grammatical features introduced by Biber, which are commonly applied in corpus linguistics. In our experiments, we compare seven instruction-tuned, open-source models across five English-language datasets spanning distinct registers against a human baseline. While across all tested setups, LLMs deviate from the human baseline, which models are closest to human language depends on the register and is not dictated by model size.
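
One common way to turn a two-sample problem like this into a decision is a permutation test, sketched below. This is an assumption-laden illustration rather than the paper's procedure: the figure captions mention resampled confidence intervals, whereas this sketch uses label permutation, and it substitutes a linear kernel (for which the biased squared MMD reduces to the squared distance between sample means) to keep the code short. The data are synthetic and all function names are hypothetical.

```python
import random


def mean_vector(sample):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(sample)
    return [sum(row[j] for row in sample) / n for j in range(len(sample[0]))]


def linear_mmd2(X, Y):
    """Biased squared MMD with a linear kernel: squared distance between means."""
    return sum((a - b) ** 2 for a, b in zip(mean_vector(X), mean_vector(Y)))


def permutation_p_value(X, Y, statistic, n_permutations=500, seed=0):
    """Estimate P(statistic >= observed) under the null of one shared distribution."""
    rng = random.Random(seed)
    observed = statistic(X, Y)
    pooled = list(X) + list(Y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign texts to the two groups at random
        if statistic(pooled[:len(X)], pooled[len(X):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Synthetic stand-ins for human and model feature vectors (assumption).
rng = random.Random(1)
human = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(80)]
model = [[rng.gauss(0.7, 1.0) for _ in range(4)] for _ in range(80)]

p_shifted = permutation_p_value(human, model, linear_mmd2)
print(f"p-value, shifted sample: {p_shifted:.4f}")
```

A small p-value indicates that the observed distance is larger than what random relabeling of the pooled texts typically produces. Any other statistic with the same two-argument signature, such as a kernel MMD estimate, could be passed in place of `linear_mmd2`.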

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a context-aware evaluation framework for assessing the human-likeness of LLM-generated texts using Maximum Mean Discrepancy (MMD) to compare distributions of 67 Biber lexico-grammatical features between human reference corpora and LLM outputs across five distinct English registers. Experiments with seven instruction-tuned open-source LLMs reveal that all models deviate from human baselines, but the model closest to the human distribution varies depending on the register and is not solely determined by model size.

Significance. If the framework's assumptions hold, this work offers a valuable corpus-linguistic approach to LLM evaluation that accounts for register-specific linguistic patterns, moving beyond task performance metrics. The reliance on established Biber features and MMD contributes to the method's transparency and potential for replication in the field.

major comments (1)
  1. [Abstract] The central claim that 'which models are closest to human language depends on the register and is not dictated by model size' is load-bearing on the 67 Biber features plus two-sample MMD being a sufficient statistic for human-likeness (Abstract). The manuscript provides no evidence that these distances align with human judgments of naturalness or discourse-level properties in the tested registers, nor any ablation against expanded feature sets; if the ordering differs from such external validation, the register-dependence conclusion does not follow from the reported MMD values.
minor comments (1)
  1. [Abstract] The abstract states the main finding but does not name the five registers or seven models; adding these would improve immediate readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments. We respond to the single major comment below.

  1. Referee: [Abstract] The central claim that 'which models are closest to human language depends on the register and is not dictated by model size' is load-bearing on the 67 Biber features plus two-sample MMD being a sufficient statistic for human-likeness (Abstract). The manuscript provides no evidence that these distances align with human judgments of naturalness or discourse-level properties in the tested registers, nor any ablation against expanded feature sets; if the ordering differs from such external validation, the register-dependence conclusion does not follow from the reported MMD values.

    Authors: We acknowledge the referee's point that the manuscript does not provide direct evidence linking MMD distances on the Biber feature set to human judgments of naturalness. The 67 features are selected because they are a well-established, replicable set in corpus linguistics for modeling register variation (Biber 1988 and subsequent validation studies). MMD serves as a distribution-level comparator rather than a claim of sufficiency for all aspects of human-likeness. The reported finding is therefore scoped to relative distances within this operationalization: across the five registers, the model minimizing MMD changes and is not monotonically related to parameter count. We agree that external validation would strengthen interpretation. In revision we will (1) temper the abstract wording to emphasize that conclusions concern this specific feature set and metric, (2) add citations to existing literature on the predictive validity of Biber features for perceived register appropriateness, and (3) expand the limitations section to note the absence of human judgment correlation or feature-set ablations as directions for future work. No new experiments are added at this stage.

    Revision: partial
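
The phrase "not monotonically related to parameter count" can be checked with a rank correlation between model size and distance within each register. The sketch below implements Spearman's rho for tie-free data. The parameter counts and distances are placeholders made up for illustration, not figures from the paper, and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Placeholder values (assumption), not taken from the paper.
params_billion = [1, 3, 7, 13, 30]
distance_register_a = [0.30, 0.22, 0.18, 0.15, 0.11]  # shrinks steadily with size
distance_register_b = [0.12, 0.25, 0.10, 0.21, 0.16]  # no clear trend

print(spearman_rho(params_billion, distance_register_a))
print(spearman_rho(params_billion, distance_register_b))
```

A rho near -1 would mean larger models are consistently closer to the human reference in that register, while a value near 0 is what a size-independent ordering looks like. With only a handful of models per register, such coefficients are descriptive rather than conclusive.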

Circularity Check

0 steps flagged

No circularity; direct empirical comparison to external human corpora


The paper defines human-likeness via two-sample MMD distances on the fixed, externally established set of 67 Biber lexico-grammatical features between LLM-generated texts and independent human reference corpora for each register. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear; the reported register-dependent ordering of models follows immediately from these distance computations without any reduction of outputs to inputs by construction. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the framework rests on the domain assumption that Biber features are sufficient proxies for register-specific human language production.

axioms (1)
  • domain assumption: Biber's 67 lexico-grammatical features capture the relevant frequency and co-occurrence patterns that distinguish registers in human language production.
    The entire evaluation framework is built on this standard corpus-linguistic premise as stated in the abstract.

pith-pipeline@v0.9.0 · 5900 in / 1182 out tokens · 18953 ms · 2026-05-25T04:18:15.964825+00:00 · methodology

