arxiv: 2606.04177 · v1 · pith:5QCMAQ42new · submitted 2026-06-02 · 💻 cs.CL · cs.AI

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

Pith reviewed 2026-06-28 09:58 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords AI-generated text detection · linguistic features · lexical richness · cross-domain generalization · cross-model generalization · interpretable detection · LLM output analysis · text classification

The pith

Classifiers using linguistic features alone can reliably distinguish AI-generated text from human text, with lexical richness measures remaining robust across models and domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a set of 284 linguistic features can consistently separate text written by people from text produced by large language models. It runs this test on outputs from 27 different LLMs across 10 writing domains and checks performance when models or domains are held out from training. The results show that simple classifiers built only on these features succeed at the distinction task. Most individual features change how well they work depending on the model family or the type of text, but measures that track lexical richness stay effective in every setting examined. Readers care because the work isolates which signals are stable enough to support explanations that non-experts can understand.

Core claim

Classifiers based solely on linguistic features can reliably distinguish AI-generated from human-written text across outputs from 27 LLMs and ten text domains under cross-model and cross-domain generalization settings. Many previously proposed indicators prove strongly context-dependent. Measures of lexical richness remain robust signals across model families and text domains. The results identify which linguistic signals generalize across contexts and supply a foundation for more reliable, interpretable analyses of AI-generated language.

What carries the argument

Systematic measurement of 284 interpretable linguistic features under cross-model and cross-domain generalization to test their reliability as indicators of machine-generated text.
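
The cross-model and cross-domain settings amount to holding out one group (a model family or a text domain) at a time, training on the rest, and scoring on the held-out group. A minimal Python sketch of that splitting scheme, together with Macro F1 (the metric named in the figure captions), is given here. The names `leave_one_group_out` and `macro_f1` are illustrative and not taken from the paper, and the paper's actual partitioning, including how human-written texts are assigned when a model is held out, is not described on this page.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct group label.

    `groups` holds one label per sample, e.g. a model family or a text domain.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idx in by_group.items() if g != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical group labels; any hashable, sortable labels work.
    domains = ["news", "reviews", "news", "fiction"]
    for held_out, train_idx, test_idx in leave_one_group_out(domains):
        print(held_out, train_idx, test_idx)
```

Any classifier can be fitted on `train_idx` and scored on `test_idx`; the spread of Macro F1 across held-out groups is then what indicates whether a feature set generalizes.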

If this is right

  • Classifiers that use only linguistic features achieve reliable separation of AI-generated and human text.
  • Most previously suggested linguistic indicators lose effectiveness when the model family or text domain changes.
  • Lexical richness measures continue to function as stable signals across all tested model families and domains.
  • The identified stable features can serve as the basis for more interpretable detection methods.
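
To make "lexical richness" concrete: the figure captions name type-token ratio (TTR), hapax legomena, and lexical density as the features examined. The sketch below computes the first two. The regex tokenizer, the choice to normalize the hapax count by text length, and the function names are assumptions made for illustration; the paper's own definitions may differ. Lexical density is left out because it needs part-of-speech tags.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_richness(text):
    """Return TTR and the hapax-legomena ratio for one text.

    ttr         = distinct word types / total tokens
    hapax_ratio = word types occurring exactly once / total tokens
    """
    tokens = tokenize(text)
    if not tokens:
        return {"ttr": 0.0, "hapax_ratio": 0.0}
    counts = Counter(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "ttr": len(counts) / len(tokens),
        "hapax_ratio": hapax / len(tokens),
    }


if __name__ == "__main__":
    print(lexical_richness("The cat sat on the mat"))
```

Raw TTR tends to fall as texts get longer, so comparisons between human and AI text are more meaningful when the texts have similar lengths or when a length-corrected variant is used.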

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detection tools could be simplified by focusing primarily on lexical richness features when generalization to new models is required.
  • The same feature-evaluation approach could be repeated on newer model releases to check whether the robustness pattern persists.
  • Similar large-scale tests on non-English text or on specialized domains such as legal or medical writing would reveal whether the same features remain dominant.

Load-bearing premise

The 10 chosen text domains and 27 LLMs are representative enough of broader model and domain variation to support claims of cross-model and cross-domain generalization.

What would settle it

An experiment on a fresh collection of domains or on LLMs released after the study in which lexical richness measures lose their ability to separate AI-generated from human text would falsify the robustness result.
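
One way such a check could be quantified, offered as an illustration rather than the paper's procedure, is the ROC AUC of a single feature on the fresh data: the probability that a randomly chosen text from one class has a higher feature value than a randomly chosen text from the other. The function name `feature_auc` and the example values are hypothetical.

```python
def feature_auc(pos_values, neg_values):
    """Pairwise ROC AUC of one feature.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample has the larger feature value, counting ties as half a win.
    """
    if not pos_values or not neg_values:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_values) * len(neg_values))


if __name__ == "__main__":
    # Made-up TTR values for illustration only, not data from the paper.
    human_ttr = [0.71, 0.68, 0.74]
    ai_ttr = [0.55, 0.60, 0.69]
    print(feature_auc(human_ttr, ai_ttr))
```

Values near 0.5 would mean the feature no longer separates the classes on the new data; values near 0 or 1 would mean it still does, with the direction depending on which class is treated as positive.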

Figures

Figures reproduced from arXiv: 2606.04177 by Agnieszka Falenska, Esra Dönmez, Maximilian Maurer, Yassir El Attar.

Figure 1. (left), reveal similar patterns to TB5: classifiers performing worse for GLM, Eleuther, BigScience, and OpenAI (48.1%, 51.0%, 57.6%, and 59.7%), and better for OPT, FLAN, and LLaMA on average (68.4%, 63.7%, 62.8%).
Figure 2. Macro F1 performance changes with cumulative feature area removal. The left pane shows results for …
Figure 3. Ablation results on TB2. The horizontal dashed lines indicate the baseline results (all features) and the bars indicate the performance change in Macro F1 after removing the corresponding feature area. Feature areas on the axis: Surface, Lexi. Rich., Emotion, Psych., Readab., Morpho., POS, Depend., Semant., Entities, Inform. Baseline: 0.609.
Figure 4. Ablation performance ranges on TB5 across the 11 feature areas after removing the corresponding feature area compared to the baseline (dashed red line).
Figure 5. Macro F1 results of ablation study, top pane: …
Figure 6. Ablation performance variance across 16 text …
Figure 7. Visualization of results of ablation study on …
Figure 8. Distribution of F1-Macro scores across feature area ablations for in-distribution (TB3) and out-of …
Figure 9. Visualization of results of ablation study …
Figure 10. Change in standard deviation between unseen ( …
Figure 11. Leave-one-out features ablation performance …
Figure 12. Results of the standard deviation between ab …
Figure 14. The difference in the performance for the …
Figure 15. Performance difference (F1-Macro) between lexical-richness-only and full-feature classifiers.
Figure 16. Heatmap of pairwise Pearson’s r of TTR distributions between selected model family pairs across text domains. The left columns show GLM paired with smaller models, while the right columns show OpenAI paired with models of varying scale. Cell values indicate the Pearson’s r between the TTR distributions of the two model families within each text domain.
Figure 17. Density distributions of lexical richness features (TTR, hapax legomena, lexical density) across (a) …
Figure 18. Violin plots for the three lexical richness features: TTR (left), hapax legomena (middle), and lexical …
Figure 19. Parallel human–AI text pairs from the Yelp domain (AI model: Text-davinci-003) illustrating differences …
Figure 20. Parallel human-AI text pairs from the XSum domain illustrating differences in lexical richness features.
Figure 21. Clustered heatmap of pairwise Wasserstein distances for the top 10 most discriminative linguistic features …
Figure 22. Pairwise Pearson’s r of TTR feature distributions across model family pairs, with distributions computed over the ten text domains. Pairs are grouped into three categories: Smaller–Smaller (blue; e.g., FLAN-T5 vs. OPT), Smaller–LLaMA/OpenAI (orange; e.g., Eleuther vs. LLaMA), and LLaMA–OpenAI (green). TTR was selected as the focal feature given its pronounced human–AI distributional separation observed in…
original abstract

Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragmented across feature sets, models, and text domains. To address this gap, we conduct a large-scale empirical study assessing the robustness of linguistic signals for characterizing AI-generated text. Our analysis covers 284 interpretable linguistic features across outputs from 27 LLMs and ten text domains under cross-model and cross-domain generalization settings. We show that classifiers based solely on linguistic features can reliably distinguish AI-generated from human-written text. However, many previously proposed indicators prove strongly context-dependent, with the exception of measures of lexical richness, which remain robust signals across model families and text domains. These results demonstrate which linguistic signals generalize across contexts and provide a foundation for more reliable, interpretable analyses of AI-generated language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a large-scale empirical study of 284 linguistic features drawn from outputs of 27 LLMs across 10 text domains. It evaluates their utility for distinguishing AI-generated from human-written text under cross-model and cross-domain generalization settings, concluding that feature-based classifiers can reliably detect AI text overall, but that most previously proposed indicators are context-dependent while measures of lexical richness remain robust across model families and domains.

Significance. If the sampling is representative and the empirical results are statistically substantiated, the work would be significant for consolidating fragmented prior findings on linguistic indicators of machine-generated text and for identifying a small set of generalizable, interpretable features (lexical richness) that could support more reliable detection and explanation.

major comments (2)
  1. [Abstract] The headline claim that lexical-richness measures 'remain robust signals across model families and text domains' rests on the assumption that the chosen 10 domains and 27 LLMs adequately sample the space of possible domains and model families. The manuscript supplies no analysis or justification of domain diversity, genre coverage, or model-family/size distribution, so the observed robustness could be an artifact of the limited sample rather than a general property.
  2. [Experimental setup] The abstract states clear empirical findings on classifier reliability and feature robustness, yet provides no information on statistical tests, train/test splits for the cross-model and cross-domain settings, feature-extraction pipelines, or controls for confounds such as domain-topic overlap. Without these details the central generalization claims cannot be verified.
minor comments (1)
  1. [Abstract] The selection process and categorization of the 284 features are not summarized; a short summary would help readers assess the scope of the feature set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

point-by-point responses
  1. Referee: [Abstract] The headline claim that lexical-richness measures 'remain robust signals across model families and text domains' rests on the assumption that the chosen 10 domains and 27 LLMs adequately sample the space of possible domains and model families. The manuscript supplies no analysis or justification of domain diversity, genre coverage, or model-family/size distribution, so the observed robustness could be an artifact of the limited sample rather than a general property.

    Authors: We agree that the abstract's generalization claim requires explicit support for the sampling choices. In the revised version we will add a dedicated subsection in the Methods that justifies the selection of the 10 domains (spanning news, fiction, academic writing, technical documentation, conversational, and opinion genres) and the 27 LLMs (covering multiple families and parameter scales). We will also insert a Limitations paragraph that acknowledges the finite scope of the sample and the possibility that robustness could be narrower than claimed. These additions will make the headline statement better grounded. revision: yes

  2. Referee: [Experimental setup] The abstract states clear empirical findings on classifier reliability and feature robustness, yet provides no information on statistical tests, train/test splits for the cross-model and cross-domain settings, feature-extraction pipelines, or controls for confounds such as domain-topic overlap. Without these details the central generalization claims cannot be verified.

    Authors: We will substantially expand the Experimental Setup section to supply the missing methodological details. The revision will explicitly describe the statistical tests performed, the precise train/test partitioning and cross-validation procedures used for the leave-one-model-out and leave-one-domain-out evaluations, the full feature-extraction pipeline (including libraries, preprocessing steps, and parameter settings), and the measures taken to mitigate domain-topic confounds (such as topic-balance checks). These clarifications will render the generalization results reproducible and verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical study with no derivations or self-referential steps

full rationale

The paper performs a large-scale empirical analysis of 284 linguistic features across 27 LLMs and 10 domains, reporting classifier performance under cross-model and cross-domain settings. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described methodology. The central claims rest on observed experimental outcomes rather than any reduction to inputs by construction. The representativeness of the sampled domains and models is an external validity concern, not a circularity issue.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the selected linguistic features can be extracted uniformly and that the chosen models and domains capture the relevant variation in AI-generated text.

free parameters (1)
  • selection of 284 linguistic features
    The specific set of features included is a modeling choice that determines which signals are tested.
axioms (1)
  • domain assumption: Linguistic features extracted from text are stable and comparable across different models and domains
    Invoked when claiming cross-model and cross-domain robustness in the abstract.

pith-pipeline@v0.9.1-grok · 5696 in / 1214 out tokens · 24321 ms · 2026-06-28T09:58:43.572594+00:00 · methodology

Reference graph

Works this paper leans on

300 extracted references · 208 canonical work pages · 1 internal anchor

  1. [1]

    Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  2. [2]

    Research on LLM s-Empowered Conversational AI for Sustainable Behaviour Change

    Chen, Ben. Research on LLM s-Empowered Conversational AI for Sustainable Behaviour Change. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  3. [3]

    Deep Reinforcement Learning of LLM s​ using RLHF

    Levandovsky, Enoch. Deep Reinforcement Learning of LLM s​ using RLHF. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  4. [4]

    Conversational Collaborative Robots

    Kranti, Chalamalasetti. Conversational Collaborative Robots. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  5. [5]

    Dialogue System using Large Language Model-based Dynamic Slot Generation

    Hashimoto, Ekai. Dialogue System using Large Language Model-based Dynamic Slot Generation. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  6. [6]

    Towards Adaptive Human-Agent Collaboration in Real-Time Environments

    Nakae, Kaito. Towards Adaptive Human-Agent Collaboration in Real-Time Environments. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  7. [7]

    Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation

    Jiang, Jingjing. Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  8. [8]

    Controlling Dialogue Systems with Graph-Based Structures

    Hilgendorf, Laetitia Mina. Controlling Dialogue Systems with Graph-Based Structures. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  9. [9]

    Multimodal Agentic Dialogue Systems for Situated Human-Robot Interaction

    Sucal, Virgile. Multimodal Agentic Dialogue Systems for Situated Human-Robot Interaction. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  10. [10]

    Knowledge Graphs and Representational Models for Dialogue Systems

    Walker, Nicholas Thomas. Knowledge Graphs and Representational Models for Dialogue Systems. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

  11. [11]

    Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.0

  12. [12]

    Fine-Tuning Large Language Models for Relation Extraction within a Retrieval-Augmented Generation Framework

    Efeoglu, Sefika and Paschke, Adrian. Fine-Tuning Large Language Models for Relation Extraction within a Retrieval-Augmented Generation Framework. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.1

  13. [13]

    Benchmarking Table Extraction: Multimodal LLM s vs Traditional OCR

    Nunes, Guilherme and Rolla, Vitor and Pereira, Duarte and Alves, Vasco and Carreiro, Andre and Baptista, M \'a rcia. Benchmarking Table Extraction: Multimodal LLM s vs Traditional OCR. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.2

  14. [14]

    Injecting Structured Knowledge into LLM s via Graph Neural Networks

    Li, Zichao and Ke, Zong and Zhao, Puning. Injecting Structured Knowledge into LLM s via Graph Neural Networks. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.3

  15. [15]

    Regular-pattern-sensitive CRF s for Distant Label Interactions

    Papay, Sean and Klinger, Roman and Pad \'o , Sebastian. Regular-pattern-sensitive CRF s for Distant Label Interactions. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.4

  16. [16]

    From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM -Based Information Extraction

    Swarup, Anushka and Bhandarkar, Avanti and Wilson, Ronald and Pan, Tianyu and Woodard, Damon. From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM -Based Information Extraction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.5

  17. [17]

    Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models

    Willemsen, Bram and Skantze, Gabriel. Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.6

  18. [18]

    Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis

    Li, Daoyang and Zhao, Haiyan and Zeng, Qingcheng and Du, Mengnan. Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.7

  19. [19]

    Self-Contrastive Loop of Thought Method for Text-to- SQL Based on Large Language Model

    Kang, Fengrui and Tan, Mingxi and Huang, Xianying and Yang, Shiju. Self-Contrastive Loop of Thought Method for Text-to- SQL Based on Large Language Model. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.8

  20. [20]

    Combining Automated and Manual Data for Effective Downstream Fine-Tuning of Transformers for Low-Resource Language Applications

    Isaeva, Ulyana and Astafurov, Danil and Martynov, Nikita. Combining Automated and Manual Data for Effective Downstream Fine-Tuning of Transformers for Low-Resource Language Applications. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.9

  21. [21]

    Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation

    Bartkowiak, Patryk and Grali \'n ski, Filip. Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.10

  22. [22]

    Enhancing AMR Parsing with Group Relative Policy Optimization

    Barta, Botond and Hamerlik, Endre and Nyist, Mil \'a n and Ito, Masato and Acs, Judit. Enhancing AMR Parsing with Group Relative Policy Optimization. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.11

  23. [23]

    Structure Modeling Approach for UD Parsing of Historical M odern J apanese

    Ozaki, Hiroaki and Omura, Mai and Komiya, Kanako and Asahara, Masayuki and Ogiso, Toshinobu. Structure Modeling Approach for UD Parsing of Historical M odern J apanese. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.12

  24. [24]

    BARTABSA ++: Revisiting BARTABSA with Decoder LLM s

    Pfister, Jan and V. BARTABSA ++: Revisiting BARTABSA with Decoder LLM s. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.13

  25. [25]

    Typed- RAG : Type-Aware Decomposition of Non-Factoid Questions for Retrieval-Augmented Generation

    Lee, DongGeon and Park, Ahjeong and Lee, Hyeri and Nam, Hyeonseo and Maeng, Yunho. Typed- RAG : Type-Aware Decomposition of Non-Factoid Questions for Retrieval-Augmented Generation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.14

  26. [26]

    Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction

    Hellwig, Nils Constantin and Fehle, Jakob and Kruschwitz, Udo and Wolff, Christian. Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.15

  27. [27]

    Can LLM s Interpret and Leverage Structured Linguistic Representations? A Case Study with AMR s

    Raut, Ankush and Zhu, Xiaofeng and Pacheco, Maria Leonor. Can LLM s Interpret and Leverage Structured Linguistic Representations? A Case Study with AMR s. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.16

  28. [28]

    LLM Dependency Parsing with In-Context Rules

    Ginn, Michael and Palmer, Alexis. LLM Dependency Parsing with In-Context Rules. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.17

  29. [29]

    Cognitive Mirroring for D oc RE : A Self-Supervised Iterative Reflection Framework with Triplet-Centric Explicit and Implicit Feedback

    Han, Xu and Wang, Bo and Sun, Yueheng and Zhao, Dongming and Qu, Zongfeng and He, Ruifang and Hou, Yuexian and Hu, Qinghua. Cognitive Mirroring for D oc RE : A Self-Supervised Iterative Reflection Framework with Triplet-Centric Explicit and Implicit Feedback. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)...

  30. [30]

    Cross-Document Event-Keyed Summarization

    Walden, William and Kuchmiichuk, Pavlo and Martin, Alexander and Jin, Chihsheng and Cao, Angela and Sun, Claire and Allen, Curisia and White, Aaron Steven. Cross-Document Event-Keyed Summarization. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.19

  31. [31]

    Transfer of Structural Knowledge from Synthetic Languages

    Budnikov, Mikhail and Yamshchikov, Ivan. Transfer of Structural Knowledge from Synthetic Languages. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.20

  32. [32]

    Language Models are Universal Embedders

    Zhang, Xin and Li, Zehan and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan and Zhang, Min. Language Models are Universal Embedders. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.21

  33. [33]

    D ia DP @ XLLM 25: Advancing C hinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring

    Duan, Shuoqiu and Chen, Xiaoliang and Miao, Duoqian and Gu, Xu and Li, Xianyong and Du, Yajun. D ia DP @ XLLM 25: Advancing C hinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.22

  34. [34]

    LLMSR @ XLLM 25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

    Yuan, Jiahao and Sun, Xingzhe and Yu, Xing and Wang, Jingwen and Du, Dehui and Cui, Zhiqing and Di, Zixiang. LLMSR @ XLLM 25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.23

  35. [35]

    S peech EE @ XLLM 25: End-to-End Structured Event Extraction from Speech

    Chaudhuri, Soham and Biswas, Diganta and Saha, Dipanjan and Das, Dipankar and Bandyopadhyay, Sivaji. S peech EE @ XLLM 25: End-to-End Structured Event Extraction from Speech. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.24

  36. [36]

    Luu, Son and Van Nguyen, Kiet

    Pham Hoang Le, Nguyen and Dinh Thien, An and T. Luu, Son and Van Nguyen, Kiet. D oc IE @ XLLM 25: Z ero S emble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.25

  37. [37]

    D oc IE @ XLLM 25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations

    Popovic, Nicholas and Kangen, Ashish and Schopf, Tim and F. D oc IE @ XLLM 25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.26

  38. [38]

    LLMSR @ XLLM 25: Integrating Reasoning Prompt Strategies with Structural Prompt Formats for Enhanced Logical Inference

    Tai, Le and Van, Thin. LLMSR @ XLLM 25: Integrating Reasoning Prompt Strategies with Structural Prompt Formats for Enhanced Logical Inference. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.27

  39. [39]

    D oc IE @ XLLM 25: UIEP rompter: A Unified Training-Free Framework for universal document-level information extraction via Structured Prompt

    Qiu, Chengfeng and Zhou, Lifeng and Wei, Kaifeng and Li, Yuke. D oc IE @ XLLM 25: UIEP rompter: A Unified Training-Free Framework for universal document-level information extraction via Structured Prompt. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.28

  40. [40]

    LLMSR @ XLLM 25: SWRV : Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification

    Chen, Danchun. LLMSR @ XLLM 25: SWRV : Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.29

  41. [41]

    LLMSR @ XLLM 25: An Empirical Study of LLM for Structural Reasoning

    Li, Xinye and Wan, Mingqi and Sui, Dianbo. LLMSR @ XLLM 25: An Empirical Study of LLM for Structural Reasoning. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.30

  42. [42]

    LLMSR @ XLLM 25: A Language Model-Based Pipeline for Structured Reasoning Data Construction

    Xing, Hongrui and Liu, Xinzhang and Jiang, Zhuo and Yang, Zhihao and Yao, Yitong and Wang, Zihan and Deng, Wenmin and Wang, Chao and Song, Shuangyong and Yang, Wang and He, Zhongjiang and Li, Yongxiang. LLMSR @ XLLM 25: A Language Model-Based Pipeline for Structured Reasoning Data Construction. Proceedings of the 1st Joint Workshop on Large Language Model...

  43. [43]

    S peech EE @ XLLM 25: Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction

    Gedeon, M \'a t \'e. S peech EE @ XLLM 25: Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.32

  44. [44]

    Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  45. [45]

    An introduction to computational identification and classification of Upamā alaṇkāra

    Jadhav, Bhakti and Dutta, Himanshu and Kanitkar, Shruti and Kulkarni, Malhar and Bhattacharyya, Pushpak. An introduction to computational identification and classification of Upamā alaṇkāra. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  46. [46]

    Aesthetics of Sanskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on Śikṣāṣṭaka

    Sandhan, Jivnesh and Barbadikar, Amruta and Maity, Malay and Satuluri, Pavankumar and Sandhan, Tushar and Gupta, Ravi M and Goyal, Pawan and Behera, Laxmidhar. Aesthetics of Sanskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on Śikṣāṣṭaka. Computational Sanskrit and Digital Humanities - World Sanskrit Confere...

  47. [47]

    Itaretara Dvandva: A challenge for Dependency Tree semantics

    Kulkarni, Amba and Neelamana, Vasudha. Itaretara Dvandva: A challenge for Dependency Tree semantics. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  48. [48]

    A Case Study of Handwritten Text Recognition from Pre-Colonial era Sanskrit Manuscripts

    Chincholikar, Kartik and Dwivedi, Shagun and Gopalan, Kaushik and Awasthi, Tarinee. A Case Study of Handwritten Text Recognition from Pre-Colonial era Sanskrit Manuscripts. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  49. [49]

    Towards Accent-Aware Vedic Sanskrit Optical Character Recognition Based on Transformer Models

    Tsukagoshi, Yuzuki and Kuroiwa, Ryo and Ohmukai, Ikki. Towards Accent-Aware Vedic Sanskrit Optical Character Recognition Based on Transformer Models. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  50. [50]

    Vedavani: A Benchmark Corpus for ASR on Vedic Sanskrit Poetry

    Kumar, Sujeet and Ray, Pretam and Beerukuri, Abhinay and Kamoji, Shrey and Jagadeeshan, Manoj Balaji and Goyal, Pawan. Vedavani: A Benchmark Corpus for ASR on Vedic Sanskrit Poetry. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  51. [51]

    Compound Type Identification in Sanskrit

    Krishnan, Sriram and Satuluri, Pavankumar and Barbadikar, Amruta and Prasanna Venkatesh, T S and Kulkarni, Amba. Compound Type Identification in Sanskrit. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  52. [52]

    IKML: A Markup Language for Collaborative Semantic Annotation of Indic Texts

    Lakkundi, Chaitanya S and Rajaraman, Gopalakrishnan and Susarla, Sai Rama Krishna. IKML: A Markup Language for Collaborative Semantic Annotation of Indic Texts. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  53. [53]

    Challenges in Processing Vedic Sanskrit: Towards creating a normalized dataset for the Ṛgveda-saṃhitā

    Krishnan, Sriram and Gayathri, Sepuri and Kulkarni, Amba. Challenges in Processing Vedic Sanskrit: Towards creating a normalized dataset for the Ṛgveda-saṃhitā. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  54. [54]

    Pāṇḍitya: Visualizing Sanskrit Intellectual Networks

    Neill, Tyler. Pāṇḍitya: Visualizing Sanskrit Intellectual Networks. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  55. [55]

    Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval on English Queries and Sanskrit Documents

    Jagadeeshan, Manoj Balaji and Raj, Prince and Goyal, Pawan. Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval on English Queries and Sanskrit Documents. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  56. [56]

    Concordance of Sanskrit Synonyms

    Patel, Dhaval. Concordance of Sanskrit Synonyms. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

  57. [57]

    Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  58. [58]

    Chain-of-MetaWriting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts

    Buhnila, Ioana and Cislaru, Georgeta and Todirascu, Amalia. Chain-of-MetaWriting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  59. [59]

    Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities

    Shi, Ken and Penn, Gerald. Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  60. [60]

    Reading Between the Lines: A dataset and a study on why some texts are tougher than others

    Khallaf, Nouran and Eugeni, Carlo and Sharoff, Serge. Reading Between the Lines: A dataset and a study on why some texts are tougher than others. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  61. [61]

    ParaRev: Building a dataset for Scientific Paragraph Revision annotated with revision instruction

    Jourdan, Léane and Boudin, Florian and Dufour, Richard and Hernandez, Nicolas and Aizawa, Akiko. ParaRev: Building a dataset for Scientific Paragraph Revision annotated with revision instruction. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  62. [62]

    Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts

    Maggi, Chiara and Vitaletti, Andrea. Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  63. [63]

    Decoding Semantic Representations in the Brain Under Language Stimuli with Large Language Models

    Sato, Anna and Kobayashi, Ichiro. Decoding Semantic Representations in the Brain Under Language Stimuli with Large Language Models. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

  64. [64]

    Proceedings of the 5th Wordplay: When Language meets Games Workshop (Wordplay 2025). 2025. doi:10.18653/v1/2025.wordplay-1.0

  65. [65]

    Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  66. [66]

    A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection

    Fillies, Jan and Wawerek, Marius and Paschke, Adrian. A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  67. [67]

    Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation

    Antypas, Dimosthenis and Sen, Indira and Perez Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco. Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  68. [68]

    From civility to parity: Marxist-feminist ethics for context-aware algorithmic content moderation

    Oh, Dayei. From civility to parity: Marxist-feminist ethics for context-aware algorithmic content moderation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  69. [69]

    A Novel Dataset for Classifying German Hate Speech Comments with Criminal Relevance

    Kums, Vincent and Meyer, Florian and Pivit, Luisa and Vedenina, Uliana and Wortmann, Jonas and Siegel, Melanie and Labudde, Dirk. A Novel Dataset for Classifying German Hate Speech Comments with Criminal Relevance. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  70. [70]

    Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection

    Caselli, Tommaso and Plaza-del-Arco, Flor Miriam. Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  71. [71]

    Debiasing Static Embeddings for Hate Speech Detection

    Sun, Ling and Kim, Soyoung and Dong, Xiao and K. Debiasing Static Embeddings for Hate Speech Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  72. [72]

    Web(er) of Hate: A Survey on How Hate Speech Is Typed

    Wang, Luna and Caines, Andrew and Hutchings, Alice. Web(er) of Hate: A Survey on How Hate Speech Is Typed. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  73. [73]

    Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate Speech

    Ngueajio, Mikel and Plaza-del-Arco, Flor Miriam and Chung, Yi-Ling and Rawat, Danda and Cercas Curry, Amanda. Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate Speech. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  74. [74]

    HODIAT: A Dataset for Detecting Homotransphobic Hate Speech in Italian with Aggressiveness and Target Annotation

    Damo, Greta and Cignarella, Alessandra Teresa and Caselli, Tommaso and Patti, Viviana and Nozza, Debora. HODIAT: A Dataset for Detecting Homotransphobic Hate Speech in Italian with Aggressiveness and Target Annotation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  75. [75]

    Beyond the Binary: Analysing Transphobic Hate and Harassment Online

    Talas, Anna and Hutchings, Alice. Beyond the Binary: Analysing Transphobic Hate and Harassment Online. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  76. [76]

    Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems

    Berezin, Sergey and Farahbakhsh, Reza and Crespi, Noel. Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  77. [77]

    Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories

    Lisker, Mareike and Gottschalk, Christina and Mihaljević, Helena. Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  78. [78]

    MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages

    Kalkbrenner, Lu and Solopova, Veronika and Zeiler, Steffen and Nickel, Robert and Kolossa, Dorothea. MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  79. [79]

    Catching Stray Balls: Football, fandom, and the impact on digital discourse

    Hill, Mark. Catching Stray Balls: Football, fandom, and the impact on digital discourse. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

  80. [80]

    Exploring Hate Speech Detection Models for Lithuanian Language

    Mandravickaitė, Justina and Rimkienė, Eglė and Petkevičius, Mindaugas and Songailaitė, Milita and Zaranka, Eimantas and Krilavičius, Tomas. Exploring Hate Speech Detection Models for Lithuanian Language. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
