Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
Pith reviewed 2026-05-21 09:23 UTC · model grok-4.3
The pith
A detection method builds separate models of a text's original creator logic and later editor style to classify four types of human-LLM combinations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RACE models the distinct signatures of creator and editor by constructing an RST-based logic graph for the creator's foundation and extracting EDU-level features for the editor's style, which together enable accurate four-class classification of LLM-generated text with low false alarms.
What carries the argument
RACE (Rhetorical Analysis for Creator-Editor Modeling), which separates creator logic via RST graphs from editor style via EDU features to handle four-class detection.
Load-bearing premise
Distinct and consistent signatures of the creator's logical structure and the editor's stylistic edits can be reliably extracted via RST graphs and EDU-level features across varied texts and models.
What would settle it
A new test set containing texts from previously unseen LLMs and human editors where RACE loses its reported advantage over the twelve baselines or produces higher false alarms would falsify the central performance claim.
Original abstract
The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for the nuanced regulation, as the LLM-polished human text and humanized LLM text often trigger different policy consequences. In this paper, we explore fine-grained LLM-generated text detection under a rigorous four-class setting. To handle such complexities, we propose RACE (Rhetorical Analysis for Creator-Editor Modeling), a fine-grained detection method that characterizes the distinct signatures of creator and editor. Specifically, RACE utilizes Rhetorical Structure Theory (RST) to construct a logic graph for the creator's foundation while extracting Elementary Discourse Unit (EDU)-level features for the editor's style. Experiments show that RACE outperforms 12 baselines in identifying fine-grained types with low false alarms, offering a policy-aligned solution for LLM regulation.
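One way to read the four-class setting is as a grid over who created the text and who edited it last. The sketch below encodes that reading; the `Source` enum, the `FOUR_CLASSES` table, and the choice to treat unedited text as having the same creator and editor are illustrative assumptions, not names or definitions taken from the paper.

```python
from enum import Enum


class Source(Enum):
    HUMAN = "human"
    LLM = "llm"


# Hypothetical encoding: keys are (creator, editor) pairs, values are the
# four class names used in the abstract and the referee summary.
# Assumption: "pure" text is modeled as creator == editor.
FOUR_CLASSES = {
    (Source.HUMAN, Source.HUMAN): "pure human",
    (Source.LLM, Source.LLM): "pure LLM",
    (Source.HUMAN, Source.LLM): "LLM-polished human",
    (Source.LLM, Source.HUMAN): "humanized LLM",
}


def four_class_label(creator: Source, editor: Source) -> str:
    """Map a (creator, editor) pair to its four-class label."""
    return FOUR_CLASSES[(creator, editor)]


print(four_class_label(Source.HUMAN, Source.LLM))
print(four_class_label(Source.LLM, Source.HUMAN))
```

A binary detector collapses this grid to the editor axis or the creator axis alone, which is why the two mixed cells are indistinguishable in that setting.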
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RACE (Rhetorical Analysis for Creator-Editor Modeling) for fine-grained four-class LLM-generated text detection. It models the creator's role via an RST-based logic graph capturing the foundational structure and the editor's role via EDU-level stylistic features. The central claim is that this dual-role decomposition allows distinguishing pure human text, pure LLM text, LLM-polished human text, and humanized LLM text, with experiments showing superiority over 12 baselines and low false positive rates.
Significance. Should the dual signatures prove separable and the empirical gains robust, this contributes a policy-aligned approach to LLM regulation by enabling finer distinctions than binary or ternary detectors. The application of established RST and EDU concepts to this new task is a strength, providing interpretable features rather than black-box classification. Credit is due for tackling the four-class setting, which matches real-world collaborative scenarios.
major comments (2)
- [§3.2] §3.2 (RST Graph Construction): The logic graph is constructed by applying RST parsing directly to the final text. Editor modifications can insert, delete, or re-link rhetorical relations (e.g., Evidence to Elaboration) and alter nuclearity at the EDU level. No controlled comparison of pre-edit vs. post-edit graphs or ablation isolating creator-only structure is provided, so the claimed separation of creator foundation from editor style remains unverified and load-bearing for the four-class claim.
- [§4.3] §4.3 (Experimental Results): The reported outperformance over 12 baselines lacks per-class F1 scores, statistical significance tests, or ablation results removing either the RST graph or EDU features. Without these, it is unclear whether gains stem from the dual-role decomposition or simply from using a richer combined feature set.
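The controlled comparison requested in the first major comment could be sketched as a diff over two relation sets, one parsed from the pre-edit text and one from the post-edit text. The `Edge` tuple, the `compare_graphs` helper, and the assumption that EDU indices are already aligned between the two versions are all hypothetical; no RST parser is invoked, and the paper's actual graph format may differ.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    nucleus: int    # index of the nucleus EDU
    satellite: int  # index of the satellite EDU
    relation: str   # rhetorical relation label, e.g. "Evidence"


def compare_graphs(pre, post):
    """Summarize how editing changed a set of RST edges.

    Assumes EDU indices refer to the same units in both versions.
    A nuclearity flip shows up as one removed and one added pair.
    """
    pre_set, post_set = set(pre), set(post)
    pre_rel = {(e.nucleus, e.satellite): e.relation for e in pre_set}
    post_rel = {(e.nucleus, e.satellite): e.relation for e in post_set}
    shared = pre_rel.keys() & post_rel.keys()
    union = pre_set | post_set
    return {
        # Share of edges that survive unchanged (1.0 if both graphs are empty).
        "jaccard": len(pre_set & post_set) / len(union) if union else 1.0,
        # Same EDU pair, different relation label.
        "relabeled": sorted(p for p in shared if pre_rel[p] != post_rel[p]),
        "removed": sorted(pre_rel.keys() - post_rel.keys()),
        "added": sorted(post_rel.keys() - pre_rel.keys()),
    }


# Toy example: the editor relabels Evidence as Elaboration and adds one edge.
pre_edit = [Edge(0, 1, "Evidence"), Edge(0, 2, "Elaboration")]
post_edit = [Edge(0, 1, "Elaboration"), Edge(0, 2, "Elaboration"),
             Edge(2, 3, "Contrast")]
report = compare_graphs(pre_edit, post_edit)
print(report)
```

If the creator's structure really is stable under editing, the `jaccard` value should stay high for the two mixed classes; a low value on real data would support the referee's concern.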
minor comments (2)
- [Abstract] Abstract: 'Low false alarms' is stated without numerical values or the precise metric (e.g., false positive rate at fixed recall).
- [Figure 2] Figure 2: The EDU feature extraction diagram would benefit from explicit arrows showing how editor-style features are aggregated separately from the RST graph nodes.
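The metric suggested in the first minor comment, false positive rate at fixed recall, can be made concrete with a short sketch. It assumes a binary reduction in which pure human text is the negative class and any LLM involvement is the positive class; that reduction, the `fpr_at_recall` name, and the toy scores are illustrative assumptions rather than the paper's evaluation protocol.

```python
def fpr_at_recall(scores, labels, target_recall):
    """Return (FPR, threshold) at the strictest threshold reaching target_recall.

    scores: detector scores, higher means "more likely LLM-involved".
    labels: 1 for positive (LLM-involved), 0 for negative (pure human).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sweep thresholds from strictest to loosest. FPR can only grow as the
    # threshold drops, so the first threshold that reaches the target recall
    # also gives the lowest achievable FPR.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if tp / positives >= target_recall:
            return fp / negatives, threshold
    raise ValueError("target_recall must be at most 1.0")


# Toy scores, not taken from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(fpr_at_recall(toy_scores, toy_labels, 0.6))
print(fpr_at_recall(toy_scores, toy_labels, 1.0))
```

Reporting this pair for each detector would let readers compare "low false alarms" at a matched operating point instead of at each method's default threshold.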
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript introducing RACE for four-class fine-grained LLM-generated text detection. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the empirical support for our dual-role modeling approach.
- Referee: [§3.2] §3.2 (RST Graph Construction): The logic graph is constructed by applying RST parsing directly to the final text. Editor modifications can insert, delete, or re-link rhetorical relations (e.g., Evidence to Elaboration) and alter nuclearity at the EDU level. No controlled comparison of pre-edit vs. post-edit graphs or ablation isolating creator-only structure is provided, so the claimed separation of creator foundation from editor style remains unverified and load-bearing for the four-class claim.
  Authors: We appreciate the referee's emphasis on rigorously verifying the separation between creator and editor signatures. While RST parsing is indeed applied to the final text, our design rationale is that core rhetorical relations and nuclearity at the document level predominantly encode the creator's logical foundation, whereas editor interventions manifest more prominently in local EDU-level stylistic variations. To directly address the concern, we will add a controlled comparison in the revised manuscript: for the LLM-polished human and humanized LLM classes, we will compute and report differences in RST graphs between pre-edit (original) and post-edit versions, quantifying changes in relations and nuclearity. We will also include an ablation using only the RST graph features to isolate the creator component's contribution to four-class performance.
  Revision: yes
- Referee: [§4.3] §4.3 (Experimental Results): The reported outperformance over 12 baselines lacks per-class F1 scores, statistical significance tests, or ablation results removing either the RST graph or EDU features. Without these, it is unclear whether gains stem from the dual-role decomposition or simply from using a richer combined feature set.
  Authors: We agree that these additional analyses are essential to substantiate the benefits of the dual-role decomposition. In the revised version, we will report per-class F1 scores for RACE and all 12 baselines. We will also add statistical significance testing (via paired t-tests across five random seeds) for the reported improvements. Furthermore, we will include ablation studies: (1) RACE without the RST graph (EDU features only), (2) RACE without EDU features (RST graph only), and (3) comparison against a simple concatenation baseline. These results will clarify that performance gains derive from modeling the distinct creator and editor roles rather than feature richness alone.
  Revision: yes
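The paired t-test across five random seeds promised in the second response amounts to a small calculation on per-seed score differences. The sketch below computes the test statistic with the standard library only; the function name and the macro-F1 numbers are made-up placeholders for illustration and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-seed scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 seeds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder macro-F1 values for five seeds (NOT from the paper).
method_f1 = [0.80, 0.82, 0.81, 0.83, 0.79]
baseline_f1 = [0.78, 0.79, 0.80, 0.80, 0.78]
t_stat, dof = paired_t_statistic(method_f1, baseline_f1)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The statistic is then compared against a t distribution with four degrees of freedom; `scipy.stats.ttest_rel` performs the same test and also returns the p-value. With only five seeds the test has limited power, so reporting the per-seed differences alongside it would be informative.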
Circularity Check
No circularity: established RST/EDU applied to new task without self-referential reduction
The paper introduces RACE for four-class LLM text detection by applying Rhetorical Structure Theory to build logic graphs for the creator and EDU-level features for the editor. These are standard, externally defined frameworks from prior discourse analysis literature, not derived or fitted within the paper itself. No equations, parameters, or self-citations are shown to reduce the core method or claims back to the inputs by construction. Experiments compare against 12 external baselines, so the evaluation rests on independent benchmarks rather than on a tautological comparison.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Rhetorical Structure Theory can reliably represent the logical foundation of text created by a human or model.
- domain assumption: Elementary Discourse Unit level features capture editor-specific stylistic modifications.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Passage: RACE utilizes Rhetorical Structure Theory (RST) to construct a logic graph for the creator's foundation while extracting Elementary Discourse Unit (EDU)-level features for the editor's style.
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Passage: RACE ... outperforms 12 baselines in identifying fine-grained types with low false alarms.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.