Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.
hub
How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
dataset 2polarities
use dataset 2representative citing papers
RACE uses Rhetorical Structure Theory for creator logic graphs and Elementary Discourse Unit features for editor style to outperform baselines on four-class LLM text detection.
MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive on the RAID leaderboard.
Newer LLMs exhibit reduced syntactic and lexical diversity in English news text generation compared to older models, as measured by HPSG grammar and diversity metrics from ecology and information theory, while human-authored text shows little change.
PUPPET jointly optimizes LLM outputs for high detectability and task performance via RL rewards from a detector and a task evaluator, outperforming watermarking on tasks while matching detectability.
A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.
A unified synthetic data generation pipeline produces unlimited annotated multimodal video data across multiple tasks, enabling models trained mostly on synthetic data to generalize effectively to real-world video understanding benchmarks.
Feature-augmented DeBERTa-v3-base with attention-based fusion reaches 85.9% balanced accuracy on the multi-domain M4 benchmark under fixed-threshold evaluation, outperforming zero-shot baselines by up to 7.22 points.
BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
citing papers explorer
-
Segmenting Human-LLM Co-authored Text via Change Point Detection
Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.
-
Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
RACE uses Rhetorical Structure Theory for creator logic graphs and Elementary Discourse Unit features for editor style to outperform baselines on four-class LLM text detection.
-
MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive on the RAID leaderboard.
-
More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs
Newer LLMs exhibit reduced syntactic and lexical diversity in English news text generation compared to older models, as measured by HPSG grammar and diversity metrics from ecology and information theory, while human-authored text shows little change.
-
LLM Output Detectability and Task Performance Can be Jointly Optimized
PUPPET jointly optimizes LLM outputs for high detectability and task performance via RL rewards from a detector and a task evaluator, outperforming watermarking on tasks while matching detectability.
-
Rethinking Publication: A Certification Framework for AI-Enabled Research
A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.
-
All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
A unified synthetic data generation pipeline produces unlimited annotated multimodal video data across multiple tasks, enabling models trained mostly on synthetic data to generalize effectively to real-world video understanding benchmarks.
-
Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators
Feature-augmented DeBERTa-v3-base with attention-based fusion reaches 85.9% balanced accuracy on the multi-domain M4 benchmark under fixed-threshold evaluation, outperforming zero-shot baselines by up to 7.22 points.
-
Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer
BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.
-
A Survey of Large Language Models
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.