Gives the first high-rate public-key PRC constructions for edit channels by reducing from Hamming-robust PRCs, achieving rates near 1 for large alphabets and near 1/2 for binary, plus Singleton-bound rates for polynomial alphabets.
Can AI-Generated Text be Reliably Detected?
32 Pith papers cite this work.
Abstract
Large Language Models (LLMs) perform impressively well in various applications. However, the potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. Consequently, the reliable detection of AI-generated text has become a critical area of research. AI text detectors have been shown to be effective under their specific settings. In this paper, we stress-test the robustness of these AI text detectors in the presence of an attacker. We introduce a recursive paraphrasing attack to stress-test a wide range of detection schemes, including watermarking-based detectors, neural network-based detectors, zero-shot classifiers, and retrieval-based detectors. Our experiments, conducted on passages each approximately 300 tokens long, reveal the varying sensitivities of these detectors to our attacks. Our findings indicate that while our recursive paraphrasing method can significantly reduce detection rates, in many cases it only slightly degrades text quality, highlighting potential vulnerabilities in current detection systems in the presence of an attacker. Additionally, we investigate the susceptibility of watermarked LLMs to spoofing attacks aimed at misclassifying human-written text as AI-generated. We demonstrate that an attacker can infer hidden AI text signatures without white-box access to the detection method, potentially leading to reputational risks for LLM developers. Finally, we provide a theoretical framework connecting the AUROC of the best possible detector to the total variation distance between human and AI text distributions. This analysis offers insights into the fundamental challenges of reliable detection as language models continue to advance. Our code is publicly available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.
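The abstract links best-case AUROC to total variation (TV) distance but does not state the relation. As we read the paper's result, the bound is AUROC ≤ 1/2 + TV − TV²/2. The reasoning is that at any false-positive rate, a detector's true-positive rate can exceed that rate by at most TV. Clipping this line at 1 and integrating over FPR from 0 to 1 gives the closed form. The minimal sketch below computes the bound and cross-checks it by numerically integrating that best-case ROC curve. The function names are hypothetical and do not come from the authors' repository.

```python
def auroc_upper_bound(tv: float) -> float:
    """Closed-form ceiling on any detector's AUROC, given the TV distance
    between machine-text and human-text distributions (our reading of the
    paper's bound): 1/2 + TV - TV^2/2."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


def auroc_bound_numeric(tv: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the best-case ROC curve
    TPR(FPR) = min(FPR + TV, 1), used as a cross-check of the closed form."""
    h = 1.0 / steps
    return sum(min((i + 0.5) * h + tv, 1.0) for i in range(steps)) * h


# Print the ceiling for a few illustrative TV values.
for tv in (0.0, 0.1, 0.5, 1.0):
    print(f"TV={tv:.1f}  closed={auroc_upper_bound(tv):.3f}  "
          f"numeric={auroc_bound_numeric(tv):.3f}")
```

At TV = 0 the ceiling is 0.5, which is chance level. It reaches 1 only when TV = 1. As generated text moves closer to the human distribution, the best achievable AUROC therefore falls toward chance.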
Citation roles: background (4)

Representative citing papers
A canary injection protocol for linking observed AI agent behavior to the responsible account at the hosting vendor, with robust variants for adversarial filtering.
Unsupervised style representations learned via paraphrase inversion enable competitive few-shot and zero-shot AI-text detection with better generalization to unseen LLMs than supervised baselines.
Introduces Lexical Alignment Score and Triangulated Preference Shift metrics to automatically identify lexical overuse in LLMs and attribute portions of that overuse to specific preference-learning stages, using windowed prevalence on PubMed data.
Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.
PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
Triospect combines statistical, content, and expression views to detect AI text more robustly, reporting AUROC gains of 22.3% and 9.1% on two attacked benchmarks across 17 attacks and 17 models.
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.
Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.
Agentic AI eliminates the fidelity-scale tradeoff in deception, enabling the Infinite Impostor attack that hijacks trusted relationships at mass scale and requiring a shift to suspect-by-default security based on evaluating actions rather than actors.
SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.
Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.
LLM agents make collective belief dynamics programmable: simulations show that coordinated agents induce stable belief shifts, and the paper identifies four structural properties that complicate detection and defense.
LLM-generated political discourse across crises is fluent yet caricatured: it is more negative, less emotionally varied, more structurally regular, and more lexically abstract than observed online populations.
A new battery of 30 cognitive tasks demonstrates that process-level behavioral features distinguish humans from frontier AI agents better than performance metrics (mean AUC 0.88), with process-specific fine-tuning improving mimicry but limited cross-task transfer.
BREW uses block voting and window-shifting verification to reach TPR 0.965 and FPR 0.02 under 10% synonym substitution, addressing high false-positive issues in prior multi-bit LLM watermarking.
DSIPA is a zero-shot black-box detector that uses sentiment distribution consistency and preservation metrics to identify LLM text, reporting up to 49.89% F1 gains over baselines across domains and models.
Adaptive Stealing improves the efficiency of stealing watermarks from LLMs through Position-Based Seal Construction and Adaptive Selection modules that dynamically choose optimal attack perspectives.
AI-generated text detectors achieve high benchmark accuracy by exploiting unstable dataset-specific linguistic features, as evidenced by cross-domain degradation and differing SHAP explanations across corpora.
RedNote-Vibe supplies a longitudinal dataset of AI versus human lifestyle posts from 2020 to mid-2025 plus the PLAD detection framework that applies cognitive psychology signatures for improved AI-text identification.
Users in r/isthisAI and r/RealOrAI employ 12 evolving strategies for AI detection that shift with model capabilities and online trends.
Reveals hidden human-like spans in machine-generated texts that raise detection complexity and proposes a stacked enhancement framework with hard-EM optimization to improve detectors across LLMs.
A multi-level framework that models local and global relations among token detection scores to improve machine-generated text detection with low overhead.