pith. machine review for the scientific record.

arxiv: 2605.09915 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI · cs.CY


Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:57 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY
keywords denominator gaming · AI agents · academic conferences · submission volume · acceptance rates · reviewer burnout · scientific publishing

The pith

Malicious actors could deploy AI agents to flood conferences with low-quality papers, inflating submission counts and overwhelming reviewers while stable acceptance rates force more papers through.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that top AI conferences' habit of holding acceptance rates roughly steady even as submissions grow exponentially creates an opening for a new kind of attack. A malicious actor could use fully automated AI agents to generate and submit thousands of superficially plausible but low-quality papers, not to get those papers accepted but simply to swell the total number of submissions. With a fixed acceptance rate, more papers overall would then clear the bar, raising the chance that a small number of targeted legitimate papers also get through while reviewers become exhausted and review quality drops. The authors examine how feasible this tactic is today and what it would do to the publishing system. Readers should care because it turns a routine policy choice into a structural weakness that could degrade the entire review process and encourage automated paper mills.

Core claim

The central claim is that Agentic Denominator Gaming is a viable systemic threat: malicious actors can use AI agents to mass-produce and submit low-quality papers solely to enlarge the submission denominator, which, under a stable acceptance rate, systematically raises the publication probability of a small set of legitimate papers while exhausting reviewer capacity and degrading review quality.
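The arithmetic behind this claim is simple enough to sketch. A minimal illustration, using made-up numbers that mirror the paper's Figure 1 setup (this code is our editorial sketch, not anything the authors provide):

```python
# Illustrative sketch of the denominator mechanism; all numbers are
# hypothetical, chosen to mirror the Figure 1 example.

def accepted_slots(submissions: int, rate: float) -> int:
    """Accepted-paper count under a fixed percentage-based acceptance rate."""
    return round(submissions * rate)

rate = 0.25     # conference holds acceptance near 25%
honest = 4      # legitimate submissions
flood = 8       # agent-generated low-quality filler

print(accepted_slots(honest, rate))          # 1 slot before the flood
print(accepted_slots(honest + flood, rate))  # 3 slots after
```

If the filler is rejected on merit, the two extra slots must be filled from the legitimate pool, which is exactly the dilution the claim describes.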

What carries the argument

Agentic Denominator Gaming: the deliberate use of fully automated AI agents to generate and submit large volumes of superficially plausible low-quality papers whose sole purpose is to increase total submissions and thereby dilute the acceptance pool.

If this is right

  • Reviewer burnout increases because the same number of reviewers must handle a larger submission load.
  • Average review quality falls as reviewers have less time per paper.
  • Industrialized automated agent mills emerge, producing papers at scale.
  • Durable defense requires changes to conference policies and incentives, not only technical detection tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Conferences might need per-author or per-institution submission caps to limit volume manipulation.
  • Moving from percentage-based acceptance rates to absolute quality thresholds could reduce the incentive to game the denominator.
  • Widespread use of such agents could accelerate development of AI-assisted review systems as a practical response.
  • If unchecked, the practice might reduce overall trust in conference proceedings as reliable indicators of research quality.
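The second bullet, on moving from percentage-based rates to absolute quality thresholds, can be made concrete with a toy comparison. The scores and policies below are invented for illustration; neither appears in the paper:

```python
# Illustrative-only comparison of two acceptance policies under flooding.

def accept_by_rate(scores, rate=0.25):
    """Percentage-based policy: accept the top fixed fraction of submissions."""
    k = round(len(scores) * rate)
    return sorted(scores, reverse=True)[:k]

def accept_by_bar(scores, bar=0.8):
    """Absolute-threshold policy: accept anything above a quality bar."""
    return [s for s in scores if s >= bar]

legit = [0.95, 0.85, 0.75, 0.65, 0.55]  # five legitimate papers
flood = [0.10] * 15                      # agent-generated filler

# Rate-based acceptance inflates with the denominator: 1 -> 5 slots.
print(len(accept_by_rate(legit)), len(accept_by_rate(legit + flood)))
# Threshold-based acceptance is invariant to the flood: 2 -> 2.
print(len(accept_by_bar(legit)), len(accept_by_bar(legit + flood)))
```

Under the threshold policy the filler changes nothing, which is why removing the percentage norm removes the incentive to game the denominator.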

Load-bearing premise

Fully automated AI agents can already or will soon produce large volumes of papers that look plausible enough to avoid quick detection, and conferences will continue using roughly fixed acceptance rates even as submission numbers rise sharply.

What would settle it

A clear falsifier would be if upcoming major conferences show no detectable surge in low-quality AI-generated submissions despite rising AI capabilities, or if they respond to volume growth by lowering acceptance rates rather than accepting more papers.

Figures

Figures reproduced from arXiv: 2605.09915 by Hang Zheng, Jiachen Zhu, Jianghao Lin, Rong Shan, Te Gao, Weinan Zhang, Yong Yu, Yunjia Xi, Zeyu Zheng.

Figure 1. An illustration of the mechanism of Agentic Conference Denominator Gaming. Assume a conference maintains a stable acceptance rate of around 25%. If the denominator increases from 4 to 12, the number of accepted papers must rise from 1 to 3. Without an influx of new high-quality submissions, this mechanism forces the conference to lower its quality threshold, effectively rescuing human-authored papers that …

Figure 2. The structure of our position paper.

Figure 3. Acceptance rates and submission numbers of some AI conferences from 2019 to 2024. The displayed conferences include ACL, EMNLP, CVPR, ICCV, ICML, NeurIPS, ICLR, COLT, UAI, AISTATS, AAAI, IJCAI, KDD, SIGIR, WWW, INTERSPEECH, ICASSP and ICRA. Some of them are highlighted due to fast submission growth yet remarkably stable acceptance rates.

Figure 4. A proof-of-concept threat model for Agentic Conference Denominator Gaming. It can be achieved by a multi-agent pipeline consisting of two agents: a research agent for plausible-but-low-value paper generation and a submission agent for automated OpenReview flooding.
read the original abstract

The implicit policy of maintaining relatively stable acceptance rates at top AI conferences, despite exponentially growing submissions, introduces a critical structural vulnerability. This position paper characterizes a new systemic threat we term Agentic Denominator Gaming, in which a malicious actor deploys AI agents to generate and submit a large volume of superficially plausible but low-quality papers. Crucially, their objective is not the acceptance of low-quality papers, but rather to inflate the submission denominator and overwhelm reviewing capacity. Under a relatively stable acceptance rate, this dilution can systematically increase the publication probability of a small, targeted set of legitimate papers. We analyze the practical feasibility of this threat and its broader consequences, including intensified reviewer burnout, degraded review quality, and the emergence of industrialized automated agent mills. Finally, we propose and evaluate a range of mitigation strategies, and argue that durable protection will require system-level policy and incentive reforms, rather than relying primarily on technical detection alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that top AI conferences' practice of maintaining stable acceptance rates amid exponentially rising submissions creates a structural vulnerability to 'Agentic Denominator Gaming.' A malicious actor could use fully automated AI agents to submit large volumes of superficially plausible but low-quality papers, inflating the submission denominator to overwhelm reviewing capacity. Under stable percentage-based acceptance, this is argued to increase the publication probability of a small targeted set of legitimate papers (not by getting the low-quality ones accepted, but by diluting the pool). The manuscript analyzes feasibility, discusses consequences including reviewer burnout and automated agent mills, evaluates mitigation strategies, and advocates for system-level policy reforms over reliance on technical detection.

Significance. If the core mechanism holds, the position identifies a serious systemic risk to peer review integrity in fast-growing fields, with potential to accelerate burnout and erode trust in conference outcomes. The paper earns credit for proactively framing the threat, naming the phenomenon, and outlining a range of mitigations, even as a position piece without quantitative models or simulations.

major comments (2)
  1. [Threat mechanism / Agentic Denominator Gaming definition] The central claim in the threat characterization (abstract and main argument) that denominator inflation via low-quality AI submissions will increase acceptance odds for targeted legitimate papers relies on reviewer overload as the operative channel. However, no model, simulation, or formal analysis is provided to show how overload produces selective benefit for the actor's papers rather than uniform degradation of review quality or simply raising total acceptances while preserving merit-based thresholds. This selection-dynamics assumption is load-bearing and unaddressed.
  2. [Feasibility and practical analysis] The feasibility analysis lacks any quantitative estimates, scaling arguments, or references to empirical AI paper-generation capabilities that would substantiate the ability to produce large volumes of superficially plausible papers at low detection risk. Without this, the immediacy and practicality of the threat remain difficult to evaluate.
minor comments (2)
  1. [Abstract] The abstract states that mitigation strategies are 'proposed and evaluated,' but the evaluation details (e.g., any criteria or comparative assessment) are not summarized, which would aid reader understanding.
  2. [Introduction / Terminology] The invented term 'Agentic Denominator Gaming' is used without an explicit formal definition or comparison to analogous concepts such as Sybil attacks or submission flooding in other domains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our position paper. We have carefully reviewed the major concerns regarding the threat mechanism and feasibility analysis. Below we respond point by point, clarifying our conceptual approach as a position piece while indicating revisions made to address the feedback.

read point-by-point responses
  1. Referee: The central claim in the threat characterization (abstract and main argument) that denominator inflation via low-quality AI submissions will increase acceptance odds for targeted legitimate papers relies on reviewer overload as the operative channel. However, no model, simulation, or formal analysis is provided to show how overload produces selective benefit for the actor's papers rather than uniform degradation of review quality or simply raising total acceptances while preserving merit-based thresholds. This selection-dynamics assumption is load-bearing and unaddressed.

    Authors: We appreciate the referee's identification of this key assumption. As a position paper, our intent is to characterize a structural vulnerability arising from stable acceptance rates amid rising submissions, rather than to deliver a formal model or simulation. The proposed mechanism is that a malicious actor submits a large volume of low-quality papers alongside a small number of high-quality targeted papers; under a fixed acceptance percentage, the inflated denominator increases the absolute number of acceptances. If the low-quality papers are rejected on merit (or detected), the additional acceptances can accrue to strong papers, including the actor's targeted ones, without those papers needing to outcompete an unchanged pool. We acknowledge, however, that overload could instead produce uniform degradation of review quality or non-selective effects, and that this dynamic is not formally demonstrated. In the revised manuscript we have added a new subsection explicitly discussing the selection-dynamics assumptions, alternative outcomes under overload, and the limitations of the conceptual argument. We agree that empirical modeling would be valuable future work but lies outside the scope of this position piece. revision: partial
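The channel the rebuttal describes, where filler is rejected on merit and the extra slots accrue to borderline legitimate papers, can be probed with a toy simulation. This is our editorial sketch under strong assumptions (perfect merit ranking, no reviewer overload, filler papers uniformly weak), not the authors' analysis:

```python
import random

# Toy model: under merit-based ranking with a fixed acceptance percentage,
# does flooding help a borderline legitimate paper? All parameters are
# hypothetical; the paper provides no such simulation.
random.seed(0)

def accept_prob(n_legit=200, n_flood=0, rate=0.25, target=0.70, trials=300):
    hits = 0
    for _ in range(trials):
        pool = [random.random() for _ in range(n_legit - 1)] + [target]
        pool += [random.uniform(0.0, 0.2) for _ in range(n_flood)]  # weak filler
        k = round(len(pool) * rate)              # fixed-rate slot count
        bar = sorted(pool, reverse=True)[k - 1]  # effective quality cutoff
        hits += target >= bar
    return hits / trials

p_honest = accept_prob(n_flood=0)     # cutoff sits near the 0.75 quantile
p_flooded = accept_prob(n_flood=400)  # cutoff collapses toward the filler
print(p_honest, p_flooded)
```

Under these assumptions the borderline paper goes from rarely accepted to almost always accepted, which is the selective benefit the referee asks the authors to demonstrate; under the alternative overload dynamics the referee raises, the outcome could differ.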

  2. Referee: The feasibility analysis lacks any quantitative estimates, scaling arguments, or references to empirical AI paper-generation capabilities that would substantiate the ability to produce large volumes of superficially plausible papers at low detection risk. Without this, the immediacy and practicality of the threat remain difficult to evaluate.

    Authors: We agree that quantitative grounding would improve the evaluation of immediacy. The original manuscript provides a qualitative feasibility discussion drawing on documented advances in LLM-based scientific text generation and agentic workflows. In response to this comment, the revised version incorporates additional references to empirical studies on AI-generated research content, rough order-of-magnitude estimates for the compute and cost required to produce and submit thousands of papers, and a brief assessment of current detection limitations based on existing tools. These additions substantiate the practicality argument while preserving the paper's focus on systemic policy implications rather than technical implementation details. revision: yes
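The order-of-magnitude estimate the authors promise could look like the following back-of-envelope sketch; every parameter here is an assumption invented for illustration, not a figure from the paper or its revision:

```python
# Back-of-envelope cost sketch; every number below is an assumed
# placeholder for illustration, not a value from the paper.
tokens_per_paper = 60_000   # assumed tokens to draft one filler paper
usd_per_mtok = 5.00         # assumed blended $/million tokens
papers = 3_000              # size of the hypothetical flood

generation_usd = papers * tokens_per_paper / 1e6 * usd_per_mtok
print(f"${generation_usd:,.0f}")  # ≈ $900 under these assumptions
```

Even if the real per-paper cost were an order of magnitude higher, flooding at this scale would remain cheap relative to the reviewing labor it consumes, which is the asymmetry the feasibility argument turns on.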

Circularity Check

0 steps flagged

No circularity: position paper argument relies on external trends and policy assumptions, not self-referential derivations

full rationale

The paper advances a conceptual position on 'Agentic Denominator Gaming' by linking observed exponential submission growth, stable acceptance-rate policies, and emerging AI agent capabilities to a hypothesized vulnerability. No equations, fitted parameters, or predictions appear in the provided text. The central claim is not derived from any internal model that reduces to its own inputs; instead, it rests on external observations of conference behavior and AI progress. Self-citations, if present in the full manuscript, are not load-bearing for the core argument per the guidelines, as the reasoning does not invoke uniqueness theorems or ansatzes from prior author work to close a loop. This is a standard non-finding for a non-technical position paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The position rests on two key domain assumptions about conference behavior and AI capabilities, with the new term 'Agentic Denominator Gaming' introduced as an invented framing device that has no independent empirical support outside the paper.

axioms (2)
  • domain assumption Top AI conferences maintain relatively stable acceptance rates despite exponentially growing submissions.
    Stated explicitly as the implicit policy that creates the structural vulnerability.
  • domain assumption AI agents can generate and submit large volumes of superficially plausible but low-quality papers at scale.
    Assumed as practically feasible based on current AI progress, without detailed technical validation in the abstract.
invented entities (1)
  • Agentic Denominator Gaming no independent evidence
    purpose: To name and characterize the strategy of using AI agents to inflate submission volume for the purpose of diluting acceptance probabilities.
    New term coined by the authors to frame the threat; no independent evidence or prior usage provided.

pith-pipeline@v0.9.0 · 5482 in / 1507 out tokens · 99025 ms · 2026-05-12T04:57:53.519354+00:00 · methodology

discussion (0)

