https://arxiv.org/abs/2306.00622
7 Pith papers cite this work. Polarity classification is still indexing.
citation roles: background 1
citation polarities: background 1
citing papers explorer
- AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot
  AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.
- LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers
  MESSALA is a new LLM framework that produces report evaluations closer to those of veteran SOC practitioners than prior LLM methods by combining a custom checklist with granularization guidelines and multi-perspective scoring.
- ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
  ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving a new state-of-the-art circle-packing result with 150 samples, along with gains on math reasoning and competitive programming tasks.
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
  A maximum likelihood model estimates that 6.5-16.9% of peer-review text at ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023 was substantially modified by LLMs, with elevated rates in low-confidence submissions and those made close to the deadline.
- Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis
  AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.
- Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI
  Peer review reports in AI conferences have grown longer and more standardized since the arrival of LLMs, with increased emphasis on surface-level clarity and summaries at the expense of deeper critiques of originality and replicability.
- ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
  ARIS is a three-layer open-source system that uses cross-model adversarial collaboration plus claim-auditing pipelines to make LLM-driven research workflows more reliable.