AI for Auto-Research: Roadmap & User Guide

arxiv: 2605.18661 · v1 · pith:RA5SONDL · submitted 2026-05-18 · 💻 cs.AI

Lingdong Kong, Xian Sun, Wei Chow, Linfeng Li, Kevin Qinghong Lin, Xuan Billy Zhang, Song Wang, Rong Li, Qing Wu, Wei Gao, Yingshuo Wang, Shaoyuan Xie, Jiachen Liu, Leigang Qu, Shijie Li, Lai Xing Ng, Benoit R. Cottereau, Ziwei Liu, Tat-Seng Chua, Wei Tsang Ooi

classification 💻 cs.AI

keywords research, experiments, agents, end-to-end, frontier, ideas, project, review

abstract

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Studying developments through April 2026, we present an end-to-end analysis of AI across the complete research lifecycle, organized into four epistemological phases: Creation (idea generation, literature review, coding & experiments, tables & figures), Writing (paper writing), Validation (peer review, rebuttal & revision), and Dissemination (posters, slides, videos, social media, project pages, and interactive agents). We identify a sharp, stage-dependent boundary between reliable assistance and unreliable autonomy: AI excels at structured, retrieval-grounded, and tool-mediated tasks, but remains fragile for genuinely novel ideas, research-level experiments, and scientific judgment. Generated ideas often degrade after implementation, research code lags far behind pattern-matching benchmarks, and end-to-end autonomous systems have not yet consistently reached major-venue acceptance standards. We further show that greater automation can obscure rather than eliminate failure modes, making human-governed collaboration the most credible deployment paradigm. Finally, we provide a structured taxonomy, a benchmark suite, a tool inventory, cross-stage design principles, and a practitioner-oriented playbook, with resources maintained at our project page.
