SurveyLens: A Discipline-Aware Benchmark for Automatic Survey Generation

Beichen Guo; Haochen Shi; Haoyang Li; Jia Gu; Jian Wang; Ruosong Yang; Senzhang Wang; Shuaiqi Liu; Zhiyuan Wen

arxiv: 2602.11238 · v2 · pith:B5HMNEIMnew · submitted 2026-02-11 · 💻 cs.CL

keywords: systems, across, agents, deep, discipline-aware, disciplines, research, survey

Automatic Survey Generation (ASG) aims to produce comprehensive literature surveys by retrieving, organizing, and synthesizing academic papers. Despite rapid progress in specialized ASG frameworks and Deep Research agents, existing evaluations largely center on Computer Science or rely on generic criteria, leaving it unclear whether current systems satisfy the survey standards of diverse disciplines. We introduce SurveyLens, the first discipline-aware ASG benchmark. SurveyLens comprises SurveyLens-1k, a curated dataset of 1,000 human-written surveys across 10 disciplines, and a dual-lens framework that combines discipline-aware rubric scoring with reference-based alignment to human-written surveys. Evaluating 11 state-of-the-art systems across vanilla LLMs, ASG systems, and Deep Research agents, we find that Deep Research agents are the only paradigm robust across all 10 disciplines, ASG systems lead on structural planning, and all paradigms remain weak on reference quality, providing practical guidance for discipline-specific tool selection and future ASG design.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
cs.AI 2026-06 unverdicted novelty 6.0

Xcientist externalizes research synthesis and validation in AI scientists via contract-governed artifacts to maintain traceable trajectories and avoid claim drift across three domains.