{"total":12,"items":[{"citing_arxiv_id":"2606.09105","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Graph2Idea:Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts","primary_cat":"cs.AI","submitted_at":"2026-06-08T06:58:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Graph2Idea builds dynamic knowledge graphs from retrieved literature to supply compact, relational contexts that guide LLMs in generating novel, feasible, and high-quality scientific ideas, outperforming flat-text baselines on automatic metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00644","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment","primary_cat":"cs.AI","submitted_at":"2026-05-30T09:41:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ForeSci is a temporally controlled benchmark with 500 tasks for assessing LLM agents on forward-looking AI research judgments in four domains using cutoff-aligned knowledge bases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30961","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation","primary_cat":"cs.CL","submitted_at":"2026-05-29T07:56:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EvoGens uses rank-based mutation, semantic-aware crossover, and lightweight evaluation to evolve populations of LLM-generated scientific ideas, boosting novelty and diversity 
metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22878","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research","primary_cat":"cs.AI","submitted_at":"2026-05-20T16:03:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18661","ref_index":102,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI for Auto-Research: Roadmap & User Guide","primary_cat":"cs.AI","submitted_at":"2026-05-18T17:08:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"First, iterative refinement uses feedback loops to improve idea specificity and reduce shallow novelty. ResearchAgent [10] incorporates academic graph feedback to refine generated ideas, SciMON [209] iteratively compares candidate ideas against prior work to mitigate the tendency of direct LLM prompting toward shallow contributions, and Chain of Ideas [102] organizes literature into progressive reasoning chains that outperform simple prompting baselines. Second, learned quality signals introduce explicit scoring or optimization objectives. 
Spark [168] combines retrieval-augmented generation with a judge model trained on 600K OpenReview reviews to estimate creativity, DeepInnovator [39] trains a 14B model under a \"Next Idea Prediction\" paradigm and reports 80-94% win"},{"citing_arxiv_id":"2605.14790","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation","primary_cat":"cs.CL","submitted_at":"2026-05-14T12:57:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.28158","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists","primary_cat":"cs.AI","submitted_at":"2026-04-30T17:44:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09793","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GIANTS: Generative Insight Anticipation from Scientific 
Literature","primary_cat":"cs.CL","submitted_at":"2026-04-10T18:13:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.21720","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation","primary_cat":"cs.AI","submitted_at":"2025-08-29T15:36:06+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21035","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The programming agent is powered by Claude Sonnet 4 [5], selected for its strong agentic coding abilities. 
OpenAI o3 [80], known for its robust reasoning capabilities, serves dual roles-guiding the planning logic of programming agents and enabling the Code Reviewer to detect bugs and suggest targeted fixes. Gemini 2.5 Pro [27], one of the top-performing models on the GPQA [88] and HLE [61] benchmarks, serves as the backbone of the Domain Expert agent, providing broad and accurate scientific knowledge with particular strength in biology. Task Orchestration The PI agent centrally orchestrates analysis workflows by adhering to the dependency structure intrinsic to gene expression analysis. For each GTA task, it identifies cohorts that have"},{"citing_arxiv_id":"2507.11810","ref_index":86,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator","primary_cat":"cs.DL","submitted_at":"2025-07-16T00:11:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"0 [197] 36M papers, 1.3M patents Biomedicine Translational-pathway tracing; entity-level innovation discovery NLPeer [32] 11,515 reviews, 5,672 papers NLP, ML Review-score prediction; sentence-role classification; guided skimming ASAP-Review [213] 28,119 reviews, 8,877 papers ML Aspect-aware review generation; bias and fairness analysis PEERSUM [86] 14,993 triples ML Meta-review summarization; conflict-aware generation MOPRD [94] 6,578 papers, 22,483 comments Biology, Chem, CS, Med Structured comment generation; cross-disciplinary discourse modelling ReviewCritique [28] 100 papers, 440 reviews NLP Review-quality assessment; deficiency detection; 
meta-reviewing REVIEWER2 [40] 27,805 papers, 99,727 re-"},{"citing_arxiv_id":"2504.19678","ref_index":145,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review","primary_cat":"cs.AI","submitted_at":"2025-04-28T11:08:22+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.","context_count":1,"top_context_role":"other","top_context_polarity":"background","context_text":"the biomedical domain, platforms like GeneAgent [141] and frameworks such as PRefLexOR [142] demonstrate enhanced reliability through self-verification and iterative refinement. Moreover, innovative solutions for research ideation, exemplified by SurveyX [143] and Chain-of-Ideas [144], as well as specialized frameworks for synthetic data generation [145] and chemical reasoning [146], collectively underscore the significant strides made in leveraging autonomous AI agents for complex, real-world tasks. Table V presents an overview of AI Agent frameworks. A. AI Agent frameworks AI agent frameworks represent a transformative paradigm in developing intelligent systems, combining the power of large language models with modular tools and utilities to"}],"limit":50,"offset":0}