FARS deployed at scale produced 166 AI/ML papers across 67 topics that received 282 structured human reviews indicating some review-worthy outputs alongside recurring failure modes.
Mlr-copilot: Autonomous machine learning research based on large language models agents
12 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
FML-Bench shows a simple greedy hill-climber nearly matches tree search on dense-opportunity tasks while an adaptive agent that broadens search on stagnation outperforms six baselines across 18 tasks.
Camyla autonomously generates research proposals, experiments, and manuscripts in medical image segmentation, outperforming baselines on 24 of 31 recent datasets while producing 40 human-reviewed papers.
OrchestrXR uses multi-agent orchestration with structured schemas to generate Unity XR study prototypes from ideas, supported by a user study with 12 researchers indicating effective support and intent preservation.
OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.
LLMs given only research questions from 1000 arXiv CS papers recommend a narrower set of methods than the original papers, with effective model-entity diversity dropping from 1232 to 59-96 and stronger agreement among LLMs than with papers.
MLReplicate benchmark evaluates six autonomous systems on 45 manuscripts from ICML 2025 papers, finding that automated reviews accept flawed outputs with fabricated claims while human review exposes methodological failures, and that the cheapest system outperforms the most expensive by a wide margin
ResearchEVO automates the discover-then-explain cycle by evolving algorithms via fitness-driven LLM co-evolution and generating grounded, anti-hallucination research papers through sentence-level RAG.
PRISM-XR adds edge-based sensitive-data filtering and quick registration to MLLM-driven XR collaboration, reporting 90% request accuracy, sub-0.3s registration, and over 90% sensitive-object filtering in a 28-person study.
A survey organizing AI-powered research automation into five workflow stages, defining AutoResearch and Vibe Research, and proposing five evaluation dimensions while noting domain-conditioned limits on autonomy.
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
citing papers explorer
No citing papers match the current filters.