ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

2026 · cs.SE · arXiv 2605.03042

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
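
The three-stage evidence check named in the abstract (integrity verification, result-to-claim mapping, claim auditing) can be illustrated with a minimal sketch. This is not ARIS code: the `Claim` record, the function names, the dictionary-based evidence store, and the numeric tolerance are all hypothetical choices made for illustration, and the real harness may represent claims and evidence quite differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical record: a stated value for one metric."""
    claim_id: str
    metric: str
    value: float


def verify_integrity(raw_evidence: dict[str, float]) -> bool:
    """Stage 1 (assumed form): evidence exists and every value is finite."""
    return bool(raw_evidence) and all(
        math.isfinite(v) for v in raw_evidence.values()
    )


def map_results_to_claims(
    claims: list[Claim], raw_evidence: dict[str, float], tol: float = 1e-6
) -> dict[str, Claim]:
    """Stage 2 (assumed form): build a ledger of claims backed by raw evidence."""
    return {
        c.claim_id: c
        for c in claims
        if c.metric in raw_evidence
        and abs(raw_evidence[c.metric] - c.value) <= tol
    }


def audit_manuscript(
    manuscript_claims: list[Claim],
    ledger: dict[str, Claim],
    raw_evidence: dict[str, float],
    tol: float = 1e-6,
) -> list[str]:
    """Stage 3 (assumed form): list manuscript claims that fail the cross-check
    against either the ledger or the raw evidence."""
    unsupported = []
    for c in manuscript_claims:
        entry = ledger.get(c.claim_id)
        in_ledger = (
            entry is not None
            and entry.metric == c.metric
            and abs(entry.value - c.value) <= tol
        )
        # A missing metric yields NaN, which never satisfies the comparison.
        in_evidence = abs(raw_evidence.get(c.metric, math.nan) - c.value) <= tol
        if not (in_ledger and in_evidence):
            unsupported.append(c.claim_id)
    return unsupported


# Usage with made-up numbers: c2 misreports f1, c3 never entered the ledger.
raw = {"accuracy": 0.91, "f1": 0.88}
proposed = [Claim("c1", "accuracy", 0.91), Claim("c2", "f1", 0.90)]
ledger = map_results_to_claims(proposed, raw)
manuscript = proposed + [Claim("c3", "accuracy", 0.95)]
flagged = audit_manuscript(manuscript, ledger, raw)
```

In this sketch a claim passes only when both the ledger and the raw evidence agree with it. That is one way to read "cross-checks manuscript statements against the claim ledger and raw evidence", and it targets the "plausible unsupported success" failure mode the abstract describes.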

representative citing papers

FARS: A Fully Automated Research System Deployed at Scale

cs.AI · 2026-06-30 · unverdicted · novelty 7.0

Deployed at scale, FARS produced 166 AI/ML papers across 67 topics. The 282 structured human reviews they received indicate some review-worthy outputs alongside recurring failure modes.

One Reflection Is Not Enough: Self-Correcting Autonomous Research via Multi-Hypothesis Failure Attribution

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

SAGE with MHFA improves failure recovery in autonomous research agents, raising metrics-bearing outputs from 42% to 92% on a 12-topic benchmark versus single-reflection baselines.

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

ResearchClawBench is a new benchmark that evaluates autonomous AI research agents on 40 tasks grounded in published papers using expert rubrics, finding that top systems score only 20-26 out of 100.

Clarus: Coordinating Autonomous Research Agents toward Web-Scale Scientific Collaboration

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

Clarus is a four-layer collaboration infrastructure with a project-agent-resource model that reformulates research as an open, traceable, multi-participant process.

Parametric Skills

cs.CL · 2026-06-29 · unverdicted · novelty 5.0

ParametricSkills uses a hypernetwork to turn textual skills into LoRA adapters, outperforming in-context learning by 6.44 points on average across six SWE subtasks, with higher BERTScore and F1.

ResearchLoop: An Evidence-Gated Control Plane for AI-Assisted Research

cs.AI · 2026-05-27 · unverdicted · novelty 3.0

ResearchLoop defines a protocol and state model for evidence-gated AI-assisted computational research and reports experiments across nine versions including self-hosting and task ablations.
