Heragent: Rethinking the automated environment deployment via hierarchical test pyramid

Xiang Li, Siyu Lu, Federica Sarro, Claire Le Goues, He Ye · 2026 · arXiv 2602.07871

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

DeployBench: Benchmarking LLM Agents for Research Artifact Deployment

cs.SE · 2026-06-03 · unverdicted · novelty 7.0

DeployBench is a new benchmark of 51 research-artifact deployment tasks where four LLMs with OpenHands achieve 7.8-51% pass rates, with failures mostly from agents stopping after weaker self-checks than the paper requires.

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

cs.SE · 2026-05-15 · unverdicted · novelty 7.0

BootstrapAgent distills repository bootstrapping heuristics into a persistent .bootstrap contract via multi-agent evidence extraction, Docker verification, and trace-driven repair, reporting 92.9% success and efficiency gains on three benchmarks.

citing papers explorer


DeployBench: Benchmarking LLM Agents for Research Artifact Deployment
cs.SE · 2026-06-03 · unverdicted · none · ref 26
BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge
cs.SE · 2026-05-15 · unverdicted · none · ref 24
