EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.
Introducing OfficeQA: A benchmark for end-to-end grounded reasoning, December 2025
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- EvoSkill: Automated Skill Discovery for Multi-Agent Systems