Introducing officeqa: A benchmark for end-to-end grounded reasoning, December 2025

The Mosaic Research Team · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

cs.AI · 2026-03-03 · unverdicted · novelty 5.0

EvoSkill evolves agent skills via failure analysis and Pareto frontier selection, raising exact-match accuracy 7.3% on OfficeQA and 12.1% on SealQA with 5.3% zero-shot transfer to BrowseComp.

citing papers explorer

Showing 1 of 1 citing paper.

EvoSkill: Automated Skill Discovery for Multi-Agent Systems cs.AI · 2026-03-03 · unverdicted · none · ref 11
