A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.
Spreadsheetbench: Towards challenging real world spreadsheet manipulation.Advances in Neural Information Processing Systems, 37:94871–94908
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
fields
cs.AI 3representative citing papers
Finch is a new benchmark with 172 composite workflows and 384 tasks from real enterprise data that shows top AI models like GPT-5.1 Pro pass only 38.4% of workflows under human evaluation.
citing papers explorer
-
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.
-
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
Finch is a new benchmark with 172 composite workflows and 384 tasks from real enterprise data that shows top AI models like GPT-5.1 Pro pass only 38.4% of workflows under human evaluation.
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills