MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.
Finqa: A dataset of numerical reasoning over financial data
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
dataset 1
citation-polarity summary
fields
cs.AI 2verdicts
UNVERDICTED 2roles
dataset 1polarities
use dataset 1representative citing papers
Finch is a new benchmark with 172 composite workflows and 384 tasks from real enterprise data that shows top AI models like GPT-5.1 Pro pass only 38.4% of workflows under human evaluation.
citing papers explorer
-
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs
MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.
-
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
Finch is a new benchmark with 172 composite workflows and 384 tasks from real enterprise data that shows top AI models like GPT-5.1 Pro pass only 38.4% of workflows under human evaluation.