SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.
citing papers explorer
-
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
-
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.