MASPrism attributes failures in multi-agent systems by ranking candidates from prefill-stage NLL and attention signals of a 0.6B SLM, beating baselines by up to 33.41% Top-1 accuracy and proprietary LLMs by up to 89.5% relative improvement while processing traces in 2.66 seconds.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.
EvoDev introduces an iterative feature-driven framework with a DAG-based Feature Map for context propagation that improves LLM agent performance on end-to-end software development tasks by 56.8% over the best baseline.
citing papers explorer
-
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals
MASPrism attributes failures in multi-agent systems by ranking candidates from prefill-stage NLL and attention signals of a 0.6B SLM, beating baselines by up to 33.41% Top-1 accuracy and proprietary LLMs by up to 89.5% relative improvement while processing traces in 2.66 seconds.
-
Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios
A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.
-
EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents
EvoDev introduces an iterative feature-driven framework with a DAG-based Feature Map for context propagation that improves LLM agent performance on end-to-end software development tasks by 56.8% over the best baseline.