Strained coherence, as flagged by a Claude judge on 44 coding trajectories, predicts failure (94% vs. 46%, p = 0.003), with partial replication on a second model.
RAFFLES: Reasoning-based attribution of faults for LLM systems. arXiv preprint arXiv:2509.06822.
Two Pith papers cite this work, including:
StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
StepFinder turns execution logs into temporal semantic sequences via LLMs then uses temporal modeling plus attention to attribute failures to specific steps more accurately and 79% faster than direct LLM methods on the Who&When benchmark.
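The summary above names the StepFinder pipeline's stages but not their internals, so the following is only a minimal sketch of the general idea (temporal features over per-step representations, then attention to pick a step), under loudly stated assumptions. Each logged step is assumed to already be an embedding vector, standing in for the LLM-produced temporal semantic sequence. Step-to-step differences stand in for the temporal model. A hypothetical `failure_query` vector drives scaled dot-product attention. The function names and toy numbers are illustrative and are not StepFinder's actual architecture or API.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def temporal_features(step_embeddings: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical temporal model: represent each step by how far it moved the
    trajectory, i.e. its difference from the previous step's embedding."""
    if not step_embeddings:
        raise ValueError("need at least one step")
    dim = len(step_embeddings[0])
    prev: Vector = [0.0] * dim  # treat the state before step 0 as the origin
    feats: List[List[float]] = []
    for emb in step_embeddings:
        feats.append([e - p for e, p in zip(emb, prev)])
        prev = emb
    return feats


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_failure(step_embeddings: Sequence[Vector],
                      failure_query: Vector) -> Tuple[int, List[float]]:
    """Scaled dot-product attention of a (hypothetical) failure query over the
    temporal features; returns the highest-weighted step and all weights."""
    feats = temporal_features(step_embeddings)
    scale = math.sqrt(len(failure_query))
    scores = [sum(q * f for q, f in zip(failure_query, feat)) / scale
              for feat in feats]
    weights = softmax(scores)
    best = max(range(len(weights)), key=lambda i: weights[i])
    return best, weights


# Toy 2-D "semantic" embeddings for four logged steps (made-up numbers).
steps = [[0.0, 1.0], [0.1, 1.0], [0.9, 0.2], [1.0, 0.1]]
best, weights = attribute_failure(steps, failure_query=[1.0, 0.0])
print(best)  # 2: the step whose transition moves most toward the failure direction
```

Using differences rather than raw embeddings is one simple way to keep a fault from being credited to every later step that merely inherits it; a learned sequence model would normally replace this stand-in.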