Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution cs.CL · 2026-05-19