WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking

· 2026 · cs.AI · arXiv 2603.27343

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Existing large language models (LLMs) evaluations use fixed-difficulty benchmarks that cannot adapt as models improve, and rarely isolate specific cognitive processes. We introduce Working Memory Fidelity-Active Manipulation (WMF-AM), a probe of cumulative state tracking, the ability to maintain and update intermediate results across K sequential operations within a single query, without a scratchpad. Unlike multi-step agent benchmarks that stress task orchestration, WMF-AM isolates within-pass cumulative load by parameterizing depth K. The core probe uses arithmetic accumulation on 28 models from 12 families (0.5B to frontier); a matched non-arithmetic extension (permissions, schedules, inventories) confirms the design generalizes beyond arithmetic. Three construct-isolation ablations confirm that cumulative load, not arithmetic skill or entity tracking, drives difficulty. We release WMF-AM as a lightweight, recalibratable diagnostic for characterizing where models degrade under cumulative load. Code and data can be accessed at https://github.com/dengzhe-hou/WMF-AM

representative citing papers

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

PathCal calibrates reasoning paths by type-aware soft rebalancing of reflection-marker logits at uncertain states, yielding better efficiency-performance trade-offs on six benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning cs.AI · 2026-05-21 · unverdicted · none · ref 24 · internal anchor
PathCal calibrates reasoning paths by type-aware soft rebalancing of reflection-marker logits at uncertain states, yielding better efficiency-performance trade-offs on six benchmarks.

WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking

fields

years

verdicts

representative citing papers

citing papers explorer