2 Pith papers cite this work.
fields: cs.LG (2)
years: 2026 (2)
citing papers explorer
- Can Transformers Learn to Verify During Backtracking Search?
  Decoder-only transformers fail to base verification decisions solely on current search state in cumulative traces because of scattered retrieval and history entanglement; Selective State Attention enforces state-only decisions via a fixed mask.
- Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis
  QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.