arXiv preprint arXiv:2312.17044 (2024)

Zhao, L · 2024 · arXiv 2312.17044

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication

cs.LG · 2026-03-30 · unverdicted · novelty 8.0

Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than training while Transformers fail.

On the Spatiotemporal Dynamics of Generalization in Neural Networks

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

Deriving a neural cellular automaton from locality, symmetry, and stability postulates produces 100% accurate addition generalization from 16-digit to 1-million-digit inputs.

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

cs.CV · 2025-04-19 · unverdicted · novelty 6.0

LOOPE learns a patch ordering for positional embeddings in ViTs and introduces the Three Cell Experiment benchmark that shows 30-35% gaps in positional retention versus the usual 4-6%.

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.

Positional Encoding in Transformer-Based Time Series Models: A Survey

cs.LG · 2025-02-17 · unverdicted · novelty 3.0

A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation

cs.LG · 2025-09-04

citing papers explorer

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication · cs.LG · 2026-03-30 · unverdicted · none · ref 18
On the Spatiotemporal Dynamics of Generalization in Neural Networks · cs.LG · 2026-02-02 · unverdicted · none · ref 10
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers · cs.CV · 2025-04-19 · unverdicted · none · ref 32
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits · cs.LG · 2026-05-19 · unverdicted · none · ref 34
Positional Encoding in Transformer-Based Time Series Models: A Survey · cs.LG · 2025-02-17 · unverdicted · none · ref 27
Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation · cs.LG · 2025-09-04 · unreviewed · ref 102
