arXiv preprint arXiv:2311.12997

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks · arXiv 2311.12997

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 3 of 3 citing papers.

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning · cs.LG · 2026-05-06 · unverdicted · none · ref 12
Learning to Theorize the World from Observation · cs.LG · 2026-05-05 · unverdicted · none · ref 286
There Will Be a Scientific Theory of Deep Learning · stat.ML · 2026-04-23 · unverdicted · none · ref 289
