hub

Chain of thought empowers transformers to solve inherently serial problems

Chain of thought empowers transformers to solve inherently serial problems , author= · 2024 · arXiv 2402.12875

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

read on arXiv browse 18 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

cs.LG · 2026-06-01 · unverdicted · novelty 8.0

Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.

Transformers Provably Learn to Internalize Chain-of-Thought

cs.LG · 2026-05-27 · unverdicted · novelty 8.0

L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.

Forecasting Future Behavior as a Learning Task

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.

Agentic Transformers Provably Learn to Search via Reinforcement Learning

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Training Transformers as a Universal Computer

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

Training Large Language Models to Reason in a Continuous Latent Space

cs.CL · 2024-12-09 · unverdicted · novelty 7.0

Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

cs.LG · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Learned Relay Representations add a differentiable per-token channel to masked diffusion models so they can propagate latent information across iterative denoising steps, yielding better coding performance and up to 32% lower latency on Fast-dLLM v2 than standard supervised finetuning.

Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.

LLM Reasoning Is Latent, Not the Chain of Thought

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

cs.AI · 2025-03-23 · conditional · novelty 5.0

LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

Internalized Reasoning for Long-Context Visual Document Understanding

cs.CV · 2026-03-31

citing papers explorer

Showing 18 of 18 citing papers.

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete cs.LG · 2026-06-01 · unverdicted · none · ref 7
Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.
Transformers Provably Learn to Internalize Chain-of-Thought cs.LG · 2026-05-27 · unverdicted · none · ref 27
L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.
Forecasting Future Behavior as a Learning Task cs.AI · 2026-06-09 · unverdicted · none · ref 90
Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.
Agentic Transformers Provably Learn to Search via Reinforcement Learning cs.LG · 2026-05-29 · unverdicted · none · ref 20
In a stochastic k-ary tree, a two-head transformer learns randomized DFS via policy gradient under depth-wise curriculum, generalizes to deeper trees, and adapts to imbalanced goals via discounting.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 95
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
Training Transformers as a Universal Computer cs.AI · 2026-04-28 · unverdicted · none · ref 12
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
Training Large Language Models to Reason in a Continuous Latent Space cs.CL · 2024-12-09 · unverdicted · none · ref 20
Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models cs.LG · 2026-05-21 · unverdicted · none · ref 7 · 2 links
Learned Relay Representations add a differentiable per-token channel to masked diffusion models so they can propagate latent information across iterative denoising steps, yielding better coding performance and up to 32% lower latency on Fast-dLLM v2 than standard supervised finetuning.
Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute cs.AI · 2026-05-14 · unverdicted · none · ref 66
Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.
The Power of Power Law: Asymmetry Enables Compositional Reasoning cs.AI · 2026-04-24 · unverdicted · none · ref 32
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering cs.AI · 2026-04-22 · unverdicted · none · ref 135
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning cs.RO · 2026-02-09 · unverdicted · none · ref 49
R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning cs.LG · 2026-05-06 · unverdicted · none · ref 42
Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.
LLM Reasoning Is Latent, Not the Chain of Thought cs.AI · 2026-04-17 · unverdicted · none · ref 4
LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
The Serial Scaling Hypothesis cs.LG · 2025-07-16 · unverdicted · none · ref 59
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? cs.AI · 2025-03-23 · conditional · none · ref 11
LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.
Measuring AI Reasoning: A Guide for Researchers cs.AI · 2026-05-04 · unverdicted · none · ref 41
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
Internalized Reasoning for Long-Context Visual Document Understanding cs.CV · 2026-03-31 · unreviewed · ref 27

Chain of thought empowers transformers to solve inherently serial problems

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer