and Papailiopoulos, Dimitris , year =

Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D Lee, Dimitris Papailiopoulos · 2023 · arXiv 2301.13196

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Training Transformers as a Universal Computer

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Stabilizing Extrapolation in Looped Transformers via Learned Stochastic Stopping

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Stochastic loop counts during training of looped transformers reduce OOD variance on binary addition, Dyck-1, Unique Set and Copy tasks, with learned RL-Halting further improving the accuracy-stability trade-off.

Fixed-Point Masked Generative Modeling

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.

State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

cs.LG · 2025-02-08 · unverdicted · novelty 6.0

TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

cs.SE · 2024-01-05 · accept · novelty 6.0

CRUXEval benchmark shows current code models including GPT-4 achieve at most 81% on input and output prediction for short Python functions, exposing gaps not captured by HumanEval.

SMolLM: Small Language Models Learn Small Molecular Grammar

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

A 53K-parameter weight-shared transformer generates novel valid SMILES at 95% rate on ZINC-250K and resolves constraints hierarchically via bracket, ring, and valence stages as shown by probing and ablation.

citing papers explorer

Showing 9 of 9 citing papers.

Training Transformers as a Universal Computer cs.AI · 2026-04-28 · unverdicted · none · ref 5
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 31
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Stabilizing Extrapolation in Looped Transformers via Learned Stochastic Stopping cs.LG · 2026-06-29 · unverdicted · none · ref 11
Stochastic loop counts during training of looped transformers reduce OOD variance on binary addition, Dyck-1, Unique Set and Copy tasks, with learned RL-Halting further improving the accuracy-stability trade-off.
Fixed-Point Masked Generative Modeling cs.LG · 2026-05-29 · unverdicted · none · ref 24
FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 14
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data cs.LG · 2025-02-08 · unverdicted · none · ref 180
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 97
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution cs.SE · 2024-01-05 · accept · none · ref 2
CRUXEval benchmark shows current code models including GPT-4 achieve at most 81% on input and output prediction for short Python functions, exposing gaps not captured by HumanEval.
SMolLM: Small Language Models Learn Small Molecular Grammar cs.LG · 2026-05-07 · unverdicted · none · ref 67
A 53K-parameter weight-shared transformer generates novel valid SMILES at 95% rate on ZINC-250K and resolves constraints hierarchically via bracket, ring, and valence stages as shown by probing and ablation.

and Papailiopoulos, Dimitris , year =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer