and Papailiopoulos, Dimitris , year =

Giannou, Angeliki, Rajput, Shashank, Sohn, Jy-yong, Lee, Kangwook, Lee, Jason D · 2023 · arXiv 2301.13196

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Training Transformers as a Universal Computer

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

cs.LG · 2025-02-08 · unverdicted · novelty 6.0

TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

cs.SE · 2024-01-05 · accept · novelty 6.0

CRUXEval benchmark shows current code models including GPT-4 achieve at most 81% on input and output prediction for short Python functions, exposing gaps not captured by HumanEval.

SMolLM: Small Language Models Learn Small Molecular Grammar

cs.LG · 2026-05-07

citing papers explorer

Showing 7 of 7 citing papers.

Training Transformers as a Universal Computer cs.AI · 2026-04-28 · unverdicted · none · ref 5
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 31
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 14
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data cs.LG · 2025-02-08 · unverdicted · none · ref 180
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 97
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution cs.SE · 2024-01-05 · accept · none · ref 2
CRUXEval benchmark shows current code models including GPT-4 achieve at most 81% on input and output prediction for short Python functions, exposing gaps not captured by HumanEval.
SMolLM: Small Language Models Learn Small Molecular Grammar cs.LG · 2026-05-07 · unreviewed · ref 67

and Papailiopoulos, Dimitris , year =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer