Title resolution pending

Andrej Karpathy , title = · 2025

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

browse 8 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2 baseline 1 dataset 1

citation-polarity summary

background 1 baseline 1 unclear 1 use dataset 1

representative citing papers

Tokenisation via Convex Relaxations

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

ConvexTok uses convex relaxation of tokenization to a linear program, improving intrinsic metrics, bits-per-byte, and some downstream tasks while certifying near-optimality within 1% at typical vocabulary sizes.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

Simply Stabilizing the Loop via Fully Looped Transformer

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Fully Looped Transformer stabilizes looped training up to 12 iterations via distributed inter-loop signals and attention injection, improving downstream performance by up to 13.2%.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.

Solve the Loop: Attractor Models for Language and Reasoning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.

Muon Does Not Converge on Convex Lipschitz Functions

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

citing papers explorer

Showing 8 of 8 citing papers.

Tokenisation via Convex Relaxations cs.CL · 2026-05-21 · unverdicted · none · ref 14
ConvexTok uses convex relaxation of tokenization to a linear program, improving intrinsic metrics, bits-per-byte, and some downstream tasks while certifying near-optimality within 1% at typical vocabulary sizes.
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging cs.LG · 2026-05-20 · unverdicted · none · ref 4
LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method cs.LG · 2026-05-18 · unverdicted · none · ref 3
Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.
Simply Stabilizing the Loop via Fully Looped Transformer cs.LG · 2026-05-11 · unverdicted · none · ref 50
Fully Looped Transformer stabilizes looped training up to 12 iterations via distributed inter-loop signals and attention injection, improving downstream performance by up to 13.2%.
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design cs.AI · 2026-05-15 · unverdicted · none · ref 78
Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
Solve the Loop: Attractor Models for Language and Reasoning cs.LG · 2026-05-12 · unverdicted · none · ref 37
Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.
Muon Does Not Converge on Convex Lipschitz Functions cs.LG · 2026-05-09 · unverdicted · none · ref 69
Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.
Can Muon Fine-tune Adam-Pretrained Models? cs.LG · 2026-05-11 · unverdicted · none · ref 8
Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer