Title resolution pending

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 1 · dataset: 1

years

2026: 8

verdicts

unverdicted: 8

representative citing papers

Tokenisation via Convex Relaxations

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

ConvexTok uses convex relaxation of tokenization to a linear program, improving intrinsic metrics, bits-per-byte, and some downstream tasks while certifying near-optimality within 1% at typical vocabulary sizes.

Simply Stabilizing the Loop via Fully Looped Transformer

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Fully Looped Transformer stabilizes looped training up to 12 iterations via distributed inter-loop signals and attention injection, improving downstream performance by up to 13.2%.

Solve the Loop: Attractor Models for Language and Reasoning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.

Muon Does Not Converge on Convex Lipschitz Functions

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

citing papers explorer

Showing 8 of 8 citing papers.

  • Tokenisation via Convex Relaxations cs.CL · 2026-05-21 · unverdicted · none · ref 14

    ConvexTok uses convex relaxation of tokenization to a linear program, improving intrinsic metrics, bits-per-byte, and some downstream tasks while certifying near-optimality within 1% at typical vocabulary sizes.

  • LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging cs.LG · 2026-05-20 · unverdicted · none · ref 4

    LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

  • Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method cs.LG · 2026-05-18 · unverdicted · none · ref 3

    Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

  • Simply Stabilizing the Loop via Fully Looped Transformer cs.LG · 2026-05-11 · unverdicted · none · ref 50

    Fully Looped Transformer stabilizes looped training up to 12 iterations via distributed inter-loop signals and attention injection, improving downstream performance by up to 13.2%.

  • Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design cs.AI · 2026-05-15 · unverdicted · none · ref 78

    Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.

  • Solve the Loop: Attractor Models for Language and Reasoning cs.LG · 2026-05-12 · unverdicted · none · ref 37

    Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.

  • Muon Does Not Converge on Convex Lipschitz Functions cs.LG · 2026-05-09 · unverdicted · none · ref 69

    Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.

  • Can Muon Fine-tune Adam-Pretrained Models? cs.LG · 2026-05-11 · unverdicted · none · ref 8

    Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.