pith. sign in

hub

International conference on machine learning , pages=

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

hub tools

years

2026 14 2025 1

verdicts

UNVERDICTED 15

representative citing papers

Scaling Limits of Long-Context Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

Continuity Laws for Sequential Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

Recurrent Transformers add per-layer recurrent memory via self-attention on own activations plus a tiling algorithm that reduces training memory traffic, yielding better C4 pretraining cross-entropy than parameter-matched standard transformers with fewer layers.

citing papers explorer

Showing 15 of 15 citing papers.