pith.

Dynamic chunking for end-to-end hierarchical sequence modeling, 2025

11 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary: background 2, method 1


years: 2026 (11)


representative citing papers

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Neural Field Tokenizations with Hierarchy and Spatial Locality Priors

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

LH-NeF learns tokenized neural-field representations via a locality-preserving hierarchical encoder, achieving 42× lower memory and 133× larger batches than modality-agnostic meta-learning baselines while matching or exceeding performance on reconstruction and downstream tasks.

Training Transformers for KV Cache Compressibility

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.

The Efficiency Gap in Byte Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.

Efficient Pre-Training with Token Superposition

cs.CL · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

Token-Superposition Training combines multiple tokens into bags for multi-hot cross-entropy pre-training, followed by a recovery phase, yielding up to a 2.5× reduction in training time at 10B scale under equal-loss conditions.

citing papers explorer


  • Training Transformers for KV Cache Compressibility cs.LG · 2026-05-07 · unverdicted · none · ref 22 · 2 links