Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
Multilingual pooling for quality classifiers outperforms monolingual baselines in rank stability and accuracy for LLM pretraining data selection across high- and low-resource languages.
citing papers explorer
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.
-
LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
-
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
-
Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection
Multilingual pooling for quality classifiers outperforms monolingual baselines in rank stability and accuracy for LLM pretraining data selection across high- and low-resource languages.