Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
citing papers explorer
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.
-
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.