LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
Efficient transformers: A survey
6 Pith papers cite this work.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
Kuramoto synchronization dynamics implement a provably unique and globally attractive attention mechanism that replaces softmax for physical substrates and shows competitive empirical performance.
Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.
The paper calls for life cycle assessment to capture embodied hardware costs and full pipeline operational costs in AI development and deployment.
This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
citing papers explorer
- Attention by Synchronization in Coupled Oscillator Networks
Kuramoto synchronization dynamics implement a provably unique and globally attractive attention mechanism that replaces softmax for physical substrates and shows competitive empirical performance.
- From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.
- Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment
The paper calls for life cycle assessment to capture embodied hardware costs and full pipeline operational costs in AI development and deployment.
- Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
- Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations