Long Range Arena: A Benchmark for Efficient Transformers

18 Pith papers cite this work. Polarity classification is still being indexed.

Citation-role summary: background (4)
Citation-polarity summary: background (4)

Representative citing papers

Fast Cross-Operator Optimization of Attention Dataflow

cs.AR · 2026-04-03 · unverdicted · novelty 7.0

MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40–69% lower latency and energy use than prior methods while running 64–343× faster.

Pretraining Recurrent Networks without Recurrence

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) → m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.

Continuity Laws for Sequential Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

S4 models exhibit stable time continuity, unlike the more sensitive S6 models; task continuity predicts performance and enables temporal subsampling for better efficiency.

Differentiable Filtering for Learning Hidden Markov Models

cs.LG · 2025-11-13 · unverdicted · novelty 6.0

Belief Net learns HMM parameters by implementing the forward filter as a decoder-only neural network whose weights are the logits of the initial, transition, and emission distributions, trained end-to-end with an autoregressive loss.

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
