pith. sign in

hub

In: Gurevych, I., Miyao, Y

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

hub tools

citation-role summary

background 1 dataset 1

citation-polarity summary

representative citing papers

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

cs.CL · 2026-05-09 · conditional · novelty 6.0 · 2 refs

Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, yielding higher throughput, concurrency, and training efficiency than comparable linear-complexity models on language tasks.

Architecture Determines Observability of Transformers

cs.LG · 2026-04-27 · unverdicted · novelty 6.0 · 2 refs

Architecture and training determine whether transformers retain a readable internal signal that lets activation monitors catch errors missed by output confidence.

Should We Still Pretrain Encoders with Masked Language Modeling?

cs.CL · 2025-07-01 · accept · novelty 6.0

Controlled ablations of 38 models find MLM superior to CLM on representation benchmarks while CLM offers better data efficiency and stability; a biphasic CLM-then-MLM schedule is optimal under fixed compute and improves when initialized from pretrained CLM models.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.

PaLM 2 Technical Report

cs.CL · 2023-05-17 · unverdicted · novelty 5.0

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

citing papers explorer

Showing 18 of 18 citing papers.