pith. sign in

ArXiv preprint , title =

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.CL 2

years

2024 2

roles

background 1

polarities

background 1

representative citing papers

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.

citing papers explorer

Showing 2 of 2 citing papers.

  • Scaling Diffusion Language Models via Adaptation from Autoregressive Models cs.CL · 2024-10-23 · conditional · none · ref 11

    Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

  • Gated Delta Networks: Improving Mamba2 with Delta Rule cs.CL · 2024-12-09 · unverdicted · none · ref 210

    Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.