Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

12 Pith papers cite this work. Polarity classification is still indexing.

citation summary

fields: cs.CL 8, cs.LG 4

years: 2026 11, 2024 1

roles: background 1

polarities: background 1

representative citing papers

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

The Efficiency Gap in Byte Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Byte modeling incurs greater scaling overhead for masked diffusion models than for autoregressive models because the diffusion objective destroys the local byte contiguity needed to resolve semantics.
