2309.10818

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · dataset 1

citation-polarity summary

fields

cs.CL 5 · cs.LG 4

representative citing papers

A3: an Analytical Low-Rank Approximation Framework for Attention

cs.CL · 2025-05-19 · conditional · novelty 6.0

A3 splits Transformer layers into QK, OV, and MLP components and derives analytical low-rank approximations that reduce hidden dimensions while minimizing each component's functional loss, yielding better perplexity than prior low-rank methods on LLaMA models.

Refresh-Scaling the Memory of Balanced Adam

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

Setting β in balanced Adam to achieve a refresh count R_β ≈ 1000, based on the effective learning horizon T_ES, improves validation robustness over fixed-β baselines across 11 vision and language experiments.

SEDD: Scalable and Efficient Dataset Deduplication with GPUs

cs.CL · 2025-01-02 · unverdicted · novelty 5.0

SEDD delivers a distributed GPU deduplication system that reports up to 158x speedup over CPU baselines and 7.8x over NeMo Curator on 30M documents while preserving MinHash fidelity above 0.95 Jaccard.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
