hub

Noam Shazeer and Mitchell Stern

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi · 2021 · DOI 10.1145/3474381

19 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

ConQuR is a post-training rotation calibration technique that aligns activations to hypercube corners via Procrustes optimization and online updates, delivering competitive LLM quantization performance without end-to-end training or offline activation storage.

ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

A new parallel reasoning dataset enables LLMs to shift reasoning to non-English languages via SFT and RLVR while matching or exceeding baseline performance.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · novelty 7.0

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

UB-SMoE balances expert utilization in heterogeneous federated SMoE fine-tuning via Dynamic Modulated Routing and Universal Pseudo-Gradient, delivering up to 45% compute reduction and 8.7x performance gains for low-resource clients over prior LoRA-rank methods.

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

Rephrasing web text into structured formats such as tables, math problems, FAQs, and tutorials produces higher-quality synthetic pretraining data than curated web baselines or prior synthetic methods, as demonstrated by trillion-token experiments and the resulting FinePhrase dataset that reduces gen

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · novelty 6.0

MobileLLM-Flash creates 350M-1.4B parameter LLMs via latency-guided search and attention skipping, delivering up to 1.8x faster prefill and 1.6x faster decode on mobile CPUs with comparable or better quality.

Ultra-Low-Dimensional Prompt Tuning via Random Projection

cs.CL · 2025-02-06 · unverdicted · novelty 6.0

ULPT optimizes prompts in ultra-low dimensions with frozen random up-projection to cut training parameters by 98% while matching vanilla prompt tuning performance on NLP tasks.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

Efficient Training of Language Models to Fill in the Middle

cs.CL · 2022-07-28 · unverdicted · novelty 6.0

Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

IO-SVD: Input-Output Whitened SVD for Adaptive-Rank LLM Compression

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

IO-SVD performs SVD-based LLM compression by constructing a KL-aware double-sided whitening space and using first-order loss estimates for heterogeneous rank allocation.

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

MuonQ achieves stable 4-bit quantization of Muon optimizer states via pre-quantization normalization, singular component decomposition with power iteration, and μ-law companding, matching full-precision loss and accuracy on GPT and LLaMA models with up to 7.3x memory savings.

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

ReSpinQuant achieves state-of-the-art accuracy in W4A4 and W3A3 LLM quantization by using efficient residual subspace rotation approximations that match layer-wise performance while retaining the inference speed of global rotation methods.

AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results

cs.DC · 2025-02-13 · unverdicted · novelty 5.0

AIvaluateXR benchmarks 17 LLMs across four XR platforms on performance, speed, memory and battery metrics and proposes a 3D Pareto optimality method to identify optimal on-device model-device pairs.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

cs.CL · 2025-07-21 · unverdicted · novelty 4.0

LLM accuracy on reasoning tasks differs significantly by question type, with step-by-step reasoning accuracy often uncorrelated to final answer selection.

citing papers explorer

Showing 19 of 19 citing papers.

ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs

cs.LG · 2026-05-11 · unverdicted · none · ref 34

ConQuR is a post-training rotation calibration technique that aligns activations to hypercube corners via Procrustes optimization and online updates, delivering competitive LLM quantization performance without end-to-end training or offline activation storage.

ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance

cs.CL · 2026-04-14 · unverdicted · none · ref 5

A new parallel reasoning dataset enables LLMs to shift reasoning to non-English languages via SFT and RLVR while matching or exceeding baseline performance.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · none · ref 128

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

cs.CL · 2026-05-22 · unverdicted · none · ref 28

LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

cs.LG · 2026-05-15 · unverdicted · none · ref 61

UB-SMoE balances expert utilization in heterogeneous federated SMoE fine-tuning via Dynamic Modulated Routing and Universal Pseudo-Gradient, delivering up to 45% compute reduction and 8.7x performance gains for low-resource clients over prior LoRA-rank methods.

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

cs.CL · 2026-04-21 · unverdicted · none · ref 42

SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

cs.CL · 2026-04-15 · unverdicted · none · ref 4

Rephrasing web text into structured formats such as tables, math problems, FAQs, and tutorials produces higher-quality synthetic pretraining data than curated web baselines or prior synthetic methods, as demonstrated by trillion-token experiments and the resulting FinePhrase dataset that reduces gen

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

cs.LG · 2026-04-07 · unverdicted · none · ref 29

TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · none · ref 13

MobileLLM-Flash creates 350M-1.4B parameter LLMs via latency-guided search and attention skipping, delivering up to 1.8x faster prefill and 1.6x faster decode on mobile CPUs with comparable or better quality.

Ultra-Low-Dimensional Prompt Tuning via Random Projection

cs.CL · 2025-02-06 · unverdicted · none · ref 46

ULPT optimizes prompts in ultra-low dimensions with frozen random up-projection to cut training parameters by 98% while matching vanilla prompt tuning performance on NLP tasks.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · none · ref 172

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

Efficient Training of Language Models to Fill in the Middle

cs.CL · 2022-07-28 · unverdicted · none · ref 134

Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · none · ref 81

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

IO-SVD: Input-Output Whitened SVD for Adaptive-Rank LLM Compression

cs.LG · 2026-05-15 · unverdicted · none · ref 33

IO-SVD performs SVD-based LLM compression by constructing a KL-aware double-sided whitening space and using first-order loss estimates for heterogeneous rank allocation.

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization

cs.LG · 2026-05-12 · unverdicted · none · ref 12

MuonQ achieves stable 4-bit quantization of Muon optimizer states via pre-quantization normalization, singular component decomposition with power iteration, and μ-law companding, matching full-precision loss and accuracy on GPT and LLaMA models with up to 7.3x memory savings.

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

cs.CV · 2026-04-13 · unverdicted · none · ref 8

ReSpinQuant achieves state-of-the-art accuracy in W4A4 and W3A3 LLM quantization by using efficient residual subspace rotation approximations that match layer-wise performance while retaining the inference speed of global rotation methods.

AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results

cs.DC · 2025-02-13 · unverdicted · none · ref 54

AIvaluateXR benchmarks 17 LLMs across four XR platforms on performance, speed, memory and battery metrics and proposes a 3D Pareto optimality method to identify optimal on-device model-device pairs.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · none · ref 24 · 2 links

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

cs.CL · 2025-07-21 · unverdicted · none · ref 27

LLM accuracy on reasoning tasks differs significantly by question type, with step-by-step reasoning accuracy often uncorrelated to final answer selection.