International conference on machine learning , pages=

Sparsegpt: Massive language models can be accurately pruned in one-shot , author= · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.

Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.

TAPIOCA: Why Task- Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.

citing papers explorer

Showing 4 of 4 citing papers.

Search Your Block Floating Point Scales! cs.LG · 2026-05-12 · unverdicted · none · ref 39
ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models cs.AI · 2026-04-21 · unverdicted · none · ref 13
GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.
Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling cs.LG · 2026-04-20 · unverdicted · none · ref 36
AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.
TAPIOCA: Why Task- Aware Pruning Improves OOD model Capability cs.LG · 2026-05-14 · unverdicted · none · ref 17
Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.

International conference on machine learning , pages=

fields

years

verdicts

representative citing papers

citing papers explorer