SliceGPT: Compress Large Language Models by Deleting Rows and Columns

25 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

representative citing papers

End-to-End Dynamic Sparsity for Resource-Adaptive LLM Inference

cs.IR · 2026-06-26 · unverdicted · novelty 6.0

L2A trains one LLM with input-and-budget-conditioned gates to adapt sparsity across layers, heads, and tokens, tracing the compute-accuracy frontier while staying within 0.6% of dense performance at 34% layer sparsity on tested models.

DOT-MoE: Differentiable Optimal Transport for MoEfication

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

DOT-MoE uses differentiable optimal transport and straight-through estimators to partition FFN layers into capacity-constrained experts, and it outperforms heuristic baselines while retaining 90% of performance at 50% active parameters.

Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.

RAP: Runtime Adaptive Pruning for LLM Inference

cs.LG · 2025-05-22 · unverdicted · novelty 5.0

RAP is a reinforcement learning framework for runtime-adaptive pruning of LLMs that jointly optimizes model weights and KV-cache usage under varying memory budgets.
