Advances in Neural Information Processing Systems

Learning both weights and connections for efficient neural network

5 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

HORST: Composing Optimizer Geometries for Sparse Transformer Training

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.

FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.

citing papers explorer

Showing 5 of 5 citing papers.

Improving Dictionary Learning with Gated Sparse Autoencoders cs.LG · 2024-04-24 · unverdicted · none · ref 183
HORST: Composing Optimizer Geometries for Sparse Transformer Training cs.LG · 2026-05-20 · unverdicted · none · ref 160
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models cs.AI · 2026-04-21 · unverdicted · none · ref 11
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability cs.LG · 2026-05-14 · unverdicted · none · ref 41
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion cs.LG · 2026-04-21 · unverdicted · none · ref 135
