PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on four datasets while largely preserving recommendation quality.
Second order derivatives for network pruning: Optimal brain surgeon
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.
Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.
citing papers explorer
-
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on four datasets while largely preserving recommendation quality.
-
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.
-
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.