{"total":15,"items":[{"citing_arxiv_id":"2607.00927","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Post-Training Pruning for Diffusion Transformers","primary_cat":"cs.CV","submitted_at":"2026-07-01T13:30:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DiT-Pruning introduces an energy-based saliency metric balancing weights and activations plus clustering-aware granularity for post-training pruning of DiTs, showing near-zero CLIP score degradation at 50% sparsity on FLUX.1-dev.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26538","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry","primary_cat":"cs.LG","submitted_at":"2026-06-25T02:25:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25324","ref_index":65,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Efficient Remote Sensing Instance Segmentation with Linear-Time State Space Distilled Visual Foundation Models","primary_cat":"cs.CV","submitted_at":"2026-06-24T02:41:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RS4D distills ViT knowledge into SSM backbones for remote sensing instance segmentation, delivering 8x fewer parameters and 9x fewer FLOPs than ViT methods while matching or exceeding accuracy on SSDD, WHU, and NWPU 
datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00535","ref_index":102,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation","primary_cat":"cs.LG","submitted_at":"2026-05-30T05:05:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16470","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Strategic Over-Parameterization for Generalizable Low-Rank Adaptation","primary_cat":"cs.LG","submitted_at":"2026-05-15T12:26:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08840","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal 
Smoothing","primary_cat":"cs.CL","submitted_at":"2026-05-09T09:49:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61x at 128k context.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"We define the eviction indicator $I_t[n]$ as the reconstruction error of the MHA output caused by removing the n-th KV pair. Specifically, the eviction indicator is given by: $I_t[n] = \\lVert \\mathrm{MHA}(x_t, \\langle K_T, V_T \\rangle) - \\mathrm{MHA}(x_t, \\langle K_{T,\\setminus n}, V_{T,\\setminus n} \\rangle) \\rVert_2$ (15), where $K_{T,\\setminus n}$ and $V_{T,\\setminus n}$ represent the set of cache keys and values with the n-th KV pair removed. Using Eq. 3 and Eq. 4, we can expand Eq. 15 as follows: $I_t[n] = \\lVert A_t V_T W_O - A_{t,\\setminus n} V_{T,\\setminus n} W_O \\rVert_2$ (16), where $A_{t,\\setminus n}$ represents the attention weights with the n-th KV pair removed, and $V_{T,\\setminus n}$ represents the values corresponding to the remaining cache sets after the removal of the n-th KV pair. 
Further, we expand the matrix computation into a weighted sum form as: $I_t[n] = \\lVert \\sum_{m} A_t[m] v_m W_O - \\sum_{m \\neq n} A_{t,\\setminus n}[m] v_m W_O \\rVert_2$ (17), where $A_t[m]$ and $A_{t,\\setminus n}[m]$ represent the attention weights for the m-th query in the presence and"},{"citing_arxiv_id":"2604.16943","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation","primary_cat":"cs.CL","submitted_at":"2026-04-18T09:54:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08971","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference","primary_cat":"cs.LG","submitted_at":"2026-04-10T05:26:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimodal edge models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.11089","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TPV: Parameter Perturbations Through the Lens of Test Prediction 
Variance","primary_cat":"stat.ML","submitted_at":"2025-12-11T20:04:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TPV measures first-order sensitivity of model outputs to parameter perturbations, unifies robustness analysis under one lens, proves train-to-test convergence in overparameterized limits, and enables label-free pruning and model selection applications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.17530","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Deep-OFDM: Neural Modulation for High Mobility","primary_cat":"cs.IT","submitted_at":"2025-06-21T00:57:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A CNN modulator jointly trained with a neural receiver spreads information across local time-frequency neighborhoods in OFDM, breaking QAM rotational symmetry to support sparse or zero pilots under high Doppler.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.06230","ref_index":75,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning","primary_cat":"cs.LG","submitted_at":"2024-04-09T11:42:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A hybrid sparse Byzantine attack using network pruning insights and slow accumulation bypasses eight state-of-the-art defenses in federated learning simulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.02277","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Junk DNA 
Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs \"Difficult\" Downstream Tasks in LLMs","primary_cat":"cs.LG","submitted_at":"2023-09-29T22:55:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.14048","ref_index":59,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models","primary_cat":"cs.LG","submitted_at":"2023-06-24T20:11:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"through weight equalization and bias correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1325-1334, 2019. [58] Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Chris De Sa, and Zhiru Zhang. Improving neural network quantization without retraining using outlier channel splitting. In International conference on machine learning, pages 7543-7552. PMLR, 2019. [59] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440, 2016. [60] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. 
arXiv preprint arXiv:1810.05270, 2018. [61] Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, and Yi Yang."},{"citing_arxiv_id":"1907.00274","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"NetTailor: Tuning the Architecture, Not Just the Weights","primary_cat":"cs.CV","submitted_at":"2019-06-29T20:32:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10337","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning","primary_cat":"cs.CV","submitted_at":"2019-06-25T06:15:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"COP prunes CNN filters using correlation-based importance with global normalization and dual regularization on parameter quantity and FLOPs to enable customized compression.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}