MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. arXiv preprint arXiv:2012.15828, 2020

Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei · 2020 · arXiv 2012.15828

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

cs.CL · 2025-10-20 · unverdicted · novelty 6.0

AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

cs.CL · 2021-11-18 · accept · novelty 6.0

DeBERTaV3 improves DeBERTa by switching to replaced token detection pre-training and using gradient-disentangled embedding sharing, reaching a 91.37% average score on GLUE and a new state of the art on zero-shot XNLI.

Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency

cs.CL · 2026-04-27 · conditional · novelty 5.0

Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.

citing papers explorer

Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency

cs.CL · 2026-04-27 · conditional · none · ref 41

Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.
