Tinybert: Distilling bert for natural language understanding

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu · 2020

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

On-policy distillation works when student and teacher models share thinking patterns and the teacher adds new capabilities, with success tied to alignment on a small set of high-probability tokens.

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

cs.CL · 2026-02-17 · unverdicted · novelty 5.0

A distillation-plus-task-contrastive training regimen yields compact embedding models that match or exceed state-of-the-art performance for their size while supporting 32k-token contexts and quantization.

citing papers explorer

Showing 2 of 2 citing papers.

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe cs.LG · 2026-04-14 · unverdicted · none · ref 11
On-policy distillation works when student and teacher models share thinking patterns and the teacher adds new capabilities, with success tied to alignment on a small set of high-probability tokens.
jina-embeddings-v5-text: Task-Targeted Embedding Distillation cs.CL · 2026-02-17 · unverdicted · none · ref 4
A distillation-plus-task-contrastive training regimen yields compact embedding models that match or exceed state-of-the-art performance for their size while supporting 32k-token contexts and quantization.

Tinybert: Distilling bert for natural language understanding

fields

years

verdicts

representative citing papers

citing papers explorer