Original DSKD-CMA Method

To demonstrate how our methods fit into the existing framework, we provide a brief overview of DSKD-CMA [3].

METHODS 3

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch

cs.CL · 2026-03-23 · unverdicted · novelty 6.0

The authors introduce DSKD-CMA-GA, which uses generative adversarial learning to fix key-query distribution mismatches in cross-tokenizer knowledge distillation, and report a modest average ROUGE-L gain of 0.37, especially on out-of-distribution data.

