Distilling the Knowledge in a Neural Network

ArXiv

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Fast Inference from Transformers via Speculative Decoding

cs.LG · 2022-11-30 · accept · novelty 7.0

Speculative decoding accelerates exact sampling from large autoregressive models by 2-3x on T5-XXL: smaller approximation models propose token sequences that the large model then verifies in parallel, preserving the original output distribution.

Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A non-autoregressive ionic transport predictor learns dynamics from auxiliary trajectory data used only during training, achieving over 200x speedup versus autoregressive models and lower error than non-autoregressive baselines on both dataset types.

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

cs.CL · 2019-10-02 · unverdicted · novelty 6.0

DistilBERT reduces the size of BERT by 40% through distillation during pre-training with a triple loss, retaining 97% of its language-understanding performance while running 60% faster.

Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

A reasoning-distillation plus dual-reward GRPO method for multi-role dialogue summarization matches ROUGE and BERTScore baselines while improving factual faithfulness and preference alignment on CSDS and SAMSum.

A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance

cs.IR · 2026-05-07 · unverdicted · novelty 3.0

A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.

citing papers explorer

Showing 5 of 5 citing papers.

Fast Inference from Transformers via Speculative Decoding · cs.LG · 2022-11-30 · accept · none · ref 8
Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor · cs.LG · 2026-05-10 · unverdicted · none · ref 55
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter · cs.CL · 2019-10-02 · unverdicted · none · ref 5
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization · cs.CL · 2026-04-19 · unverdicted · none · ref 53
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance · cs.IR · 2026-05-07 · unverdicted · none · ref 58
