Mixed citations

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Mixed citation behavior. Most common role is background (60%).

9 Pith papers citing it
Background 60% of classified citations

citation-role summary

background: 4 · baseline: 1

representative citing papers

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Hierarchical Mixture-of-Experts with Two-Stage Optimization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Hi-MoE uses two-level hierarchical routing objectives to enforce group-level balance while promoting within-group specialization, yielding better perplexity and expert utilization than prior MoE baselines in NLP and vision tasks.

Provable Knowledge Acquisition and Extraction in One-Layer Transformers

cs.LG · 2025-07-28 · unverdicted · novelty 6.0

In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds
