Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
arXiv preprint arXiv:1804.03235 , year=
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
FedF-ADMM uses function-space ADMM updates projected via knowledge distillation plus a PI-like stabilization term to deliver faster, more stable convergence and higher accuracy than prior decentralized FL methods under severe non-IID conditions.
CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.
LatentBurst is a new multi-frame super-resolution network for hexadeca-Bayer CIS images that uses pyramid latent alignment, an efficient UNet, and two-step knowledge distillation to handle motion and run on mobile devices.
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
citing papers explorer
-
Emerging Properties in Self-Supervised Vision Transformers
Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
-
Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective
FedF-ADMM uses function-space ADMM updates projected via knowledge distillation plus a PI-like stabilization term to deliver faster, more stable convergence and higher accuracy than prior decentralized FL methods under severe non-IID conditions.
-
Enabling Federated Inference via Unsupervised Consensus Embedding
CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.
-
LatentBurst: A Fast and Efficient Multi Frame Super-Resolution for Hexadeca-Bayer Pattern CIS images
LatentBurst is a new multi-frame super-resolution network for hexadeca-Bayer CIS images that uses pyramid latent alignment, an efficient UNet, and two-step knowledge distillation to handle motion and run on mobile devices.
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
-
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models
Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
-
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.