arXiv preprint arXiv:1804.03235 (2018)

Anil, R · 2018 · arXiv 1804.03235

8 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% top-1 accuracy under linear evaluation with ViT-Base.

Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

FedF-ADMM uses function-space ADMM updates projected via knowledge distillation plus a PI-like stabilization term to deliver faster, more stable convergence and higher accuracy than prior decentralized FL methods under severe non-IID conditions.

Enabling Federated Inference via Unsupervised Consensus Embedding

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.

LatentBurst: A Fast and Efficient Multi Frame Super-Resolution for Hexadeca-Bayer Pattern CIS images

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

LatentBurst is a new multi-frame super-resolution network for hexadeca-Bayer CIS images that uses pyramid latent alignment, an efficient UNet, and two-step knowledge distillation to handle motion and run on mobile devices.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models while being much smaller.

The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models

cs.LG · 2025-07-25 · unverdicted · novelty 6.0

Populations of 1-4B-parameter LLMs using peer verification and shared cultural memory achieve gains of 8.8-18.9 points on mathematical reasoning tasks and close much of the gap to single models with 70B+ parameters.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Gemma 2: Improving Open Language Models at a Practical Size

cs.CL · 2024-07-31 · conditional · novelty 3.0

Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
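Several of the citing-paper summaries above (FedF-ADMM, LatentBurst, Gemma 2) rely on knowledge distillation, where a student model is trained to match a teacher's softened output distribution. The sketch below is a generic, temperature-scaled distillation loss in the style of Hinton et al.; it is an illustrative assumption, not the exact recipe of any listed paper or of the cited preprint, and the function names and the default temperature of 2.0 are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # log-sum-exp trick: subtract the max before exponentiating to avoid overflow
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # hypothetical default; tuned per task in practice
) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)  # p_t * (log p_t - log p_s)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical logits give zero loss
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # mismatched logits give a positive loss
```

Working in log space avoids taking the logarithm of a probability that underflowed to zero; in a real training loop this term is usually mixed with the ordinary cross-entropy on ground-truth labels.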

citing papers explorer

Emerging Properties in Self-Supervised Vision Transformers cs.CV · 2021-04-29 · conditional · none · ref 1
Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective cs.LG · 2026-05-10 · unverdicted · none · ref 22
Enabling Federated Inference via Unsupervised Consensus Embedding cs.LG · 2026-05-07 · unverdicted · none · ref 17
LatentBurst: A Fast and Efficient Multi Frame Super-Resolution for Hexadeca-Bayer Pattern CIS images cs.CV · 2026-04-25 · unverdicted · none · ref 25
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering cs.AI · 2026-04-22 · unverdicted · none · ref 178
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models cs.LG · 2025-07-25 · unverdicted · none · ref 6
Vision Transformers Need Registers cs.CV · 2023-09-28 · unverdicted · none · ref 182
Gemma 2: Improving Open Language Models at a Practical Size cs.CL · 2024-07-31 · conditional · none · ref 159