Big Transfer (BiT): General Visual Repre- sentation Learning

Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby · 2020 · arXiv 1912.11370

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

representative citing papers

Silhouette Loss: Differentiable Global Structure Learning for Deep Representations

cs.LG · 2026-03-27 · unverdicted · novelty 8.0

Soft Silhouette Loss offers a batch-global differentiable metric to promote intra-class compactness and inter-class separation in learned representations, boosting performance when hybridized with cross-entropy and contrastive losses.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

cs.CV · 2021-03-25 · accept · novelty 8.0

Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.

Florence: A New Foundation Model for Computer Vision

cs.CV · 2021-11-22 · unverdicted · novelty 6.0

Florence is a new vision foundation model that learns universal visual-language representations from web-scale data and reports state-of-the-art results on 44 benchmarks including 83.74% zero-shot ImageNet top-1 accuracy.

A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Large-scale experiments demonstrate that data-aware augmentations applied only during training allow fine-grained image models to reach high accuracy without using discriminative crops at inference, lowering costs.

citing papers explorer

Showing 5 of 5 citing papers.

Silhouette Loss: Differentiable Global Structure Learning for Deep Representations cs.LG · 2026-03-27 · unverdicted · none · ref 3
Soft Silhouette Loss offers a batch-global differentiable metric to promote intra-class compactness and inter-class separation in learned representations, boosting performance when hybridized with cross-entropy and contrastive losses.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows cs.CV · 2021-03-25 · accept · none · ref 38
Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System cs.RO · 2026-05-18 · unverdicted · none · ref 29
DexHoldem is a new benchmark providing 1,470 teleoperated demonstrations across 14 manipulation primitives, plus standardized tests for dexterous policy execution and agentic perception in a physical Texas Hold'em setting.
Florence: A New Foundation Model for Computer Vision cs.CV · 2021-11-22 · unverdicted · none · ref 12
Florence is a new vision foundation model that learns universal visual-language representations from web-scale data and reports state-of-the-art results on 44 benchmarks including 83.74% zero-shot ImageNet top-1 accuracy.
A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition cs.CV · 2026-05-18 · unverdicted · none · ref 14
Large-scale experiments demonstrate that data-aware augmentations applied only during training allow fine-grained image models to reach high accuracy without using discriminative crops at inference, lowering costs.

Big Transfer (BiT): General Visual Repre- sentation Learning

fields

years

verdicts

representative citing papers

citing papers explorer