arXiv preprint arXiv:1904.12848 (Apr 2019)

Xie, Q · 2019 · arXiv 1904.12848

12 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.

Longformer: The Long-Document Transformer

cs.CL · 2020-04-10 · accept · novelty 7.0

Longformer uses local windowed attention plus task-specific global attention to achieve linear scaling and state-of-the-art results on long-document language modeling, QA, and summarization after pretraining.

A Simple Framework for Contrastive Learning of Visual Representations

cs.LG · 2020-02-13 · accept · novelty 7.0

SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.

Unsupervised Cross-lingual Representation Learning at Scale

cs.CL · 2019-11-05 · conditional · novelty 7.0

XLM-R, pretrained on 100 languages with 2TB of CommonCrawl data, improves average XNLI accuracy by 14.6 points and MLQA F1 by 13 points over mBERT while matching strong monolingual models on GLUE.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

cs.CL · 2019-06-19 · accept · novelty 7.0

XLNet is a generalized autoregressive pretraining method that learns bidirectional contexts via permutation-based factorization and outperforms BERT on 20 NLP tasks.

Revisiting Feature Prediction for Learning Visual Representations from Video

cs.CV · 2024-02-15 · conditional · novelty 6.0

V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Graph Star Net for Generalized Multi-Task Learning

cs.SI · 2019-06-21 · unverdicted · novelty 6.0

GraphStar is a new GNN that adds star nodes and relay attention to achieve non-local representations for node, graph, and link tasks, claiming 2-5% gains over prior SOTA on benchmarks.

Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

Invariance-inducing regularization using worst-case transformations reduces relative error by 20% on transformed CIFAR10 examples, improves standard accuracy on SVHN, and outperforms equivariant networks; the paper also proves that there is no accuracy-robustness trade-off in the infinite-data limit.

Efficient data augmentation using graph imputation neural networks

stat.ML · 2019-06-20 · unverdicted · novelty 5.0

Graph imputation neural networks augment semi-supervised datasets up to 10x by reconstructing heavily damaged samples on a similarity graph, improving over fully-supervised baselines on benchmarks.

Voice Biomarkers for Depression and Anxiety

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Deep learning models extract content-agnostic voice biomarkers for depression and anxiety from a ~65k-utterance proprietary dataset, achieving 71% sensitivity and specificity when combined with lexical features.

Domain-Specific Query Understanding for Automotive Applications: A Modular and Scalable Approach

cs.IR · 2026-01-16 · unverdicted · novelty 4.0

Decomposing automotive query understanding into a lightweight classification stage followed by specialized entity extraction yields better accuracy and lower latency than joint single-step processing.

citing papers explorer

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · none · ref 75

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
