Title resolution pending

Masked autoencoders are scalable vision learners

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

unclear 2

representative citing papers

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

cs.CV · 2023-10-09 · unverdicted · novelty 7.0

A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.

Protein Fold Classification at Scale: Benchmarking and Pretraining

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Introduces TEDBench benchmark and MiAE self-supervised framework that outperforms baselines for large-scale protein fold classification.

RELO: Reinforcement Learning to Localize for Visual Object Tracking

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 57.5% AUC on LaSOT_ext without template updates.

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

cs.CV · 2023-09-30 · accept · novelty 6.0

PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Layer by Layer: Uncovering Hidden Representations in Language Models

cs.LG · 2025-02-04 · unverdicted · novelty 5.0

Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.

citing papers explorer

Showing 6 of 6 citing papers.

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation cs.CV · 2023-10-09 · unverdicted · none · ref 193
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
Protein Fold Classification at Scale: Benchmarking and Pretraining cs.LG · 2026-05-18 · unverdicted · none · ref 19
Introduces TEDBench benchmark and MiAE self-supervised framework that outperforms baselines for large-scale protein fold classification.
RELO: Reinforcement Learning to Localize for Visual Object Tracking cs.CV · 2026-05-08 · unverdicted · none · ref 5 · 2 links
RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 57.5% AUC on LaSOT_ext without template updates.
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis cs.CV · 2023-09-30 · accept · none · ref 125
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
Vision Transformers Need Registers cs.CV · 2023-09-28 · unverdicted · none · ref 18
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
Layer by Layer: Uncovering Hidden Representations in Language Models cs.LG · 2025-02-04 · unverdicted · none · ref 145
Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.
