Vision transformers need registers

Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski · 2024

4 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 2

citation-polarity summary

background 1, unclear 1

representative citing papers

Measuring Maximum Activations in Open Large Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

Maximum activations in modern open LLMs span nearly four orders of magnitude across families, with MoE models exhibiting 14-23x lower peaks than dense counterparts and residual streams carrying the global max in most cases.

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Sinks are equivalent to hard attention switches that zero out outputs and are cheaper than diagonal patterns when self-communication is allowed, closing the gap between oversmoothing prevention needs and what sinks provide.

Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Decouples semantic and spatial tokens in NVS transformers to resolve representation ambiguity, yielding consistent gains with near-zero added latency.

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.

citing papers explorer


Measuring Maximum Activations in Open Large Language Models cs.CL · 2026-05-15 · unverdicted · none · ref 8
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention cs.LG · 2026-05-08 · unverdicted · none · ref 6
Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling cs.CV · 2026-05-18 · unverdicted · none · ref 6
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition cs.CV · 2026-05-11 · unverdicted · none · ref 4