Indoor Segmentation and Support Inference from RGBD Images
3 Pith papers cite this work. Polarity classification is still being indexed.
UNVERDICTED · 3 representative citing papers
citing papers explorer
-
RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings
RelFlexformers enable flexible, integrable 3D relative positional encodings (RPE) in attention via the non-uniform FFT (NU-FFT), generalizing prior methods to heterogeneous token positions with O(L log L) complexity.
-
Learning to Perceive "Where": Spatial Pretext Tasks for Robust Self-Supervised Learning
The Spatial Prediction pretext task learns spatial structure in self-supervised learning by regressing the relative position and scale between image views, yielding more structured representations and better generalization.
-
Seed1.5-VL Technical Report
Seed1.5-VL is a compact multimodal model that sets new state-of-the-art results on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.