Axial attention in multidimensional transformers

Ho, J · 2019 · arXiv 1912.12180

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

Video Diffusion Models

cs.CV · 2022-04-07 · unverdicted · novelty 7.0

A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

RT-Transformer: The Transformer Block as a Spherical State Estimator

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.

Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting

physics.ao-ph · 2026-04-20 · unverdicted · novelty 6.0

ESFM is a single open foundation model that unifies heterogeneous Earth data sources and forecasts missing regions while preserving inter-variable physical relationships.

MoBA: Mixture of Block Attention for Long-Context LLMs

cs.LG · 2025-02-18 · unverdicted · novelty 6.0

MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.

YOLOv12: Attention-Centric Real-Time Object Detectors

cs.CV · 2025-02-18 · unverdicted · novelty 6.0

YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.

Deformable DETR: Deformable Transformers for End-to-End Object Detection

cs.CV · 2020-10-08 · accept · novelty 6.0

Deformable DETR achieves higher accuracy than DETR, especially on small objects, while converging in one-tenth the training epochs by using sparse deformable attention on image features.

Jukebox: A Generative Model for Music

eess.AS · 2020-04-30 · unverdicted · novelty 6.0

Jukebox generates high-fidelity and diverse songs with singing and coherence up to multiple minutes by compressing raw audio via multi-scale VQ-VAE and modeling the codes with large autoregressive Transformers conditioned on artist, genre, and unaligned lyrics.

Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO

eess.SP · 2026-05-14 · unverdicted · novelty 5.0

A multi-block attention neural network reduces pilot overhead by 87% and NMSE by 51% at 10 dB SNR for cascaded channel estimation in IRS-assisted mmWave MIMO-OFDM systems.

citing papers explorer

Showing 11 of 11 citing papers.

Training Agents Inside of Scalable World Models · cs.AI · 2025-09-29 · conditional · none · ref 38
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads · cs.LG · 2024-01-19 · conditional · none · ref 155
Video Diffusion Models · cs.CV · 2022-04-07 · unverdicted · none · ref 21
Elastic Attention Cores for Scalable Vision Transformers · cs.CV · 2026-05-12 · unverdicted · none · ref 92
RT-Transformer: The Transformer Block as a Spherical State Estimator · cs.LG · 2026-05-10 · unverdicted · none · ref 87
Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting · physics.ao-ph · 2026-04-20 · unverdicted · none · ref 19
MoBA: Mixture of Block Attention for Long-Context LLMs · cs.LG · 2025-02-18 · unverdicted · none · ref 8
YOLOv12: Attention-Centric Real-Time Object Detectors · cs.CV · 2025-02-18 · unverdicted · none · ref 26
Deformable DETR: Deformable Transformers for End-to-End Object Detection · cs.CV · 2020-10-08 · accept · none · ref 6
Jukebox: A Generative Model for Music · eess.AS · 2020-04-30 · unverdicted · none · ref 7
Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO · eess.SP · 2026-05-14 · unverdicted · none · ref 39