Axial attention in multidimensional transformers

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

polarities

background 3

representative citing papers

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Video Diffusion Models

cs.CV · 2022-04-07 · unverdicted · novelty 7.0

A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering the first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

YOLOv12: Attention-Centric Real-Time Object Detectors

cs.CV · 2025-02-18 · unverdicted · novelty 6.0

YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.

Jukebox: A Generative Model for Music

eess.AS · 2020-04-30 · unverdicted · novelty 6.0

Jukebox generates high-fidelity and diverse songs with singing and coherence up to multiple minutes by compressing raw audio via multi-scale VQ-VAE and modeling the codes with large autoregressive Transformers conditioned on artist, genre, and unaligned lyrics.

citing papers explorer

Showing 11 of 11 citing papers.

  • Training Agents Inside of Scalable World Models · cs.AI · 2025-09-29 · conditional · none · ref 38

    Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

  • Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads · cs.LG · 2024-01-19 · conditional · none · ref 155

    Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

  • Video Diffusion Models · cs.CV · 2022-04-07 · unverdicted · none · ref 21

    A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering the first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

  • Elastic Attention Cores for Scalable Vision Transformers · cs.CV · 2026-05-12 · unverdicted · none · ref 92

    VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

  • RT-Transformer: The Transformer Block as a Spherical State Estimator · cs.LG · 2026-05-10 · unverdicted · none · ref 87

    Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.

  • Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting · physics.ao-ph · 2026-04-20 · unverdicted · none · ref 19

    ESFM is a single open foundation model that unifies heterogeneous Earth data sources and forecasts missing regions while preserving inter-variable physical relationships.

  • MoBA: Mixture of Block Attention for Long-Context LLMs · cs.LG · 2025-02-18 · unverdicted · none · ref 8

    MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.

  • YOLOv12: Attention-Centric Real-Time Object Detectors · cs.CV · 2025-02-18 · unverdicted · none · ref 26

    YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.

  • Deformable DETR: Deformable Transformers for End-to-End Object Detection · cs.CV · 2020-10-08 · accept · none · ref 6

    Deformable DETR achieves higher accuracy than DETR, especially on small objects, while converging in one-tenth the training epochs by using sparse deformable attention on image features.

  • Jukebox: A Generative Model for Music · eess.AS · 2020-04-30 · unverdicted · none · ref 7

    Jukebox generates high-fidelity and diverse songs with singing and coherence up to multiple minutes by compressing raw audio via multi-scale VQ-VAE and modeling the codes with large autoregressive Transformers conditioned on artist, genre, and unaligned lyrics.

  • Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO · eess.SP · 2026-05-14 · unverdicted · none · ref 39

    A multi-block attention neural network reduces pilot overhead by 87% and NMSE by 51% at 10 dB SNR for cascaded channel estimation in IRS-assisted mmWave MIMO-OFDM systems.