Title resolution pending

Xiong, Y · 2021 · arXiv 2102.03902

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.

HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

HubRouter is a sub-quadratic routing primitive using learned hubs that replaces attention layers in hybrid models while delivering competitive perplexity and large throughput gains.

Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

cs.MM · 2026-04-22 · unverdicted · novelty 5.0

A new joint spatio-temporal enlargement model for micro-video popularity prediction using frame scoring for long sequences and a topology-aware memory bank for unbounded historical associations.

What exactly did the Transformer learn from our physics data?

astro-ph.IM · 2025-05-27 · unverdicted · novelty 5.0

Transformers trained on cosmic ray simulations learn physically plausible features in positional encodings for symmetric air showers and in attention mechanisms for galaxy-origin particles.

citing papers explorer

Showing 4 of 4 citing papers.

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
cs.LG · 2025-10-05 · unverdicted · none · ref 34
HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models
cs.LG · 2026-04-24 · unverdicted · none · ref 22
Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
cs.MM · 2026-04-22 · unverdicted · none · ref 66
What exactly did the Transformer learn from our physics data?
astro-ph.IM · 2025-05-27 · unverdicted · none · ref 23
