Attention is all you need. Advances in Neural Information Processing Systems, 30

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin · 2017

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 3 · use method: 1

representative citing papers

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.

Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

The CARM module boosts neural routing solvers by adaptively modulating embeddings with constraint variables, enabling better use of global observations and improved performance on constrained VRPs.

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RDKV derives per-token and per-channel weights from attention distortion, then uses reverse water-filling to assign bit-widths from full precision to zero after prefilling, recovering 97.81% accuracy with 2.48% cache retention on LongBench.

From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

M2D distillation augments input graphs with model-derived features and structure, letting simple student GNNs match teacher performance while exposing mechanisms such as attention and fairness directly in the data.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.

Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Aligning the DDIM forward diffusion process with flow-matching manifold evolution enables high-quality generation without time conditioning, and class-conditional synthesis is possible with an unconditional denoiser by using separate time spaces per class.

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

EgoMotion decouples reasoning from motion synthesis in egocentric vision-language tasks by mapping inputs to motion primitives via a VLM and then using diffusion to produce grounded, coherent 3D trajectories.

Computational Hermeneutics: Evaluating generative AI as a cultural technology

cs.AI · 2026-03-31 · unverdicted · novelty 5.0

Generative AI should be evaluated through computational hermeneutics using iterative, human-inclusive benchmarks that measure cultural context rather than isolated model outputs.

RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings for Weakly Supervised Anomaly Detection in Brain MRI Scans

cs.CV · 2025-10-09 · conditional · novelty 5.0

A novel weakly supervised anomaly detection method for brain MRI that uses discriminative dual prompt tuning for pseudo masks and region-aware spatial attention with location-based random embeddings to achieve SOTA results with under 8 million parameters on BraTS and MSD datasets.

citing papers explorer

Showing 11 of 11 citing papers.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization
cs.CV · 2026-05-12 · unverdicted · none · ref 70

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
stat.ML · 2026-05-08 · unverdicted · none · ref 87

Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
cs.AI · 2026-05-11 · unverdicted · none · ref 16

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
cs.LG · 2026-05-08 · unverdicted · none · ref 37

From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning
cs.LG · 2026-05-07 · unverdicted · none · ref 47

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
cs.LG · 2026-05-07 · unverdicted · none · ref 50

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
cs.AI · 2026-05-05 · unverdicted · none · ref 85

Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds
cs.LG · 2026-04-28 · unverdicted · none · ref 39

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
cs.CV · 2026-04-21 · unverdicted · none · ref 50

Computational Hermeneutics: Evaluating generative AI as a cultural technology
cs.AI · 2026-03-31 · unverdicted · none · ref 109

RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings for Weakly Supervised Anomaly Detection in Brain MRI Scans
cs.CV · 2025-10-09 · conditional · none · ref 32
