super hub Mixed citations

Attention is all you need. Advances in neural information processing systems, 30

Aidan N Gomez, Ashish Vaswani, Jakob Uszkoreit, Llion Jones, Niki Parmar, Noam Shazeer · 2017

Mixed citation behavior. Most common role is background (52%).

138 Pith papers citing it

Background 52% of classified citations

citation-role summary

background 17 · method 12 · baseline 2 · dataset 1 · other 1
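
The 52% background share quoted near the top of this page appears to follow from the role counts above: 17 of the 33 classified citations. The sketch below reproduces that arithmetic. The names `role_counts` and `role_shares` are hypothetical, introduced only for illustration, and rounding to the nearest whole percent is an assumption about how the page derives its figure.

```python
# Role counts copied from the citation-role summary above.
role_counts = {"background": 17, "method": 12, "baseline": 2, "dataset": 1, "other": 1}


def role_shares(counts):
    """Return each role's share of classified citations as a whole-number percentage."""
    total = sum(counts.values())
    # Assumption: shares are rounded to the nearest whole percent.
    return {role: round(100 * n / total) for role, n in counts.items()}


shares = role_shares(role_counts)
top_role = max(role_counts, key=role_counts.get)
print(top_role, shares[top_role])  # background 52
```

Only 33 of the 138 citing papers carry a classified role in this summary, so the percentage describes the classified subset rather than all citing papers.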

citation-polarity summary

background 17 · use method 12 · baseline 2 · unclear 1 · use dataset 1

claims ledger

background: ... and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy). 1 Introduction Transformer models [82] have emerged as the most widely used architecture in applications such a...
method: ... would differ in the network structure (i.e., $G \neq G'$). Then, there exists $(f, g)$ such that the population trajectories $(\{x_i(t+k)\}, G(t+k))$ and $(\{x'_i(t+k)\}, G'(t+k))$ diverge for all $k > 0$. Single-task agentic systems either treat observations as independent and identically distributed (i.i.d.) [93] or the dependencies are modeled globally through a full attention mechanism [95]. Neither captures the topology-constrained local observability that characterizes real social systems. In a MASS, G is an ir...

authors

Aidan N Gomez, Ashish Vaswani, Jakob Uszkoreit, Llion Jones, Niki Parmar, Noam Shazeer

representative citing papers

Any-Dimensional Invariant Universality

cs.LG · 2026-05-22 · unverdicted · novelty 8.0

A systematic approach maps any-dimensional invariant functions to a unique function on an infinite-dimensional limit space admitting a topology with compact sets where universality holds, with examples of non-universal architectures and fixes.

Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

New lower bounds establish that Deep Sets need embedding dimension linear in the number of points (up to constants) for d>1, and give the first non-trivial bounds for higher-order Janossy pooling.

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

Weisfeiler-Leman Is Incomplete on Simple Spectrum Graphs, so Canonicalize Them

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

k-WL is incomplete on simple spectrum graphs; PRiSM is the first provably complete canonicalization for their eigendecompositions.

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

CDM amortizes SMC inference for reward-tilted discrete diffusion by training a parameterized twist function on contrastive samples with closed-form kernels.

Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning

quant-ph · 2026-05-22 · unverdicted · novelty 7.0

CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

FISolver trains a compact LLM on backward-generated (differential equation, first integral) pairs and uses guided reinforcement learning to outperform larger models and Mathematica on first-integral benchmarks at lower cost.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model

astro-ph.IM · 2026-05-15 · unverdicted · novelty 7.0

Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.

Dynamic Chunking for Diffusion Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.

Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.

Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study

eess.SP · 2026-05-12 · unverdicted · novelty 7.0

Contrastive predictive coding pretraining combined with structured state space models yields the strongest ECG foundation models, with continued gains from scaling data to 11 million samples.

Can Graphs Help Vision SSMs See Better?

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.

DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 3 refs

DeepLévy learns mixtures of Lévy stable distributions for heavy-tailed time series forecasting by minimizing discrepancies between empirical and parametric characteristic functions, outperforming prior methods on tail risk metrics under extreme volatility.

TIDES: Implicit Time-Awareness in Selective State Space Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.

PRIM: Meta-Learned Bayesian Root Cause Analysis

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

PRIM meta-learns a Model-Averaged Causal Estimation transformer to perform Bayesian RCA by marginalizing structural uncertainty over synthetic causal priors, achieving 17ms inference on systems up to 100 variables.

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.

LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute

cs.CV · 2026-05-07 · conditional · novelty 7.0

LookWhen factorizes video recognition into learning when, where, and what to compute via uniqueness-based token selection and dual-teacher distillation, achieving better accuracy-FLOPs trade-offs than baselines on multiple datasets.

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conformal survival methods.

FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.

TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.

citing papers explorer

Showing 35 of 35 citing papers after filters.

Rotation Equivariant Mamba for Vision Tasks cs.CV · 2026-03-10 · unverdicted · none · ref 7
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
Can Graphs Help Vision SSMs See Better? cs.CV · 2026-05-11 · unverdicted · none · ref 57
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 38
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection cs.CV · 2026-05-08 · unverdicted · none · ref 30
S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.
LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute cs.CV · 2026-05-07 · conditional · none · ref 1
LookWhen factorizes video recognition into learning when, where, and what to compute via uniqueness-based token selection and dual-teacher distillation, achieving better accuracy-FLOPs trade-offs than baselines on multiple datasets.
TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks cs.CV · 2026-05-03 · unverdicted · none · ref 16
TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.
LAION-5B: An open large-scale dataset for training next generation image-text models cs.CV · 2022-10-16 · accept · none · ref 84
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild cs.CV · 2026-05-21 · unverdicted · none · ref 67
AnyMo pre-trains a graph encoder on physics-simulated multi-placement IMU data and aligns full-body motion tokens with LLMs to enable zero-shot activity recognition, retrieval, and captioning across unseen datasets and setups.
Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models cs.CV · 2026-05-20 · unverdicted · none · ref 43
AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).
PIXLRelight: Controllable Relighting via Intrinsic Conditioning cs.CV · 2026-05-18 · unverdicted · none · ref 41
A transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.
SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection cs.CV · 2026-05-17 · unverdicted · none · ref 39
SpecSem-Net integrates Fourier-based spectral filtering with semantic-guided gated merging to detect AI-generated videos, reporting 87.25% accuracy on a new benchmark of five commercial generators and 95.59% on public datasets.
Registers Matter for Pixel-Space Diffusion Transformers cs.CV · 2026-05-15 · unverdicted · none · ref 4
Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.
Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction cs.CV · 2026-05-15 · unverdicted · none · ref 14
Invaria trains point cloud encoders with next-resolution prediction to learn scale and density invariant features, yielding higher mIoU on ScanNet under lower resolution and scaled objects while using a smaller model.
Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization cs.CV · 2026-05-14 · unverdicted · none · ref 35
Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation cs.CV · 2026-05-14 · conditional · none · ref 44
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model cs.CV · 2026-05-12 · unverdicted · none · ref 42 · 2 links
SCOLAR fixes information gain collapse in latent visual reasoning by generating independent auxiliary visual tokens via a detransformer, extending acceptable CoT length over 30x and delivering +14.12% gains on reasoning benchmarks.
RigidFormer: Learning Rigid Dynamics using Transformers cs.CV · 2026-05-09 · unverdicted · none · ref 37
RigidFormer learns mesh-free rigid dynamics from point clouds using object-centric anchors, Anchor-Vertex Pooling, Anchor-based RoPE, and differentiable Kabsch alignment to enforce rigidity.
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring cs.CV · 2026-05-06 · unverdicted · none · ref 11
DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.
Linear-Time Global Visual Modeling without Explicit Attention cs.CV · 2026-05-03 · unverdicted · none · ref 31
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
NeuroSymb-MRG: Differentiable Abductive Reasoning with Active Uncertainty Minimization for Radiology Report Generation cs.CV · 2026-03-02 · unverdicted · none · ref 2
NeuroSymb-MRG uses differentiable logic chains and uncertainty-driven sampling to produce more factually consistent radiology reports than standard encoder-decoder or retrieval methods.
Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection cs.CV · 2025-11-19 · unverdicted · none · ref 19
Cross-matrix Krylov projection reuses shared subspaces from seed matrices to accelerate score pre-computation in diffusion models, delivering 15.8-43.7% time savings and up to 115x speedup versus DDPM baselines.
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference cs.CV · 2024-05-23 · unverdicted · none · ref 23
PipeFusion applies patch partitioning and pipeline parallelism with one-step stale feature reuse to reduce communication overhead in DiT inference, reporting SOTA results on 8x L40 GPUs for Pixart, SD3, and Flux.1.
AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking cs.CV · 2026-05-20 · unverdicted · none · ref 35
AVI-HT adaptively fuses vision and IMU data via attention to cut 3D hand keypoint error by 16.1% (24.2% wrist-aligned) on a new 100K+ sample DexGloveHOI dataset in occluded hand-object scenarios.
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset cs.CV · 2026-05-20 · unverdicted · none · ref 94
MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.
SynVA: A Modular Toolkit for Vessel Generation and Aneurysm Editing cs.CV · 2026-05-13 · unverdicted · none · ref 91
SynVA toolkit generates realistic vascular meshes and anatomically plausible aneurysms, releasing 50,000 labeled samples for medical vision tasks.
Perceptual Flow Network for Visually Grounded Reasoning cs.CV · 2026-05-04 · unverdicted · none · ref 49
PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).
Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay cs.CV · 2026-05-02 · unverdicted · none · ref 1
Colinearity-Decay regularizer trains ViTs that maintain or improve full-precision accuracy while delivering higher accuracy after low-bit quantization on ImageNet and COCO tasks.
Deploy DINO with Many-to-Many Association cs.CV · 2026-04-26 · unverdicted · none · ref 40
DINO features combined with many-to-many association and the proposed Harmonic Consensus Maximization enable general visual features to compete with specialized models on out-of-distribution image matching and camera pose estimation.
Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning cs.CV · 2025-05-25 · unverdicted · none · ref 7
DiT-ST converts complete-text captions into split-text primitives via LLMs and injects them hierarchically across denoising stages to reduce semantic confusion in DiT-based text-to-image generation.
ZAYA1-VL-8B Technical Report cs.CV · 2026-05-08 · unverdicted · none · ref 71
ZAYA1-VL-8B is a new MoE vision-language model with vision-specific LoRA adapters and bidirectional image attention that reports competitive performance against several 3B-4B models on image, reasoning, and counting benchmarks.
ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection cs.CV · 2026-04-29 · unverdicted · none · ref 15
ViBE generates M/EEG signals from visual stimuli by reconstructing neural responses with a TSC-VAE and aligning CLIP image features to its latent space via Q-Former, MSE, and sliced Wasserstein losses.
VAGNet: Vision-based Accident Anticipation with Global Features cs.CV · 2026-04-10 · unverdicted · none · ref 42
VAGNet anticipates accidents in dashcam videos using global features from VideoMAE-V2 combined with transformers and graphs, reporting higher average precision and mean time-to-accident on four benchmarks while running more efficiently than prior methods.
Representation learning from OCT images cs.CV · 2026-05-04 · unverdicted · none · ref 63
A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.
HunyuanImage 3.0 Technical Report cs.CV · 2025-09-28 · unreviewed · ref 11
VRAG: Learning World Models for Interactive Video Generation cs.CV · 2025-05-28 · unreviewed · ref 33
