Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin · 2017

8 Pith papers cite this work.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VINS-120K supplies the first large-scale set of (instruction, image, edited image) triplets at ultra-high resolution, together with an adaptation strategy that improves detail synthesis.

WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

WD-FQDet decouples modality-shared and modality-specific features in infrared-visible images via wavelet-based frequency decomposition and frequency-aware query selection to achieve state-of-the-art detection performance.

AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

The AssemblyBench dataset and the AssemblyDyno transformer model enable physics-aware prediction of assembly sequences and trajectories for complex industrial objects from multimodal instructions and 3D shapes.

Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

C-MET transfers emotions from speech to facial video by learning cross-modal semantic vectors with pretrained audio and disentangled expression encoders, yielding 14% higher emotion accuracy on MEAD and CREMA-D even for unseen emotions.

Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping

cs.CV · 2026-02-25 · unverdicted · novelty 7.0

MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.

Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

cs.CV · 2026-03-04 · unverdicted · novelty 6.0

Pointer-CAD unifies B-Rep geometry with command sequences via pointer-based entity selection, allowing LLMs to perform complex CAD edits while reducing the topological errors caused by quantization.

SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

SGSoft introduces a template-guided pipeline that fuses semantic and geometric features to learn dense correspondences across deformable 3D shapes with claimed SOTA generalization and real-time efficiency.

SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets

cs.CV · 2025-10-07 · conditional · novelty 5.0

SD-MVSum extends script-driven video summarization to multimodal inputs by modeling script-video and script-transcript relevance with a new weighted cross-modal attention mechanism, plus extended S-VideoXum and MrHiSum datasets.

citing papers explorer

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset cs.CV · 2026-05-22 · unverdicted · none · ref 44
WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning cs.CV · 2026-05-13 · unverdicted · none · ref 39
AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects cs.CV · 2026-05-13 · unverdicted · none · ref 37
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video cs.CV · 2026-04-09 · unverdicted · none · ref 57
Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping cs.CV · 2026-02-25 · unverdicted · none · ref 34
Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection cs.CV · 2026-03-04 · unverdicted · none · ref 43
SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals cs.CV · 2026-05-18 · unverdicted · none · ref 66
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets cs.CV · 2025-10-07 · conditional · none · ref 35