Attention is all you need. Advances in neural information processing systems, 30

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin · 2017

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

SparseSplat uses entropy-based probabilistic sampling and a specialized point cloud network to generate compact 3D Gaussian maps that retain high rendering quality with far fewer Gaussians than prior feed-forward methods.

STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction

cs.CV · 2026-03-18 · unverdicted · novelty 7.0

STAC compresses KV caches in streaming 3D reconstruction transformers via temporal token preservation with decayed attention, spatial voxel compression, and chunked multi-frame optimization, delivering 10x memory reduction and 4x faster inference at SOTA quality.

HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

HAD uses multi-view reasoning from a pre-trained feedforward NVS network to estimate and mask hallucination scores in diffusion priors, reducing artifacts and achieving SOTA novel view synthesis in sparse-view 3D reconstruction.

EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-shot curve decoder.

Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

Saliency-R1 uses a novel saliency map technique and GRPO with human bounding-box overlap as reward to improve VLM reasoning faithfulness and interpretability.

FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision

cs.CV · 2025-12-17 · unverdicted · novelty 6.0

FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

A new dataset with high-fidelity close-up garment images and full/close-up try-on videos plus the VGID metric enables better texture and structure preservation in high-resolution video virtual try-on.

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

LangFlash introduces a feed-forward model for 3D language Gaussian splatting from sparse unposed images, claiming superior novel view synthesis and semantic consistency via enriched training data and sparse semantic encoding.

DietDelta: A Vision-Language Approach for Dietary Assessment via Before-and-After Images

cs.CV · 2026-04-07 · unverdicted · novelty 4.0

DietDelta uses vision-language prompts on paired before-and-after RGB images to localize food items, estimate their weights, and compute consumption differences, reporting better results than prior single-image methods on three public datasets.

citing papers explorer

Showing 9 of 9 citing papers.

SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction · cs.CV · 2026-04-03 · unverdicted · none · ref 31
STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction · cs.CV · 2026-03-18 · unverdicted · none · ref 35
HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction · cs.CV · 2026-05-16 · unverdicted · none · ref 37
EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications · cs.CV · 2026-04-18 · unverdicted · none · ref 70
Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward · cs.CV · 2026-04-06 · unverdicted · none · ref 70
FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision · cs.CV · 2025-12-17 · unverdicted · none · ref 50
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on · cs.CV · 2025-11-24 · unverdicted · none · ref 52
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images · cs.CV · 2026-05-22 · unverdicted · none · ref 39
DietDelta: A Vision-Language Approach for Dietary Assessment via Before-and-After Images · cs.CV · 2026-04-07 · unverdicted · none · ref 37
