Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Joao Carreira, Andrew Zisserman · 2017

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

TransVLM formalizes Shot Transition Detection as identifying full temporal transition segments rather than single cut points and introduces a VLM that injects optical flow as a motion prior via simple feature fusion, plus a synthetic data engine and benchmark.

Video Diffusion Models

cs.CV · 2022-04-07 · unverdicted · novelty 7.0

A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.

ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

ConvFormer3D-TAP classifies six cine CMR views at 96% accuracy using 3D conv tokenization, multiscale attention, and uncertainty-aware multi-clip fusion on 150k sequences.

VAGNet: Vision-based Accident Anticipation with Global Features

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

VAGNet anticipates accidents in dashcam videos using global features from VideoMAE-V2 combined with transformers and graphs, reporting higher average precision and mean time-to-accident on four benchmarks while running more efficiently than prior methods.

citing papers explorer

Showing 5 of 5 citing papers.

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions · cs.CV · 2026-04-30 · unverdicted · none · ref 10
Video Diffusion Models · cs.CV · 2022-04-07 · unverdicted · none · ref 8
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models · cs.RO · 2026-04-20 · unverdicted · none · ref 1
ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines · cs.CV · 2026-04-13 · unverdicted · none · ref 31
VAGNet: Vision-based Accident Anticipation with Global Features · cs.CV · 2026-04-10 · unverdicted · none · ref 30
