Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Joao Carreira, Andrew Zisserman · 2017

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Reading Recognition in the Wild

cs.CV · 2025-05-30 · unverdicted · novelty 8.0

Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.

Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

Introduces SIA-RAPN benchmark of 50 clinical videos with 12 fine-grained renorrhaphy action labels and evaluates four temporal segmentation models, with DiffAct leading on most metrics.

4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

4D-GSW introduces a kinematic-aware spatio-temporal watermarking framework for 4D Gaussian Splatting that uses a Spatio-Temporal Curvature metric and HMM-MRF model to maintain consistency under attacks.

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).

SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

SyncDPO improves temporal synchronization in video-audio joint generation using DPO with efficient on-the-fly negative sample construction and curriculum learning.

Detecting AI-Generated Videos with Spiking Neural Networks

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

MAST with spiking neural networks achieves 93.14% mean accuracy detecting AI-generated videos from 10 unseen generators by exploiting smoother pixel residuals and compact semantic trajectories.

citing papers explorer

Reading Recognition in the Wild · cs.CV · 2025-05-30 · unverdicted · none · ref 7
Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy · cs.CV · 2026-04-10 · unverdicted · none · ref 6
4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting · cs.CV · 2026-05-21 · unverdicted · none · ref 4
Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models · cs.CV · 2026-05-20 · unverdicted · none · ref 64
SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning · cs.CV · 2026-05-12 · unverdicted · none · ref 6
Detecting AI-Generated Videos with Spiking Neural Networks · cs.CV · 2026-05-07 · unverdicted · none · ref 7
