Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
6 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 6
verdicts: UNVERDICTED 6
roles: background 1
polarities: background 1
citing papers explorer
- Reading Recognition in the Wild
  Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.
- Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy
  Introduces SIA-RAPN benchmark of 50 clinical videos with 12 fine-grained renorrhaphy action labels and evaluates four temporal segmentation models, with DiffAct leading on most metrics.
- 4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting
  4D-GSW introduces a kinematic-aware spatio-temporal watermarking framework for 4D Gaussian Splatting that uses a Spatio-Temporal Curvature metric and HMM-MRF model to maintain consistency under attacks.
- Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
  AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).
- SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning
  SyncDPO improves temporal synchronization in video-audio joint generation using DPO with efficient on-the-fly negative sample construction and curriculum learning.
- Detecting AI-Generated Videos with Spiking Neural Networks
  MAST with spiking neural networks achieves 93.14% mean accuracy detecting AI-generated videos from 10 unseen generators by exploiting smoother pixel residuals and compact semantic trajectories.