TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (5)
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
representative citing papers
SkillSpotter raises class-specific mAP from 12.40 to 21.82 and balanced accuracy to 60.40% on Ego-Exo4D by adding adaptive temporal suppression, gated pose fusion, and bidirectional cross-view attention to temporal action detectors.
Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.
Adapts MDVLMs to TAL via planned training objective and step-level IoU reward, reporting gains over autoregressive baselines on ActivityNet and THUMOS datasets.
A new adapter module combining boundary-aware state space modeling with spatial processing boosts localization and robustness in temporal action detection.
citing papers explorer
- LiquidTAD: Efficient Temporal Action Detection via Parallel Liquid-Inspired Temporal Relaxation
  LiquidTAD distills liquid neural dynamics into a vectorized parallel temporal operator and hierarchical decay sharing to achieve efficient action detection with substantially reduced model size and computation.
- SkillSpotter: Pose-Aware Multi-View Skilled Action Detection and Grading in Ego-Exo Videos
  SkillSpotter raises class-specific mAP from 12.40 to 21.82 and balanced accuracy to 60.40% on Ego-Exo4D by adding adaptive temporal suppression, gated pose fusion, and bidirectional cross-view attention to temporal action detectors.
- Where are they looking in the operating room?
  Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.
- Masked Diffusion Vision-Language Models for Temporal Action Localization
  Adapts MDVLMs to TAL via planned training objective and step-level IoU reward, reporting gains over autoregressive baselines on ActivityNet and THUMOS datasets.
- Efficient Spatial-Temporal Focal Adapter with SSM for Temporal Action Detection
  A new adapter module combining boundary-aware state space modeling with spatial processing boosts localization and robustness in temporal action detection.