Future transformer for long-term action anticipation

Dayoung Gong, Joonseok Lee, Manjin Kim, Seong Jong Ha, Minsu Cho · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos

cs.CV · 2025-12-03 · conditional · novelty 7.0

ProcObject-10K is the first benchmark for object-centric procedural reasoning in videos that exposes a large gap where models answer questions plausibly but fail to ground their answers in the correct video segments.

EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

EggHand unifies VLA action decoding with viewpoint-aware video-text encoding to forecast egocentric hand poses, achieving SOTA accuracy on EgoExo4D while remaining robust to ego-motion and controllable via language prompts.

citing papers explorer

Showing 2 of 2 citing papers.

ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos
cs.CV · 2025-12-03 · conditional · none · ref 11
EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting
cs.CV · 2026-05-08 · unverdicted · none · ref 17
