A hybrid pipeline of OSGNet candidate generation followed by MLLM reranking secured first place in both the Natural Language Queries and GoalStep tracks of the Ego4D Episodic Memory Challenge.
Time-R1: Post-training large vision language model for temporal video grounding. Advances in Neural Information Processing Systems, 38:83330–83364, 2026.
1 Pith paper cites this work. Polarity classification is still in progress.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: ACCEPT (1)
Representative citing papers:
OSGNet with MLLM Reranking @ Ego4D Episodic Memory Challenge 2026
A hybrid pipeline of OSGNet candidate generation followed by MLLM reranking secured first place in both the Natural Language Queries and GoalStep tracks of the Ego4D Episodic Memory Challenge.