EgoIntrospect provides the first egocentric dataset with self-annotations for internal state tasks and shows multimodal LLMs struggle to infer subjective states from combined signals.
Lifelongmemory: Leveraging llms for answering queries in long-form egocentric videos
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 10roles
background 1polarities
background 1representative citing papers
VSTAT benchmark shows state-of-the-art MLLMs perform far below humans and only modestly above answer-prior baselines on visual state tracking, failing at visual perception despite correct textual reasoning.
Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action understanding and up to 2.29x in timing accuracy.
VideoThinker uses LLM-generated synthetic tool trajectories in caption space grounded to video frames to train agentic VideoLLMs that outperform baselines on long-video benchmarks.
MemoryCard organizes long videos into self-contained topic-aware Memory Cards that improve long-video QA accuracy by up to 21.8% relative under fixed visual-token budgets.
EgoProx benchmark shows MLLMs have some spatial knowledge but struggle to leverage it for egocentric 3D proximity reasoning VQA.
Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.
HiCrew improves long-form video question answering on EgoSchema and NExT-QA via a hybrid tree for temporal topology, question-aware captioning, and adaptive multi-agent planning, with gains in temporal and causal reasoning.
CogniGPT uses an interactive loop between a Multi-Granular Perception Agent and an Active Verification Agent to identify reliable clues in long videos with high accuracy and low frame usage.
An edge-deployed multimodal LLM pipeline for online episodic memory QA reaches 51.76% accuracy on an 8 GB GPU and 54.40% on a local server, within 4-5 points of a 56% cloud baseline.
citing papers explorer
No citing papers match the current filters.