Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.
Scaling egocentric vision: The epic-kitchens dataset
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.
citing papers explorer
-
Reading Recognition in the Wild
Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.
-
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.