Emergent symbolic mechanisms support abstract reasoning in large language models. arXiv preprint arXiv:2502.20332.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Training VLMs to point via text induces serial processing that eliminates binding errors and enables compositional generalization on multi-object tasks.
Citing papers
- HPP: Hierarchical Programmatic Probing for Long Video Understanding by Decoupling Perception and Reasoning
  HPP decouples perception from reasoning in long-video VLMs by having an LLM run iterative programmatic probes on hierarchically segmented video, reporting gains on LongVideoBench, EgoSchema, VideoMME, and MLVU.
- Binding Visual Features Point by Point
  Training VLMs to point via text induces serial processing that eliminates binding errors and enables compositional generalization on multi-object tasks.