Samtok: Representing any mask with two words. arXiv preprint arXiv:2601.16093, 2026

Yikang Zhou, Tao Zhang, Dengxian Gong, Yuanzheng Wu, Ye Tian, Haochen Wang, Haobo Yuan, Jiacong Wang, Lu Qi, Hao Fei, et al. · 2026 · arXiv 2601.16093

2 Pith papers cite this work.

representative citing papers

PixelEyes: Decoupling Perception and Reasoning for Pinpoint Visual Evidence Seeking

cs.CV · 2026-06-30 · unverdicted · novelty 6.0 · ref 52

PixelEyes decouples reasoning and perception via mask-guided search and semantic BFS, introduces the PixelEyes-6K dataset and the Pinpoint-Bench benchmark, and open-sources its code and models.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0 · ref 84

A survey that frames video MLLM research through a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.