PGT generates synthetic tasks via geometric overlays on images to supply dense visual supervision, improving spatial and relational understanding in MLLMs by up to 20% on targeted benchmarks.
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
A Nash equilibrium framework for training-free multimodal step verification that uses cross-modal agreement and disagreement signals for filtering and ranking reasoning steps.
TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.
citing papers explorer
- PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
  PGT generates synthetic tasks via geometric overlays on images to supply dense visual supervision, improving spatial and relational understanding in MLLMs by up to 20% on targeted benchmarks.
- A Nash Equilibrium Framework For Training-Free Multimodal Step Verification
  A Nash equilibrium framework for training-free multimodal step verification that uses cross-modal agreement and disagreement signals for filtering and ranking reasoning steps.
- Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice
  TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.