SplitQ improves low-bit PTQ for VLMs by isolating modality-specific outlier channels via MOCD and applying dual-branch adaptive calibration via ACC, outperforming prior methods on six datasets across W4A8 to W3A2 settings.
Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
VLMs frequently switch away from a target visual path to nearby similar distractors in controlled tracing tasks, with standard scaling, reasoning, and instruction interventions providing only partial mitigation.
citing papers explorer
-
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
SplitQ improves low-bit PTQ for VLMs by isolating modality-specific outlier channels via MOCD and applying dual-branch adaptive calibration via ACC, outperforming prior methods on six datasets across W4A8 to W3A2 settings.
-
VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following
VLMs frequently switch away from a target visual path to nearby similar distractors in controlled tracing tasks, with standard scaling, reasoning, and instruction interventions providing only partial mitigation.