POINTS-Long is a dual-mode multimodal large language model that uses dynamic visual token scaling to retain 97.7-99.7% accuracy on long-form tasks with 1/40 to 1/10th the tokens and supports streaming via detachable KV-cache.
Wukong: A 100 million large-scale chinese cross-modal pre-training benchmark.Advances in Neural Information Processing Systems, 35:26418–26431
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
citing papers explorer
-
POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs
POINTS-Long is a dual-mode multimodal large language model that uses dynamic visual token scaling to retain 97.7-99.7% accuracy on long-form tasks with 1/40 to 1/10th the tokens and supports streaming via detachable KV-cache.
-
SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.