Matryoshka multimodal models
5 Pith papers cite this work, all published in 2026.
citing papers explorer
- EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
A proposer-solver agent pair achieves supervised-level video temporal grounding and fine-grained captioning from 2.5K unlabeled videos via self-reinforcing evolution.
- GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations
GraSP-VL turns frozen VLM embedding length into a controllable semantic granularity interface via a learned shared prefix transform that creates a Semantic Matryoshka structure.
- WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
WindowQuant performs window-adaptive mixed-precision KV cache quantization guided by similarity to the text prompt, with reordering to enable efficient inference in VLMs.
- HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
HAWK is a training-free method that prunes over 80% of visual tokens in MLLMs while retaining 96% accuracy by using head importance weights and text-guided attention to select task-relevant tokens.
- MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining