TS-LLaVA: Constructing visual tokens through thumbnail-and-sampling for training-free video large language models. arXiv preprint arXiv:2411.11066,
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026: 3
verdicts
UNVERDICTED: 3
roles
other: 1
polarities
unclear: 1
representative citing papers
WindowQuant performs window-adaptive mixed-precision KV cache quantization guided by similarity to the text prompt, with reordering to enable efficient inference in VLMs.
Direct algorithms sample codon sequences from Boltzmann distributions using tensor-based secondary structure free energy models for RNA design under codon constraints.
citing papers explorer
- Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding
ST-GridPool improves video LLM performance via hierarchical temporal gridding and norm-based spatial pooling on visual tokens without training.
- WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
WindowQuant performs window-adaptive mixed-precision KV cache quantization guided by similarity to the text prompt, with reordering to enable efficient inference in VLMs.
- Direct RNA sequence design under codon constraints using expressive tensor-based secondary structure models
Direct algorithms sample codon sequences from Boltzmann distributions using tensor-based secondary structure free energy models for RNA design under codon constraints.