LLaVA-OneVision: Easy visual task transfer. Transactions on Machine Learning Research

Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Peiyuan Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

When Negation Is a Geometry Problem in Vision-Language Models

cs.CV · 2026-03-20 · conditional · novelty 7.0

A direction associated with negation exists in CLIP embedding space and can be steered at test time via representation engineering to produce negation-aware outputs without fine-tuning.

SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

cs.CV · 2026-02-26 · conditional · novelty 7.0

SUPERGLASSES is the first VQA benchmark built from actual smart glasses data, and SUPERLENS is an agent using automatic object detection, query decoupling, and multimodal search that outperforms GPT-4o by 2.19% on it.

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Proxy3D generates efficient 3D proxy representations via semantic clustering from video frames and aligns them to VLMs through multi-stage training on the new SpaceSpan dataset, achieving competitive performance on 3D VQA, grounding, and spatial benchmarks with shorter sequences.

EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.

citing papers explorer


When Negation Is a Geometry Problem in Vision-Language Models

cs.CV · 2026-03-20 · conditional · none · ref 11

SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

cs.CV · 2026-02-26 · conditional · none · ref 25

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

cs.CV · 2026-05-08 · unverdicted · none · ref 21

EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling

cs.CV · 2026-04-18 · unverdicted · none · ref 22
