Eyes wide shut? exploring the visual shortcomings of multimodal llms

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

background 1 dataset 1 method 1

citation-polarity summary

background 1 use dataset 1 use method 1

representative citing papers

Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

cs.CL · 2026-01-11 · unverdicted · novelty 7.0

Laser reformulates visual reasoning via Dynamic Windowed Alignment Learning to maintain latent superposition of global features, delivering 5.03% average gains over Monet and over 97% fewer inference tokens on six benchmarks.

How Can AI Augment Access to Justice? Public Defenders' Perspectives on AI Adoption

cs.CY · 2025-10-27 · accept · novelty 7.0

Public defenders view AI as most useful for evidence investigation but limited in courtroom work and strategy, with adoption blocked by costs, confidentiality risks, and norms, requiring human oversight and open development.

Discovering Failure Modes in Vision-Language Models using RL

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.

Seed1.8 Model Card: Towards Generalized Real-World Agency

cs.AI · 2026-03-21 · unverdicted · novelty 5.0

Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

cs.CV · 2025-05-21 · unverdicted · novelty 5.0

LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.

Seed1.5-VL Technical Report

cs.CV · 2025-05-11 · unverdicted · novelty 4.0

Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

citing papers explorer

Showing 6 of 6 citing papers.

Forest Before Trees: Latent Superposition for Efficient Visual Reasoning cs.CL · 2026-01-11 · unverdicted · none · ref 39
Laser reformulates visual reasoning via Dynamic Windowed Alignment Learning to maintain latent superposition of global features, delivering 5.03% average gains over Monet and over 97% fewer inference tokens on six benchmarks.
How Can AI Augment Access to Justice? Public Defenders' Perspectives on AI Adoption cs.CY · 2025-10-27 · accept · none · ref 106
Public defenders view AI as most useful for evidence investigation but limited in courtroom work and strategy, with adoption blocked by costs, confidentiality risks, and norms, requiring human oversight and open development.
Discovering Failure Modes in Vision-Language Models using RL cs.CV · 2026-04-06 · unverdicted · none · ref 25
An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.
Seed1.8 Model Card: Towards Generalized Real-World Agency cs.AI · 2026-03-21 · unverdicted · none · ref 66
Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models cs.CV · 2025-05-21 · unverdicted · none · ref 41
LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.
Seed1.5-VL Technical Report cs.CV · 2025-05-11 · unverdicted · none · ref 134
Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

Eyes wide shut? exploring the visual shortcomings of multimodal llms

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer