MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman · 2025 · arXiv 2505.10610

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 3

citation-polarity summary

use dataset 3

representative citing papers

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

MemLens benchmark shows long-context LVLMs lose accuracy with length while memory agents lose visual fidelity, with multi-session reasoning below 30% for most systems and neither approach solving the task alone.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Continued pre-training with balanced long-document VQA data extends a 7B LVLM to 128K context, improving long-document VQA by 7.1% and generalizing to 512K without further training.

MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

MMCL-Bench shows that even the strongest frontier multimodal models solve fewer than one-third of tasks requiring recovery and application of visual rules, procedures, and empirical patterns.

Seed1.8 Model Card: Towards Generalized Real-World Agency

cs.AI · 2026-03-21 · unverdicted · novelty 5.0

Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

Internalized Reasoning for Long-Context Visual Document Understanding

cs.CV · 2026-03-31

citing papers explorer

Showing 6 of 6 citing papers.

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models cs.CV · 2026-05-14 · unverdicted · none · ref 11
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 59
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context cs.CV · 2026-05-13 · unverdicted · none · ref 5
MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence cs.CV · 2026-05-12 · unverdicted · none · ref 5
Seed1.8 Model Card: Towards Generalized Real-World Agency cs.AI · 2026-03-21 · unverdicted · none · ref 74
Internalized Reasoning for Long-Context Visual Document Understanding cs.CV · 2026-03-31 · unreviewed · ref 49
