MemLens benchmark shows long-context LVLMs lose accuracy with length while memory agents lose visual fidelity, with multi-session reasoning below 30% for most systems and neither approach solving the task alone.
Mmlongbench: Benchmarking long-context vision-language models effectively and thoroughly
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
dataset 3polarities
use dataset 3representative citing papers
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
Continued pre-training with balanced long-document VQA data extends a 7B LVLM to 128K context, improving long-document VQA by 7.1% and generalizing to 512K without further training.
MMCL-Bench shows that even the strongest frontier multimodal models solve fewer than one-third of tasks requiring recovery and application of visual rules, procedures, and empirical patterns.
Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.
citing papers explorer
-
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
MemLens benchmark shows long-context LVLMs lose accuracy with length while memory agents lose visual fidelity, with multi-session reasoning below 30% for most systems and neither approach solving the task alone.
-
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
-
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
Continued pre-training with balanced long-document VQA data extends a 7B LVLM to 128K context, improving long-document VQA by 7.1% and generalizing to 512K without further training.
-
MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
MMCL-Bench shows that even the strongest frontier multimodal models solve fewer than one-third of tasks requiring recovery and application of visual rules, procedures, and empirical patterns.
-
Seed1.8 Model Card: Towards Generalized Real-World Agency
Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.
- Internalized Reasoning for Long-Context Visual Document Understanding