VLMEvalKit: An open-source toolkit for evaluating large multi-modality models
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (4)
roles: background (2)
polarities: background (2)
representative citing papers
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.
citing papers explorer
- SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation
  SpaceDG is the first large-scale benchmark dataset (~1M QA pairs) simulating nine visual degradations in 3DGS-rendered scenes to measure and improve spatial intelligence robustness in MLLMs.
- TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
  TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
- KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy
  KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.
- Internalized Reasoning for Long-Context Visual Document Understanding