PeerQA: A Scientific Question Answering Dataset from Peer Reviews
3 papers cite this work. Polarity classification is still indexing.

3 representative citing papers (2026)
Citing papers explorer
- MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models
  The MMEB-V3 benchmark shows that omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.
- PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
  PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families spanning seven domains.
- RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
  RPC-Bench supplies 15K verified QA pairs and a research-flow taxonomy, showing that top foundation models still achieve only 68.2% correctness-completeness on academic paper comprehension.