PeerQA: A Scientific Question Answering Dataset from Peer Reviews
3 papers cite this work. Polarity classification is still indexing.

3 representative citing papers (2026)
Citing papers explorer
- MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models
  The MMEB-V3 benchmark shows that omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.
- PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
  PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families spanning seven domains.
- RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
  RPC-Bench supplies 15K verified QA pairs and a research-flow taxonomy, showing that top foundation models still achieve only 68.2% correctness-completeness on academic paper comprehension.