DetectiveQA: Evaluating long-context reasoning on detective novels. arXiv preprint arXiv:2409.02465

Zhe Xu, Jiasheng Ye, Xiaoran Liu, Xiangyang Liu, Tianxiang Sun, Zhigeng Liu, Qipeng Guo, Linlin Li, Qun Liu, Xuanjing Huang, et al. · 2025 · arXiv 2409.02465

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

cs.AI · 2026-04-09 · accept · novelty 7.0

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

cs.CL · 2025-07-07 · unverdicted · novelty 7.0

MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.

citing papers explorer

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

cs.CL · 2025-07-07 · unverdicted · none · ref 43

MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.
