StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.
arXiv preprint arXiv:2412.05127 (2024)
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
DORI benchmark shows top vision-language models reach only 54.2% accuracy on coarse orientation tasks and 33% on granular judgments, with sharp drops on reference-frame shifts and compound rotations.
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.
citing papers explorer
-
Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering
StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.
-
Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs
DORI benchmark shows top vision-language models reach only 54.2% accuracy on coarse orientation tasks and 33% on granular judgments, with sharp drops on reference-frame shifts and compound rotations.
-
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models
TDA-RC embeds topological patterns from multi-round reasoning into CoT via persistent homology and a repair agent, yielding better accuracy-efficiency trade-offs than ToT or GoT on tested datasets.