In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning · 2021 · DOI 10.18653/v1/2021.naacl-main.364

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Lost in Aggregation: A Multi-Scale Diagnostic Benchmark for LLM Spatial Navigation

physics.soc-ph · 2026-06-20 · unverdicted · novelty 7.0

A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.

Einstein World Models

cs.AI · 2026-06-25 · unverdicted · novelty 5.0

Einstein World Models integrate visual rollouts from a callable world-module into LLM reasoning traces to support complex thought beyond language.

citing papers explorer

Showing 2 of 2 citing papers.

Lost in Aggregation: A Multi-Scale Diagnostic Benchmark for LLM Spatial Navigation

physics.soc-ph · 2026-06-20 · unverdicted · none · ref 17

A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.

Einstein World Models

cs.AI · 2026-06-25 · unverdicted · none · ref 7

Einstein World Models integrate visual rollouts from a callable world-module into LLM reasoning traces to support complex thought beyond language.
