{"paper":{"title":"QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi","license":"http://creativecommons.org/licenses/by/4.0/","headline":"","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Anthony G. Cohn, Robert E. Blackwell","submitted_at":"2026-05-18T13:26:14Z","abstract_excerpt":"We introduce an extensive qualitative spatial and temporal reasoning (QSTR) benchmark for evaluating large language models (LLMs). We pose questions concerning compositional reasoning (using composition tables, CT), converse relations, and conceptual neighbourhoods (CN) for QSTR calculi, Point Algebra (PA), Allen's Interval Algebra, Interval and Duration (INDU), Region Connection Calculus (RCC-5, RCC-8, and RCC-22), the nine intersection model, cardinal direction calculus, and STAR. The RCC-22 CN is published here for the first time. An extended benchmark systematically varies question present"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2605.18380","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.18380/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"citation_quote_validity","ran_at":"2026-05-19T23:50:04.476501Z","status":"skipped","version":"0.1.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T23:33:29.998439Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"external_links","ran_at":"2026-05-19T23:31:47.113943Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T23:21:58.760952Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"d922268edbb6c9467fb8c533ede938a98f0c6519277cd99d2ca3d19af12b37b7"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}