QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.
4 Pith papers cite this work.
Years: 2026 (4)
citing papers explorer
- QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.
- The Silent Hyperparameter: Quantifying the Impact of Inference Backends on LLM Reproducibility
Empirical study shows LLM inference backends can shift benchmark scores by up to 16.6 percentage points and cause output disagreements due to optimizations like prefix caching and custom kernels.
- How Compliant Are GitHub Actions Workflows? A Checklist-Based Study with LLM-Assisted Auditing
GitHub Actions workflows achieve only 28% overall compliance with best practices, with LLMs enabling an 81% reduction in verification effort via hybrid adjudication but still requiring expert oversight for security judgments.
- Inspectable AI for Science: A Research Object Approach to Generative AI Governance
Generative AI use in science can be governed through structured documentation and provenance capture by framing AI interactions as inspectable Research Objects rather than debating authorship.