QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
Empirical study shows LLM inference backends can shift benchmark scores by up to 16.6 percentage points and cause output disagreements due to optimizations like prefix caching and custom kernels.
GitHub Actions workflows achieve only 28% overall compliance with best practices, with LLMs enabling an 81% reduction in verification effort via hybrid adjudication but still requiring expert oversight for security judgments.
Generative AI use in science can be governed through structured documentation and provenance capture by framing AI interactions as inspectable Research Objects rather than debating authorship.
citing papers explorer
-
Inspectable AI for Science: A Research Object Approach to Generative AI Governance
Generative AI use in science can be governed through structured documentation and provenance capture by framing AI interactions as inspectable Research Objects rather than debating authorship.