Title resolution pending

Robert E Blackwell, Jon Barry, Anthony G · 2024 · arXiv 2410.03492

4 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

cs.AI · 2026-05-18 · accept · novelty 8.0

QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.

The Silent Hyperparameter: Quantifying the Impact of Inference Backends on LLM Reproducibility

cs.LG · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

Empirical study shows LLM inference backends can shift benchmark scores by up to 16.6 percentage points and cause output disagreements due to optimizations like prefix caching and custom kernels.

How Compliant Are GitHub Actions Workflows? A Checklist-Based Study with LLM-Assisted Auditing

cs.SE · 2026-05-03 · accept · novelty 6.0

GitHub Actions workflows achieve only 28% overall compliance with best practices, with LLMs enabling an 81% reduction in verification effort via hybrid adjudication but still requiring expert oversight for security judgments.

Inspectable AI for Science: A Research Object Approach to Generative AI Governance

cs.AI · 2026-04-13 · conditional · novelty 5.0

Generative AI use in science can be governed through structured documentation and provenance capture by framing AI interactions as inspectable Research Objects rather than debating authorship.

citing papers explorer

Showing 4 of 4 citing papers.

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
cs.AI · 2026-05-18 · accept · none · ref 67
The Silent Hyperparameter: Quantifying the Impact of Inference Backends on LLM Reproducibility
cs.LG · 2026-05-19 · unverdicted · none · ref 18 · 2 links
How Compliant Are GitHub Actions Workflows? A Checklist-Based Study with LLM-Assisted Auditing
cs.SE · 2026-05-03 · accept · none · ref 5
Inspectable AI for Science: A Research Object Approach to Generative AI Governance
cs.AI · 2026-04-13 · conditional · none · ref 5
