
inertness

10 Pith papers cite this work.


Citing papers by year: 2026 (9), 2025 (1).

Verdicts: all 10 citing papers are unverdicted.


representative citing papers

CEO-Bench: Can Agents Play the Long Game?

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

CEO-Bench evaluates AI agents on managing a startup over 500 days, showing that even top models such as Claude Opus 4.8 and GPT-5.5 barely preserve their starting capital and fail to turn a consistent profit.

Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

cs.AI · 2026-05-23 · unverdicted · novelty 5.0

A multi-dimensional framework with six dimensions (Correctness, Consistency, Robustness, Logical Coherence, Efficiency, Stability) is applied to seven LLMs on 975 items. It finds that logical coherence is orthogonal to correctness and reveals ranking inversions that accuracy metrics alone do not show.

Business Utility of Large Language Models as Exploratory Data Analysis Agents

cs.CY · 2026-05-08 · unverdicted · novelty 5.0

An evaluation of 15 LLM configurations under four conditions on a supply-chain exploratory data analysis (EDA) benchmark finds that most lack the repeatability needed for autonomous deployment. GPT-5.4 at extra-high reasoning effort scores highest on both mean score (0.8748) and the proposed business utility metric (0.6952).
