hub Mixed citations

Guanhua Zhang and Moritz Hardt

Mixed citation behavior. Most common role is background (67%).

25 Pith papers citing it
Background 67% of classified citations

citation-role summary

background: 5 · other: 1

citation-polarity summary

polarities

background: 4 · unclear: 2

representative citing papers

Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

cs.AI · 2026-05-23 · unverdicted · novelty 5.0 · 3 refs

Proposes a multi-dimensional behavioral framework with six dimensions (Correctness, Consistency, Robustness, Local Logical Coherence, Efficiency, Stability) plus deployment-aware aggregation to diagnose LLM reasoning beyond accuracy-based benchmarks.

LLM Benchmark Datasets Should Be Contamination-Resistant

cs.LG · 2026-05-19 · unverdicted · novelty 4.0

Authors call for contamination-resistant LLM benchmarks that exploit Transformer training-inference asymmetry and require new mathematical methods for cross-architecture interoperability.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

citing papers explorer

Showing 20 of 20 citing papers after filters.