Big-bench extra hard. arXiv preprint arXiv:2502.19187, 2025

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

years

2026: 3 · 2025: 7

polarities

background: 2 · support: 1

representative citing papers

Agentic Frameworks for Reasoning Tasks: An Empirical Study

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

An empirical evaluation of 22 agentic frameworks on BBH, GSM8K, and ARC benchmarks shows stable performance in 12 frameworks but highlights orchestration failures and weaker mathematical reasoning.

Too long; didn't solve

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Longer prompts and solutions in a new expert-authored math dataset correlate with higher failure rates across LLMs, with length linked to empirical difficulty after difficulty adjustment.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
