Chain-of-thought prompting obscures hallucination cues in large language models: An empirical evaluation

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

years

2026: 4 · 2025: 1

verdicts

UNVERDICTED 5

roles

background 2

polarities

background 2

representative citing papers

DeonticBench: A Benchmark for Reasoning over Rules

cs.CL · 2026-04-06 · unverdicted · novelty 7.0

DEONTICBENCH is a new benchmark of 6,232 deontic reasoning tasks from U.S. legal domains where frontier LLMs reach only ~45% accuracy and symbolic Prolog assistance plus RL training still fail to solve tasks reliably.
