Audit-then-Score evolves factuality benchmarks through verifier-auditor disputes, raising expert accuracy from 60.8% to 90.9% and yielding a new verification agent that outperforms prior methods on deep research reports.
A 2025 study on achievement emotions ([2]) used a mixed-methods explanatory sequential design (ESD) to link qualitative interview data with quantitative regression models.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality