Audit-then-Score evolves factuality benchmarks through verifier-auditor disputes, raising expert accuracy from 60.8% to 90.9% and yielding a new verification agent that outperforms prior methods on deep research reports.
question refinement
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality