A Survey on LLM-as-a-Judge
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: unverdicted (2).
citing papers explorer
- An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
RLVR for LLMs tolerates up to 15% verifier noise with validation accuracy within 2 points of clean baselines across three model families and two task domains.
- A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness
The paper releases a benchmark of ten life-insurance contracts, a domain ontology, and 58 evidence-linked scenarios that shows ontology-driven knowledge graph queries produce more consistent and diagnosable gap/overlap results than text-only LLM inference.