Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Two Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning
  RTLC prompting lifts Claude 3.7 Sonnet pairwise accuracy on 350 hard JudgeBench items from 64.6% to 78.6% via a Research-Teach-Critique scaffold that beats self-consistency.
- Two Ways to De-Bias an LLM-as-a-Judge: A Continuous-Score Comparison of Hierarchical Bayesian Calibration and Neural-ODE Score Transport
  With 100 anchors, the Bayesian linear corrector matches or beats the Neural-ODE flow on distribution recovery, and both fix the mean offset; with 1500 anchors, the flow wins on MAE, Pearson correlation, and KL divergence.