Deep reinforcement learning from human preferences
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
citing papers explorer
- RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator
  RankJudge creates paired multi-turn conversations with isolated single-turn flaws to generate unambiguous benchmarks for LLM-as-a-judge systems across ML, biomedicine, and finance domains.
- Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing
  In high-dimensional analysis, pretrained PCA representations for linear probing generalize best at low dimensionality when pretraining data is plentiful but labeled data scarce, with an exact trade-off showing how much unlabeled data replaces one labeled sample.