2 Pith papers cite this work.
citing papers explorer
- AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning
  AMARIS augments rubric-based RL with long-term evaluation memory and dual retrieval to update rubrics, outperforming baselines across domains with ~5% overhead.
- Lessons from the Trenches on Reproducible Evaluation of Language Models
  The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.