2 Pith papers cite this work.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
  Introduces the EURO-5K dataset, built from 136 EU acts, and benchmarks full fine-tuning against QLoRA for BERT models and LLMs on reporting-obligation extraction. It reports 0.89 F1 and finds that legal-domain pretraining gives limited gains except under parameter-efficient adaptation.
- JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment
  JudgmentBench provides the first public set of paired rubric and preference annotations from legal experts on the same LLM outputs, and shows that comparative (preference) judgments recover quality orderings better than rubric scores.