and Henderson, Peter and Ho, Daniel E
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models
  AuthorityBench shows citation presence (real or fabricated) increases LLM hallucination rates vs no-citation baseline, strongest for fabricated citations on true claims, with domain variation but negligible venue or author effects.
- EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
  Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.
- UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
  UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.
- LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks
  LexRubric is a rubric-based benchmark containing 649 instances and 12,337 atomic criteria for diagnostic evaluation of LLMs on open-ended Chinese legal consultation and judicial examination tasks across 14 scenarios.