and Henderson, Peter and Ho, Daniel E

doi: 10 · 2021 · arXiv 2757.346608

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

AuthorityBench shows citation presence (real or fabricated) increases LLM hallucination rates vs no-citation baseline, strongest for fabricated citations on true claims, with domain variation but negligible venue or author effects.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0 · 2 refs

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

LexRubric is a rubric-based benchmark containing 649 instances and 12,337 atomic criteria for diagnostic evaluation of LLMs on open-ended Chinese legal consultation and judicial examination tasks across 14 scenarios.

citing papers explorer

Showing 4 of 4 citing papers.

Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models

cs.LG · 2026-06-11 · unverdicted · none · ref 3

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · none · ref 8 · 2 links

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · none · ref 16

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

cs.CL · 2026-06-08 · unverdicted · none · ref 21
