arXiv preprint arXiv:2601.16669

PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice · 2026 · arXiv 2601.16669

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

cs.CL · 2026-05-08 · accept · novelty 7.0

Magis-Bench is a new benchmark of 74 magistrate-level legal writing tasks from Brazilian exams where the strongest LLMs reach only 6.97/10, showing judicial reasoning remains difficult for current models.

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

ProHist-Bench shows that even state-of-the-art LLMs struggle with complex historical research questions requiring evidentiary reasoning, based on 400 questions and 10,891 rubrics from the Keju system.

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

LegalCiteBench reveals that current LLMs achieve under 7% accuracy on closed-book legal citation retrieval and completion tasks, with misleading answer rates above 94% for nearly all tested models.

citing papers explorer

Showing 3 of 3 citing papers.

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks
cs.CL · 2026-05-08 · accept · none · ref 12
Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination
cs.CL · 2026-04-27 · unverdicted · none · ref 3
LegalCiteBench: Evaluating Citation Reliability in Legal Language Models
cs.CL · 2026-05-11 · unverdicted · none · ref 10
