arXiv preprint arXiv:2505.12864

Yu Fan et al · 2025 · arXiv 2505.12864

9 Pith papers cite this work.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

cs.CL · 2026-05-08 · accept · novelty 7.0

Magis-Bench is a new benchmark of 74 magistrate-level legal writing tasks from Brazilian exams where the strongest LLMs reach only 6.97/10, showing judicial reasoning remains difficult for current models.

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

cs.AI · 2026-04-26 · unverdicted · novelty 7.0

Expert evaluation of LLMs on Japanese bar exam writing tasks shows clear limitations in open-ended legal reasoning and frequent hallucinations unsupported by law or precedent.

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

LegalCiteBench reveals that current LLMs achieve under 7% accuracy on closed-book legal citation retrieval and completion tasks, with misleading answer rates above 94% for nearly all tested models.

BenGER: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

BenGER is a collaborative web platform that integrates end-to-end workflows for creating, annotating, running, and evaluating benchmarks on German legal tasks with large language models.

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

cs.CL · 2026-02-10 · conditional · novelty 6.0

EcoGym is a new open benchmark with three economic environments; it reveals that no leading LLM dominates at sustained plan-and-execute decision making across scenarios.

GradeLegal: Automated Grading for German Legal Cases

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared with human-designed prompts or optimization against strict judges.

Beyond Imperfect Alternatives with Rulemapping: A Neuro-Symbolic Case Study on Online Hate Speech

cs.CY · 2026-04-10 · unverdicted · novelty 5.0

Rulemapping uses expert symbolic scaffolds to constrain LLMs, raising precision on §130(1) German hate-speech classification from 0.34-0.49 to 0.80-0.86 while preserving recall of 0.82-0.89.

NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

NyayaMind combines retrieval-augmented generation (RAG) with domain-specific LLMs to generate transparent, structured legal reasoning and judgment predictions for Indian court cases.

citing papers explorer

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks cs.CL · 2026-05-08 · accept · none · ref 4
Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task cs.AI · 2026-04-26 · unverdicted · none · ref 6
LegalCiteBench: Evaluating Citation Reliability in Legal Language Models cs.CL · 2026-05-11 · unverdicted · none · ref 8
BenGER: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks cs.CL · 2026-04-15 · unverdicted · none · ref 1
EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies cs.CL · 2026-02-10 · conditional · none · ref 9
GradeLegal: Automated Grading for German Legal Cases cs.CL · 2026-05-20 · unverdicted · none · ref 16
Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization cs.CL · 2026-04-22 · unverdicted · none · ref 4
Beyond Imperfect Alternatives with Rulemapping: A Neuro-Symbolic Case Study on Online Hate Speech cs.CY · 2026-04-10 · unverdicted · none · ref 14
NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System cs.CL · 2026-04-10 · unverdicted · none · ref 4