Daniel Martin Katz, Michael James Bommarito, Shang Gao, and Pablo Arredondo

URL: https://arxiv · 2024 · arXiv 2103.06268

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

cs.CL · 2026-05-28 · unverdicted · novelty 8.0

Multi-Legal-Bench creates a sparse 5x6 task-jurisdiction matrix across six countries and reports that few-shot effects replicate, no model dominates, cross-lingual transfer tracks label alignment more than language family, and tokenizer fertility does not predict accuracy.

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

LegalCiteBench reveals that current LLMs achieve under 7% accuracy on closed-book legal citation retrieval and completion tasks, with misleading answer rates above 94% for nearly all tested models.

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

cs.CL · 2026-05-16 · unverdicted · novelty 5.0

Retrieval with frozen embeddings and k-NN delivers competitive accuracy, high data efficiency, and zero hallucinations on legal multi-label annotation across ECtHR and Eurlex datasets.

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

Domain-trained small language model Olava Extract outperforms frontier LLMs on structured contract extraction with macro F1 0.812, micro F1 0.842, highest precision, and 78-97% lower inference cost.

A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

cs.AI · 2026-04-12 · unverdicted · novelty 5.0

The paper releases a benchmark of ten life-insurance contracts, a domain ontology, and 58 evidence-linked scenarios that shows ontology-driven knowledge graph queries produce more consistent and diagnosable gap/overlap results than text-only LLM inference.

citing papers explorer

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

cs.CL · 2026-05-28 · unverdicted · none · ref 8

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · none · ref 8

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

cs.CL · 2026-05-11 · unverdicted · none · ref 4

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

cs.CL · 2026-05-16 · unverdicted · none · ref 15

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

cs.CL · 2026-05-07 · unverdicted · none · ref 5

A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

cs.AI · 2026-04-12 · unverdicted · none · ref 11
