Introduces a new dataset and Average Severity Error metric for benchmarking LLMs on multi-label legal precedent treatment classification.
Mikail Demir, Hakan T
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (2)
Representative citing papers
LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.
Citing papers explorer
- Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification
  Introduces a new dataset and Average Severity Error metric for benchmarking LLMs on multi-label legal precedent treatment classification.
- Can Large Language Models Really Recognize Your Name?
  LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.