Multi-Legal-Bench creates a sparse 5x6 task-jurisdiction matrix across six countries and reports that few-shot effects replicate, no model dominates, cross-lingual transfer tracks label alignment more than language family, and tokenizer fertility does not predict accuracy.
Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, and Iryna Gurevych
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.
citing papers explorer
-
Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions
Multi-Legal-Bench creates a sparse 5x6 task-jurisdiction matrix across six countries and reports that few-shot effects replicate, no model dominates, cross-lingual transfer tracks label alignment more than language family, and tokenizer fertility does not predict accuracy.
-
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.