Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, and Iryna Gurevych

URL: https://arxiv · arXiv 2306.09237

2 Pith papers cite this work.


representative citing papers

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

cs.CL · 2026-05-28 · unverdicted · novelty 8.0

Multi-Legal-Bench creates a sparse 5x6 task-jurisdiction matrix across six countries and reports that few-shot effects replicate, no model dominates, cross-lingual transfer tracks label alignment more than language family, and tokenizer fertility does not predict accuracy.

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

UA-Legal-Bench is a new five-task benchmark for Ukrainian legal reasoning that demonstrates task-dependent few-shot prompting effects and the need for macro-F1 over accuracy on imbalanced classes.

