JUÁ -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections

· 2026 · cs.IR · arXiv 2604.06098

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Legal information retrieval in Portuguese remains difficult to evaluate systematically because available datasets differ widely in document type, query style, and relevance definition. We present JUÁ, a public benchmark for Brazilian legal retrieval designed to support more reproducible and comparable evaluation across heterogeneous legal collections. More broadly, JUÁ is intended not only as a benchmark, but as a continuous evaluation infrastructure for Brazilian legal IR, combining shared protocols, common ranking metrics, fixed splits when applicable, and a public leaderboard. The benchmark covers jurisprudence retrieval as well as broader legislative, regulatory, and question-driven legal search. We evaluate lexical, dense, and BM25-based reranking pipelines, including a domain-adapted Qwen embedding model fine-tuned on JUÁ-aligned supervision. Results show that the benchmark is sufficiently heterogeneous to distinguish retrieval paradigms and reveal substantial cross-dataset trade-offs. Domain adaptation yields its clearest gains on the supervision-aligned JUÁ-Juris subset, while BM25 remains highly competitive on other collections, especially in settings with strong lexical and institutional phrasing cues. Overall, JUÁ provides a practical evaluation framework for studying legal retrieval across multiple Brazilian legal domains under a common benchmark design.
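
As an illustration of the "BM25-based reranking" setup the abstract mentions, the following is a minimal sketch of a two-stage pipeline: BM25 selects candidates, then a second scorer reorders them. It is not the authors' implementation. The function names, whitespace tokenization, the parameter values `k1=1.2` and `b=0.75`, and the toy Jaccard reranker (standing in for a dense or cross-encoder model) are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def retrieve_then_rerank(query, docs, rerank_fn, first_stage_k=100, final_k=10):
    """Stage 1: BM25 picks candidates. Stage 2: rerank_fn reorders them."""
    q = query.lower().split()
    toks = [d.lower().split() for d in docs]
    scores = bm25_scores(q, toks)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = candidates[:first_stage_k]
    reranked = sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
    return reranked[:final_k]


def jaccard_rerank(query, doc):
    """Hypothetical stand-in for a neural reranker: token-set Jaccard overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


docs = [
    "habeas corpus preventivo",
    "recurso especial tributário",
    "habeas corpus e prisão preventiva decretada",
]
top = retrieve_then_rerank("habeas corpus preventivo", docs, jaccard_rerank,
                           first_stage_k=2, final_k=2)
print(top)  # indices into docs, best first
```

In a realistic setting `rerank_fn` would wrap an embedding or cross-encoder model; the two-stage structure stays the same.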

representative citing papers

Domain-Adaptive Dense Retrieval for Brazilian Legal Search

cs.IR · 2026-05-05 · unverdicted · novelty 4.0

Mixed training of Qwen3-Embedding-4B on legal data plus SQuAD-pt yields higher average NDCG@10 (0.447), MRR@10 (0.595), and MAP@10 (0.308) across six Portuguese retrieval datasets than legal-only or base models, with largest gains on out-of-domain question-based search.
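
For readers unfamiliar with the metrics quoted above, here is a minimal per-query sketch of MRR@10, AP@10 (averaged over queries to obtain MAP@10), and NDCG@10. It assumes binary relevance labels, and AP@k is normalized by min(number of relevant documents, k), which is one common convention among several. The function names are illustrative, and the exact definitions used by the cited papers may differ.

```python
import math


def mrr_at_k(ranked_ids, relevant, k=10):
    """Reciprocal rank of the first relevant document in the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ap_at_k(ranked_ids, relevant, k=10):
    """Average precision over the top k, with binary relevance."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    denom = min(len(relevant), k)  # assumed normalization
    return total / denom if denom else 0.0


def ndcg_at_k(ranked_ids, relevant, k=10):
    """NDCG over the top k with binary gains and a log2 discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranking = ["d3", "d1", "d7"]   # system output, best first
relevant = {"d1", "d7"}        # judged relevant documents for this query
print(mrr_at_k(ranking, relevant))   # 0.5
print(ap_at_k(ranking, relevant))
print(ndcg_at_k(ranking, relevant))
```

Benchmark-level figures are the mean of these per-query values over all queries in a dataset.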

citing papers explorer

Domain-Adaptive Dense Retrieval for Brazilian Legal Search cs.IR · 2026-05-05 · unverdicted · none · ref 18 · internal anchor
