Mixed training of Qwen3-Embedding-4B on legal data plus SQuAD-pt yields higher average NDCG@10 (0.447), MRR@10 (0.595), and MAP@10 (0.308) across six Portuguese retrieval datasets than legal-only or base models, with largest gains on out-of-domain question-based search.
JU\'A -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Legal information retrieval in Portuguese remains difficult to evaluate systematically because available datasets differ widely in document type, query style, and relevance definition. We present JU\'A, a public benchmark for Brazilian legal retrieval designed to support more reproducible and comparable evaluation across heterogeneous legal collections. More broadly, JU\'A is intended not only as a benchmark, but as a continuous evaluation infrastructure for Brazilian legal IR, combining shared protocols, common ranking metrics, fixed splits when applicable, and a public leaderboard. The benchmark covers jurisprudence retrieval as well as broader legislative, regulatory, and question-driven legal search. We evaluate lexical, dense, and BM25-based reranking pipelines, including a domain-adapted Qwen embedding model fine-tuned on JU\'A-aligned supervision. Results show that the benchmark is sufficiently heterogeneous to distinguish retrieval paradigms and reveal substantial cross-dataset trade-offs. Domain adaptation yields its clearest gains on the supervision-aligned JU\'A-Juris subset, while BM25 remains highly competitive on other collections, especially in settings with strong lexical and institutional phrasing cues. Overall, JU\'A provides a practical evaluation framework for studying legal retrieval across multiple Brazilian legal domains under a common benchmark design.
fields
cs.IR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Domain-Adaptive Dense Retrieval for Brazilian Legal Search
Mixed training of Qwen3-Embedding-4B on legal data plus SQuAD-pt yields higher average NDCG@10 (0.447), MRR@10 (0.595), and MAP@10 (0.308) across six Portuguese retrieval datasets than legal-only or base models, with largest gains on out-of-domain question-based search.