Pith Number

pith:W35KERMQ

pith:2026:W35KERMQ6OS3HNMCCKUXYO7RL5
not attested · not anchored · not stored · refs resolved

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

Jaromir Savelka, Kevin Ashley, Li Zhang

Retrieval in a frozen embedding space assigns multiple legal labels to documents with competitive accuracy, strong data efficiency, and no risk of hallucinating outside the taxonomy.

arxiv:2605.16767 v1 · 2026-05-16 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{W35KERMQ6OS3HNMCCKUXYO7RL5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across three legal datasets, retrieval achieves competitive accuracy and strong data efficiency. On Eurlex, Qwen-8B retrieval raises Macro-F1 from 40.41 (GPT-5.2, zero-shot) to 49.12 while cutting estimated compute by a factor of 20-30 relative to fine-tuning; on ECtHR-A, with N=100 training samples, retrieval nearly doubles the Micro-F1 of hierarchical Legal-BERT.

C2 · weakest assumption

That similarity in the frozen retrieval embedding space reliably indicates label applicability for long, fact-intensive legal documents without any task-specific adaptation or fine-tuning of the embedder.

C3 · one-line summary

Retrieval with frozen embeddings and k-NN delivers competitive accuracy, high data efficiency, and zero hallucinations on legal multi-label annotation across ECtHR and Eurlex datasets.
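A minimal sketch of the idea behind these claims: neighbours are retrieved in a fixed embedding space and their labels are voted on, so a prediction can only contain labels that already occur in the annotated pool. The toy 2-D vectors, the label names, the similarity-weighted voting, and the `k` and `threshold` values are all illustrative assumptions, not the paper's actual configuration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_multilabel(query, index, k=3, threshold=0.5):
    """Return every label whose similarity-weighted share among the
    k nearest neighbours reaches `threshold` (hypothetical voting rule)."""
    scored = sorted(
        ((cosine(query, emb), labels) for emb, labels in index),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    total = 0.0
    for sim, labels in scored:
        weight = max(sim, 0.0)  # ignore negatively correlated neighbours
        total += weight
        for label in labels:
            votes[label] += weight
    if total == 0.0:
        return set()
    return {label for label, v in votes.items() if v / total >= threshold}


# Toy annotated pool: (frozen embedding, gold label set). Values are made up.
index = [
    ([1.0, 0.0], {"label_a"}),
    ([0.9, 0.1], {"label_a", "label_b"}),
    ([0.0, 1.0], {"label_c"}),
    ([0.8, 0.2], {"label_a"}),
]
taxonomy = set().union(*(labels for _, labels in index))

query = [1.0, 0.05]
strict = knn_multilabel(query, index, k=3, threshold=0.5)
loose = knn_multilabel(query, index, k=3, threshold=0.3)
print(sorted(strict), sorted(loose))
```

Because the output is assembled from neighbour label sets, it is always a subset of `taxonomy`; lowering `threshold` trades precision for recall without changing that property.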

References

37 extracted · 37 resolved · 8 Pith anchors

[1] K. D. Ashley, Artificial intelligence and legal analytics: new tools for law practice in the digital age, Cambridge University Press, 2017
[2] I. Chalkidis, A. Jana, D. Hartung, M. Bommarito, I. Androutsopoulos, D. Katz, N. Aletras, Lexglue: A benchmark dataset for legal language understanding in english, in: Proceedings of the 60th Annual M 2022
[3] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, V. Lampos, Predicting judicial decisions of the european court of human rights: A natural language processing perspective, PeerJ computer science 2 (2 2016
[4] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, I. Androutsopoulos, Large-scale multi-label text classification on eu legislation, in: Proceedings of the 57th annual meeting of the association for comp 2019
[5] W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, I. S. Dhillon, Taming pretrained transformers for extreme multi-label text classification, in: Proceedings of the 26th ACM SIGKDD international conference on 2020

Formal links

1 machine-checked theorem link

Receipt and verification
First computed: 2026-05-20T00:03:20.875166Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

b6faa24590f3a5b3b58212a97c3bf15f7df37a609721db66b9245c926346757f

Aliases

arxiv: 2605.16767 · arxiv_version: 2605.16767v1 · doi: 10.48550/arxiv.2605.16767 · pith_short_12: W35KERMQ6OS3 · pith_short_16: W35KERMQ6OS3HNMC · pith_short_8: W35KERMQ
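The values on this page suggest how the identifier relates to the canonical hash: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest, and the short aliases are prefixes of it. This is an inference from this record alone, not a documented rule; the function name below is hypothetical.

```python
import base64

CANONICAL_HASH = "b6faa24590f3a5b3b58212a97c3bf15f7df37a609721db66b9245c926346757f"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: inferred from this page's values, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
print(pith)
print(aliases)
```

If the inference holds, a mismatch between a Pith Number and the Base32 prefix of its canonical hash would indicate a corrupted or tampered record.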
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/W35KERMQ6OS3HNMCCKUXYO7RL5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b6faa24590f3a5b3b58212a97c3bf15f7df37a609721db66b9245c926346757f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "075401109011c74f96c342b9fd43906cd997048e64e9ba584719b6d5d15c97d7",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T02:40:01Z",
    "title_canon_sha256": "a5fe3bfcf5f47643c3dbf1f5828ad488804fb62b3b4818b700e704b1145b1e27"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16767",
    "kind": "arxiv",
    "version": 1
  }
}
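The canonicalization step of the shell one-liner can also be run offline against the JSON shown above. The sketch below mirrors the same `json.dumps` settings (sorted keys, compact separators, no ASCII escaping); `canonical_sha256` is a hypothetical helper name, and the digest it prints should be compared against the value listed under Canonical hash rather than taken on trust.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copied from the canonical record JSON on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "075401109011c74f96c342b9fd43906cd997048e64e9ba584719b6d5d15c97d7",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-16T02:40:01Z",
        "title_canon_sha256": "a5fe3bfcf5f47643c3dbf1f5828ad488804fb62b3b4818b700e704b1145b1e27",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16767", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys before hashing makes the digest independent of the order in which a mirror happens to store the fields.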