arxiv: 2601.14348 · v3 · pith:NRPEBZGRnew · submitted 2026-01-20 · 💻 cs.IR

Legal Retrieval for Public Defenders

Pith reviewed 2026-05-21 15:02 UTC · model grok-4.3

classification 💻 cs.IR
keywords legal retrieval · public defenders · information retrieval · query expansion · legal AI · domain adaptation · benchmarks · appellate briefs

The pith

Domain knowledge adaptations improve retrieval quality for public defense legal research where standard benchmarks fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds the NJ BriefBank, a tool that retrieves relevant appellate briefs for public defenders to ease their heavy research workload. It finds that results on existing retrieval benchmarks do not transfer to this real-world task. Adding domain knowledge (expanding queries with legal reasoning, using public defense data, and curating synthetic examples) raises retrieval quality. The authors release a taxonomy of realistic defender queries along with a manually annotated evaluation set that correlates highly with a proprietary dataset annotated by experienced public defenders. This provides a practical example of tailoring AI retrieval to support the constitutional right to counsel under resource constraints.

Core claim

We show that existing retrieval benchmarks fail to transfer to real public defense research, however adding domain knowledge improves retrieval quality. This includes query expansion with legal reasoning, domain-specific data and curated synthetic examples. To facilitate further research, we release a taxonomy of realistic defender search queries and a manually annotated evaluation dataset for public defense retrieval. This benchmark is highly correlated with a proprietary retrieval dataset annotated by experienced public defenders.

What carries the argument

NJ BriefBank, a domain-adapted retrieval system using query expansion with legal reasoning and synthetic data curation to surface appellate briefs.

If this is right

  • Retrieval systems for legal professionals require domain-specific tuning rather than relying on general benchmarks.
  • Public defenders can access more relevant precedents faster, potentially improving case preparation efficiency.
  • Releasing annotated datasets and query taxonomies allows the community to build better legal AI tools.
  • AI assistance in public defense can help address constitutional rights to counsel amid high caseloads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying similar adaptations could benefit retrieval tasks in other specialized legal fields like criminal appeals or civil rights.
  • Live deployment and user studies with defenders would test if improved retrieval leads to actual time savings or better outcomes.
  • The method of combining legal reasoning in queries might extend to other knowledge-intensive professional search domains.
  • Integration with writing assistance tools could further reduce the burden on under-resourced public defense offices.

Load-bearing premise

The manually annotated evaluation dataset and its correlation with defender annotations fully represent the day-to-day search needs of public defenders.
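
The premise above leans on agreement between two evaluation sets. The following is a minimal sketch of one way such agreement might be checked, assuming (the source does not say this) that agreement is measured as a Spearman rank correlation over per-system scores on the two sets. The function names and the toy scores are hypothetical and are not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up scores for four hypothetical retrieval systems on each evaluation set.
released_set_scores = [0.41, 0.55, 0.62, 0.48]
defender_set_scores = [0.30, 0.44, 0.47, 0.39]
agreement = spearman(released_set_scores, defender_set_scores)
```

A rank correlation only says the two sets order systems similarly; it does not by itself show that either set covers the full range of defender search needs, which is the point the premise turns on.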

What would settle it

Collecting a fresh set of queries and relevance judgments from active public defenders and finding that the domain knowledge methods show no improvement over baselines would falsify the central claim.

Original abstract

AI tools are suggested as solutions to assist public agencies with heavy workloads. In public defense -- where a constitutional right to counsel meets the complexities of law, overwhelming caseloads, and constrained resources -- practitioners face especially taxing conditions. Yet, there is little evidence of how AI could meaningfully support defenders' day-to-day work. In partnership with the New Jersey Office of the Public Defender, we develop the NJ BriefBank, a retrieval tool which surfaces relevant appellate briefs to streamline legal research and writing. We show that existing retrieval benchmarks fail to transfer to real public defense research, however adding domain knowledge improves retrieval quality. This includes query expansion with legal reasoning, domain-specific data and curated synthetic examples. To facilitate further research, we release a taxonomy of realistic defender search queries and a manually annotated evaluation dataset for public defense retrieval. This benchmark is highly correlated with a proprietary retrieval dataset annotated by experienced public defenders. Our work improves on the status quo of realistic legal retrieval benchmarking and illustrates one approach to applying AI in a real-world public interest setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NJ BriefBank, a retrieval system developed with the New Jersey Office of the Public Defender to surface relevant appellate briefs for legal research. It claims that standard IR benchmarks fail to transfer to public-defense workflows, while domain-specific enhancements—query expansion incorporating legal reasoning, domain data, and curated synthetic examples—yield measurable improvements. To support further work, the authors release a taxonomy of realistic defender queries and a manually annotated evaluation dataset that shows high correlation with a proprietary set annotated by experienced public defenders.

Significance. If the evaluation holds, the work offers a concrete demonstration of how IR techniques can be adapted to a high-stakes, resource-constrained public-interest domain and supplies reusable resources (taxonomy and dataset) that could improve benchmarking practices in legal retrieval. The practitioner partnership is a notable strength.

major comments (2)
  1. [Evaluation / Benchmark Construction] The central claims—that existing benchmarks fail to transfer and that domain-knowledge additions improve retrieval—rest on the representativeness of the released taxonomy and manually annotated dataset. The manuscript supports this primarily through reported high correlation with the proprietary defender-annotated set, but does not detail the annotation guidelines, inter-annotator agreement, or coverage of deeper workflow elements such as statute-to-fact mapping and multi-document precedent chaining. Without these, both the negative transfer result and the positive gains risk being artifacts of the evaluation set.
  2. [Results] The abstract and results sections report correlation with expert annotations and overall improvement from domain knowledge, yet provide no quantitative metrics (e.g., nDCG, MAP, or recall@K values), baseline details, or ablation results isolating the contribution of legal-reasoning expansion versus domain data versus synthetic examples. This absence prevents assessment of whether the reported gains are substantial or merely incremental.
minor comments (2)
  1. [Methods] Clarify the exact composition of the synthetic examples and how they were curated to avoid circularity with the evaluation queries.
  2. [Discussion] Add a limitations paragraph discussing potential mismatches between the query taxonomy and the full range of daily defender search tasks.
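
For readers unfamiliar with the metrics named in major comment 2, here is a minimal sketch of recall@K, average precision (MAP is its mean over queries), and nDCG@K. It uses the standard textbook definitions; the document ids and relevance grades are hypothetical, and nothing here reflects how the paper itself computes or reports results.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def average_precision(ranked_ids, relevant_ids):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def ndcg_at_k(ranked_ids, gains, k):
    """nDCG@k with graded relevance; `gains` maps doc id -> relevance grade."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical ranking for one query; "brief_e" is relevant but never retrieved.
ranked = ["brief_a", "brief_b", "brief_c", "brief_d"]
grades = {"brief_a": 2, "brief_c": 1, "brief_e": 2}
relevant = set(grades)

recall_3 = recall_at_k(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
ndcg_3 = ndcg_at_k(ranked, grades, 3)
```

Reporting these per system, alongside the baselines, is what would let a reader judge whether the gains are substantial or incremental.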

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and outline the revisions we will make to improve the clarity and rigor of the evaluation and results sections.

Point-by-point responses
  1. Referee: [Evaluation / Benchmark Construction] The central claims—that existing benchmarks fail to transfer and that domain-knowledge additions improve retrieval—rest on the representativeness of the released taxonomy and manually annotated dataset. The manuscript supports this primarily through reported high correlation with the proprietary defender-annotated set, but does not detail the annotation guidelines, inter-annotator agreement, or coverage of deeper workflow elements such as statute-to-fact mapping and multi-document precedent chaining. Without these, both the negative transfer result and the positive gains risk being artifacts of the evaluation set.

    Authors: We agree that greater transparency regarding the annotation process and taxonomy coverage is needed to substantiate the claims. In the revised manuscript we will add a dedicated subsection that describes the annotation guidelines, reports inter-annotator agreement statistics, and explicitly discusses how the taxonomy and dataset address key workflow elements including statute-to-fact mapping and multi-document precedent chaining. These additions will help demonstrate that the evaluation resources are representative of public-defense practice. revision: yes

  2. Referee: [Results] The abstract and results sections report correlation with expert annotations and overall improvement from domain knowledge, yet provide no quantitative metrics (e.g., nDCG, MAP, or recall@K values), baseline details, or ablation results isolating the contribution of legal-reasoning expansion versus domain data versus synthetic examples. This absence prevents assessment of whether the reported gains are substantial or merely incremental.

    Authors: We acknowledge that the current results presentation would benefit from more granular quantitative reporting. We will revise the results section to include specific metrics such as nDCG, MAP, and recall@K, provide full baseline descriptions, and add ablation experiments that isolate the individual contributions of legal-reasoning query expansion, domain-specific data, and synthetic examples. This will enable readers to assess the practical significance of the observed improvements. revision: yes
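
The ablation promised in response 2 could be organized along the following lines. This is a sketch under stated assumptions: the three component names are taken from the referee's wording, while `ablation_configs`, `run_ablation`, `marginal_gain`, and the placeholder scorer are hypothetical and not part of the paper.

```python
from itertools import combinations

# The three additions the referee asks to isolate.
COMPONENTS = ("legal_reasoning_expansion", "domain_data", "synthetic_examples")


def ablation_configs(components=COMPONENTS):
    """Yield every subset of components, from the bare baseline to the full system."""
    for size in range(len(components) + 1):
        for subset in combinations(components, size):
            yield frozenset(subset)


def run_ablation(evaluate, components=COMPONENTS):
    """Score each configuration with a caller-supplied evaluate(config) -> float."""
    return {config: evaluate(config) for config in ablation_configs(components)}


def marginal_gain(results, component):
    """Average score change from adding `component` to each config that lacks it."""
    deltas = [
        results[config | {component}] - score
        for config, score in results.items()
        if component not in config
    ]
    return sum(deltas) / len(deltas)


def placeholder_evaluate(config):
    """Illustration only: counts enabled components. NOT a retrieval metric or a result."""
    return float(len(config))


results = run_ablation(placeholder_evaluate)
gain = marginal_gain(results, "domain_data")
```

In practice `evaluate` would run the retrieval pipeline with the given components enabled and return a metric such as nDCG@K on the evaluation set. Enumerating all eight subsets, rather than only removing one component at a time, also exposes interactions between the components.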

Circularity Check

0 steps flagged

No circularity: empirical claims rest on released external annotations and correlations rather than self-referential fitting or derivation.

Full rationale

The paper reports empirical retrieval experiments on a new public-defense benchmark, shows failure of prior benchmarks to transfer, and demonstrates gains from domain-specific query expansion, data, and synthetic examples. It releases a taxonomy and manually annotated dataset whose correlation with a separate proprietary defender-annotated set is offered as external validation. No equations, predictions, or first-principles derivations are present that reduce to the paper's own inputs by construction. The load-bearing assumption (representativeness of the new benchmark) is acknowledged as an empirical limitation rather than a definitional or self-citation loop. This is a standard non-circular empirical IR paper whose central results are falsifiable against the released data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the assumption that defender-provided annotations and synthetic examples capture the relevant distribution of real queries; no new physical or mathematical entities are introduced.

axioms (1)
  • domain assumption: Standard information retrieval metrics (e.g., precision, recall, or ranking quality) are appropriate proxies for usefulness in legal research.
    Invoked when claiming that domain knowledge improves retrieval quality.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge

    cs.IR 2026-04 unverdicted novelty 7.0

    CAR is a new retrieval objective that targets the currently active authority set rather than most-similar documents, with theorems on coverage conditions and evaluations showing two-stage methods outperform dense retr...

Query expansion prompt steps (extracted from the paper)

  1. Infer the key legal issue(s) raised by the query.

  2. State the applicable legal rule(s) in general doctrinal terms.

  3. Optionally provide a brief legal analysis (reasoning) if it helps clarify the issue and rule.

  4. Construct an augmented search query that incorporates the original query plus useful legal reasoning signals (issues, rules, key concepts, doctrinal terms). This augmented query may be in any style (keywords, IRAC-style summary, or a well-structured legal question), as long as it is helpful for retrieving relevant cases and statutes. Important constraints...
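
The four steps above can be sketched as a small data structure plus an assembly function. This is an illustration only: it assumes the issues, rules, and analysis have already been produced by some upstream model or annotator, and it makes no call to any model. `LegalReasoning`, `build_augmented_query`, and the placeholder strings are hypothetical names, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class LegalReasoning:
    """Outputs of the issue, rule, and optional analysis steps for one query."""
    issues: list[str]
    rules: list[str]
    analysis: str = ""  # optional; left empty when it adds nothing


def build_augmented_query(original_query: str, reasoning: LegalReasoning) -> str:
    """Combine the original query with the legal reasoning signals."""
    parts = [original_query.strip()]
    if reasoning.issues:
        parts.append("Issues: " + "; ".join(reasoning.issues))
    if reasoning.rules:
        parts.append("Rules: " + "; ".join(reasoning.rules))
    if reasoning.analysis.strip():
        parts.append("Analysis: " + reasoning.analysis.strip())
    return "\n".join(parts)


# Placeholder reasoning; real content would come from the upstream steps.
example = build_augmented_query(
    "merger of offenses",
    LegalReasoning(
        issues=["<issue inferred from the query>"],
        rules=["<applicable rule in general doctrinal terms>"],
    ),
)
```

The original query is kept verbatim at the front so that exact-match signals survive the expansion; the labelled sections are one of several styles the prompt allows (keywords, IRAC-style summary, or a structured legal question).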