arxiv: 2512.12643 · v2 · submitted 2025-12-14 · 💻 cs.CL

LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases

Pith reviewed 2026-05-16 22:40 UTC · model grok-4.3

classification 💻 cs.CL
keywords legal relation extraction · Chinese civil cases · LLM benchmarking · legal AI · relation extraction task · hierarchical taxonomy · expert annotation

The pith

Large language models struggle to identify legal relations in Chinese civil cases, though using such relations improves downstream legal AI performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes a hierarchical schema for legal relations in Chinese civil cases, complete with argument definitions, and releases LexRel, an expert-annotated benchmark for the relation extraction task. Tests on state-of-the-art LLMs reveal significant limitations in identifying these relations accurately. The authors also report that supplying models with legal relation information yields promising gains on other legal AI tasks. Readers should care because legal relations are a key analytical framework for resolving civil disputes, yet they have remained underexplored in legal AI, largely for want of a comprehensive schema. The benchmark provides a common testbed for measuring progress in this area.

Core claim

The central contribution is a schema for legal relations in Chinese civil cases, together with LexRel, an expert-annotated benchmark built on that schema. Current LLMs show significant limitations in identifying these relations accurately. Explicitly incorporating legal relation information yields promising performance gains on downstream legal AI tasks.

What carries the argument

The LexRel benchmark, which rests on a hierarchical taxonomy of legal relations together with explicit definitions of their arguments.
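
To make the idea of a hierarchical taxonomy with argument definitions concrete, here is a minimal sketch of how such a structure might be represented. It is an illustration only: the `RelationType` class, the `find_path` helper, the root label, and the two-level layout are all assumptions, the two type names are taken from the paper's appendix prompt tables, and the argument definitions are placeholders rather than the paper's wording.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RelationType:
    """One node of a hypothetical legal-relation taxonomy."""
    name: str
    # Argument role -> short definition (placeholder text, not the paper's).
    argument_definitions: Dict[str, str] = field(default_factory=dict)
    children: List["RelationType"] = field(default_factory=list)


def find_path(node: RelationType, target: str) -> Optional[List[str]]:
    """Return the root-to-target list of type names, or None if absent."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [node.name] + sub_path
    return None


# Toy taxonomy: the root label is invented for this example.
taxonomy = RelationType(
    "Civil legal relations",
    children=[
        RelationType(
            "Tortious Legal Relations",
            {"subject": "infringer and infringed party (placeholder)"},
        ),
        RelationType(
            "Contractual Legal Relations",
            {"subject": "parties to the contract (placeholder)"},
        ),
    ],
)

print(find_path(taxonomy, "Tortious Legal Relations"))
```

Storing the argument definitions on each node keeps the type label and the roles to be extracted for it in one place, which is one plausible way to serve both type extraction and argument extraction from a single structure.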

If this is right

  • Improved accuracy in legal relation extraction will support more reliable dispute resolution systems.
  • Downstream legal AI tasks will see performance gains from explicit relation information.
  • The benchmark allows systematic development and comparison of new extraction methods.
  • Legal AI models can become more aligned with structured legal analysis.
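
Comparing extraction methods on a benchmark of this kind comes down to scoring predicted relations against gold annotations. The paper's exact metric is not given in this summary, so the following is only a sketch under one common assumption: a prediction counts as correct when its relation type, subject, and object all match a gold entry exactly after whitespace trimming, and results are reported as micro precision, recall, and F1. The function names, the tuple layout, and the toy data are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed layout: (relation_type, subject, object)
Relation = Tuple[str, str, str]


def normalize(rel: Relation) -> Relation:
    """Trim surrounding whitespace from every field."""
    return tuple(part.strip() for part in rel)


def exact_match_prf(gold: Iterable[Relation], pred: Iterable[Relation]):
    """Micro precision, recall, and F1 under exact matching of full tuples."""
    gold_set = {normalize(r) for r in gold}
    pred_set = {normalize(r) for r in pred}
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy data, not drawn from LexRel.
gold = [
    ("Contractual Legal Relations", "Company A", "Company B"),
    ("Tortious Legal Relations", "Zhang", "Li"),
]
pred = [
    ("Contractual Legal Relations", " Company A", "Company B"),
    ("Tortious Legal Relations", "Zhang", "Wang"),
]

print(exact_match_prf(gold, pred))
```

Exact matching is strict: a prediction with the right relation type but one wrong party scores as both a false positive and a false negative, so a looser criterion (for example, partial overlap of argument spans) would give different numbers.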

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The schema could be extended to criminal or other legal domains for broader benchmarks.
  • LLMs might require domain-specific fine-tuning or external knowledge to close the performance gap.
  • Integration with existing legal ontologies could enhance the utility of the extracted relations.
  • Real-world deployment would need testing on diverse case types beyond the benchmark's coverage.

Load-bearing premise

The chosen schema and annotations provide a complete and unbiased representation of legal relations across Chinese civil cases.

What would settle it

Finding a large set of real Chinese civil cases where the annotated relations miss key elements present in expert legal analysis, or an LLM achieving high accuracy on LexRel without relation-specific training.

Figures

Figures reproduced from arXiv: 2512.12643 by Chenyang Li, Huiyuan Xie, Ranjuexiao Hu, Weixing Shen, Yida Cai, Yun Liu, Yuxiao Ye, Zhenghao Liu, Zhiyuan Liu.

Figure 1: The construction workflow of LexRel (English translation of the original Chinese text). The type extraction task involves extracting types from factual text by referencing the taxonomy, while the argument extraction task involves extracting arguments from factual text and types by referencing the definitions of arguments. The red and blue annotations denote subject and object, as well as information that m…
Figure 2: (A) and (B) show the distributions of the top 100 most frequent legal relation types in …
Figure 3: Pareto distribution of causes of action, demon…
Original abstract

Legal relations serve as an important analytical framework for dispute resolution in civil cases. However, legal relations in Chinese civil cases remain underexplored in the field of legal AI, largely due to the absence of comprehensive schemas. In this work, we first introduce a comprehensive schema for legal relations in civil cases, which contains a hierarchical taxonomy and definitions of arguments. Based on this schema, we formulate a legal relation extraction task and present LexRel, an expert-annotated benchmark for legal relation extraction in the Chinese civil law domain. We use LexRel to evaluate state-of-the-art large language models (LLMs) on legal relation extraction, showing that current LLMs exhibit significant limitations in accurately identifying civil legal relations. Furthermore, we demonstrate that explicitly incorporating information about legal relations leads to promising performance gains on other downstream legal AI tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a hierarchical taxonomy and argument definitions for legal relations in Chinese civil cases, formulates a relation extraction task, and releases LexRel, an expert-annotated benchmark dataset. It evaluates state-of-the-art LLMs on this task, reports significant limitations in their performance, and shows that explicitly incorporating legal-relation information yields performance gains on other downstream legal AI tasks.

Significance. If the benchmark proves reliable, the work provides the first comprehensive schema and gold-standard dataset for legal relation extraction in Chinese civil law, a previously underexplored area. Demonstrating both LLM shortcomings and downstream gains from relation-aware modeling would supply a concrete foundation for improving legal NLP systems and could influence schema design in other low-resource legal domains.

major comments (3)
  1. [Annotation Process] The annotation section provides no inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on how disagreements were resolved. Without these quantitative measures, the claim that current LLMs exhibit “significant limitations” cannot be confidently attributed to model shortcomings rather than annotation noise or schema ambiguity.
  2. [Benchmark Construction and Experiments] The benchmark construction and evaluation sections omit coverage statistics across civil-case subtypes, data-split descriptions, and the precise definition of the evaluation metrics (e.g., exact matching criteria for arguments). These omissions prevent assessment of whether the reported LLM failures generalize or are artifacts of incomplete schema coverage.
  3. [Downstream Task Evaluation] In the downstream-task experiments, the manuscript does not specify how legal-relation information is injected (e.g., prompt template, fine-tuning objective) or report statistical significance tests against strong baselines. Consequently, the “promising performance gains” cannot be evaluated as robust evidence for the utility of the schema.
minor comments (2)
  1. [Schema Definition] The hierarchical taxonomy diagram (Figure 1) would benefit from explicit edge labels indicating parent–child relation types to improve readability.
  2. [Related Work] A small number of citations to prior Chinese legal NLP benchmarks (e.g., CAIL, JEC-QA) are missing from the related-work section; adding them would better situate LexRel’s novelty.
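
For readers unfamiliar with the agreement statistic named in major comment 1, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance from their individual label frequencies. The sketch below is a plain-Python illustration, not the authors' procedure: it assumes each item receives exactly one categorical label from each annotator, and the labels are invented toy data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Invented toy labels, not LexRel annotations.
annotator_a = ["contract", "contract", "tort", "tort"]
annotator_b = ["contract", "contract", "tort", "contract"]

print(cohens_kappa(annotator_a, annotator_b))
```

If a case can carry several relation types at once, this single-label form would need adapting, for instance by computing kappa per relation type over present/absent decisions.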

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.

  1. Referee: [Annotation Process] The annotation section provides no inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on how disagreements were resolved. Without these quantitative measures, the claim that current LLMs exhibit “significant limitations” cannot be confidently attributed to model shortcomings rather than annotation noise or schema ambiguity.

    Authors: We agree that quantitative measures of annotation quality are necessary. The LexRel annotations were performed by two legal experts with extensive experience in Chinese civil cases; all disagreements were resolved through iterative discussion and consensus. We will add Cohen’s kappa statistics along with a description of the resolution process in the revised annotation section. This will strengthen the attribution of LLM limitations to model capabilities rather than annotation issues.

    Revision: yes

  2. Referee: [Benchmark Construction and Experiments] The benchmark construction and evaluation sections omit coverage statistics across civil-case subtypes, data-split descriptions, and the precise definition of the evaluation metrics (e.g., exact matching criteria for arguments). These omissions prevent assessment of whether the reported LLM failures generalize or are artifacts of incomplete schema coverage.

    Authors: We acknowledge these omissions limit reproducibility. In the revision we will add: (1) coverage statistics showing relation distribution across civil-case subtypes (contract, tort, family, etc.), (2) explicit train/validation/test split descriptions including sizes and any stratification, and (3) precise metric definitions with exact matching criteria for arguments and relations. These additions will enable readers to assess generalization of the reported LLM shortcomings.

    Revision: yes

  3. Referee: [Downstream Task Evaluation] In the downstream-task experiments, the manuscript does not specify how legal-relation information is injected (e.g., prompt template, fine-tuning objective) or report statistical significance tests against strong baselines. Consequently, the “promising performance gains” cannot be evaluated as robust evidence for the utility of the schema.

    Authors: We will clarify the injection methods by including the exact prompt templates and any fine-tuning objectives used to incorporate legal-relation information. We will also add statistical significance tests (paired t-tests or McNemar’s test) against the strong baselines to demonstrate that the observed gains are robust rather than due to variance.

    Revision: yes
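
As an illustration of the kind of significance test mentioned in the third response, the sketch below implements the exact (binomial) form of McNemar's test in plain Python. It assumes a downstream task where each test item is simply right or wrong for both the baseline and the relation-augmented system; the function name and the toy outcomes are hypothetical, and nothing here reflects results from the paper.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(baseline_correct: Sequence[bool], system_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired right/wrong outcomes."""
    if len(baseline_correct) != len(system_correct):
        raise ValueError("both systems must be scored on the same items")
    pairs = list(zip(baseline_correct, system_correct))
    # Only discordant pairs (one system right, the other wrong) carry information.
    only_baseline = sum(b and not s for b, s in pairs)
    only_system = sum(s and not b for b, s in pairs)
    n = only_baseline + only_system
    if n == 0:
        return 1.0  # the two systems never disagree
    k = min(only_baseline, only_system)
    # Binomial tail under the null that each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented per-item outcomes for six test cases.
baseline = [True, False, False, False, False, True]
augmented = [True, True, True, True, True, False]

print(mcnemar_exact(baseline, augmented))
```

The exact form is used here because it stays valid when there are few discordant pairs; with large samples the chi-square approximation is the more common choice.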

Circularity Check

0 steps flagged

No circularity: empirical benchmark creation with independent annotation


The paper's core contribution is the creation of a new hierarchical schema for Chinese civil legal relations followed by expert annotation of the LexRel benchmark and subsequent LLM evaluation plus downstream-task experiments. No equations, fitted parameters, or derivations are present that reduce any claimed result to the inputs by construction. The schema and annotations are presented as novel expert work rather than outputs of a prior model or self-referential fit; LLM performance numbers and downstream gains are measured against this externally annotated gold standard. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims. The work is therefore self-contained as an empirical resource paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that legal relations admit a stable hierarchical taxonomy that experts can annotate consistently; no free parameters or invented physical entities are involved.

axioms (1)
  • Domain assumption: Legal relations in Chinese civil cases can be organized into a stable hierarchical taxonomy with well-defined arguments.
    This underpins the schema creation and is presented as comprehensive without external validation cited.
invented entities (1)
  • LexRel benchmark dataset (no independent evidence)
    Purpose: to evaluate LLMs on legal relation extraction and demonstrate downstream benefits.
    Newly created expert-annotated resource introduced by the authors.
