LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases

Chenyang Li; Huiyuan Xie; Ranjuexiao Hu; Weixing Shen; Yida Cai; Yun Liu; Yuxiao Ye; Zhenghao Liu; Zhiyuan Liu

arxiv: 2512.12643 · v2 · submitted 2025-12-14 · 💻 cs.CL

Pith reviewed 2026-05-16 22:40 UTC · model grok-4.3

keywords legal relation extraction · Chinese civil cases · LLM benchmarking · legal AI · relation extraction task · hierarchical taxonomy · expert annotation

The pith

Large language models struggle to identify legal relations in Chinese civil cases, though using such relations improves downstream legal AI performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes a hierarchical schema for legal relations in Chinese civil cases, complete with argument definitions, and releases LexRel as an expert-annotated benchmark for the relation extraction task. Tests on state-of-the-art LLMs reveal clear weaknesses in their ability to correctly identify these relations. The authors also find that supplying models with legal relation details produces clear gains on other legal AI tasks. Readers should care because legal relations form a key structure for understanding civil disputes, yet AI tools have had no reliable way to extract them until now. The benchmark provides a standard testbed to drive progress in this area.

Core claim

The central discovery is a new schema and benchmark called LexRel for legal relation extraction in Chinese civil cases. Current LLMs perform poorly on the task of identifying these relations accurately. Explicitly incorporating legal relation information leads to performance improvements on downstream legal AI tasks.

What carries the argument

The LexRel benchmark, which rests on a hierarchical taxonomy of legal relations together with explicit definitions of their arguments.

If this is right

Improved accuracy in legal relation extraction will support more reliable dispute resolution systems.
Downstream legal AI tasks will see performance gains from explicit relation information.
The benchmark allows systematic development and comparison of new extraction methods.
Legal AI models can become more aligned with structured legal analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The schema could be extended to criminal or other legal domains for broader benchmarks.
LLMs might require domain-specific fine-tuning or external knowledge to close the performance gap.
Integration with existing legal ontologies could enhance the utility of the extracted relations.
Real-world deployment would need testing on diverse case types beyond the benchmark's coverage.

Load-bearing premise

The chosen schema and annotations provide a complete and unbiased representation of legal relations across Chinese civil cases.

What would settle it

Finding a large set of real Chinese civil cases where the annotated relations miss key elements present in expert legal analysis, or an LLM achieving high accuracy on LexRel without relation-specific training.

Figures

Figures reproduced from arXiv: 2512.12643 by Chenyang Li, Huiyuan Xie, Ranjuexiao Hu, Weixing Shen, Yida Cai, Yun Liu, Yuxiao Ye, Zhenghao Liu, Zhiyuan Liu.

**Figure 1.** The construction workflow of LexRel (English translation of the original Chinese text). The type extraction task involves extracting types from factual text by referencing the taxonomy, while the argument extraction task involves extracting arguments from factual text and types by referencing the definitions of arguments. The red and blue annotations denote subject and object, as well as information that m…

**Figure 2.** (A) and (B) show the distributions of the top 100 most frequent legal relation types in …

**Figure 3.** Pareto distribution of causes of action, demon…

Abstract

Legal relations serve as an important analytical framework for dispute resolution in civil cases. However, legal relations in Chinese civil cases remain underexplored in the field of legal AI, largely due to the absence of comprehensive schemas. In this work, we first introduce a comprehensive schema for legal relations in civil cases, which contains a hierarchical taxonomy and definitions of arguments. Based on this schema, we formulate a legal relation extraction task and present LexRel, an expert-annotated benchmark for legal relation extraction in the Chinese civil law domain. We use LexRel to evaluate state-of-the-art large language models (LLMs) on legal relation extraction, showing that current LLMs exhibit significant limitations in accurately identifying civil legal relations. Furthermore, we demonstrate that explicitly incorporating information about legal relations leads to promising performance gains on other downstream legal AI tasks.
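
As a rough illustration of what such a schema implies for data, the sketch below models a taxonomy node and one extracted relation. It is not the authors' released format: the class names `RelationType` and `LegalRelation` are hypothetical, the top-level type name is taken from the paper's prompt tables, and the child type and all field values are invented for the example. The three argument fields follow the "subject, object, content" fields named in the paper's prompt templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationType:
    """One node in a hierarchical taxonomy of legal relation types (hypothetical class)."""
    name: str
    parent: Optional["RelationType"] = None

    def path(self) -> list[str]:
        """Return the names from the taxonomy root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


@dataclass(frozen=True)
class LegalRelation:
    """A single extracted relation: its type plus its three arguments."""
    relation_type: str
    subject: str
    object: str
    content: str = ""


# Top-level name as it appears in the paper's prompt tables; the child is invented.
contractual = RelationType("Contractual Legal Relations")
sale = RelationType("Sale Contract (hypothetical subtype)", parent=contractual)

# Invented example values, for illustration only.
example = LegalRelation(
    relation_type=sale.name,
    subject="Buyer A; Seller B",
    object="Delivery of goods",
    content="B must deliver the goods; A must pay the price",
)

print(" > ".join(sale.path()))
```

Keeping the type as a path through the taxonomy makes it possible to score a prediction at different levels of granularity, although the paper's own scoring rules are not stated in this summary.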

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LexRel gives a new hierarchical schema and expert-annotated benchmark for legal relation extraction in Chinese civil cases, but the claims about LLM limits rest on unshown annotation details.

The main point is that this paper introduces LexRel, a benchmark dataset for extracting legal relations from Chinese civil cases, along with a hierarchical taxonomy and argument definitions created with expert input. They evaluate several LLMs on the task, report that the models struggle, and then show that feeding relation information into other legal AI tasks produces gains. That combination is new for this specific domain. The focus on Chinese civil law fills a gap that prior legal NLP work has not addressed with this level of structure, and the downstream experiment is a direct way to show why the relations might matter in practice. The paper does a clean job of turning the schema into a concrete extraction task and running the obvious LLM baselines.

The soft spot is the missing quantitative checks on the annotation itself. There are no inter-annotator agreement numbers, no coverage statistics across case subtypes, and no explicit test that the taxonomy captures the full range of relations that appear in real civil disputes. Without those, the reported LLM limitations could partly reflect annotation noise or gaps rather than model shortcomings, and the downstream gains could be tied to the same unverified schema. If the full paper supplies those figures and they are reasonable, the central claims strengthen considerably.

This is for researchers working on legal information extraction or domain-specific benchmarks, especially anyone who needs structured data for Chinese or civil-law settings. A reader who wants a new test set for relation extraction would get concrete value from it. I would send it for peer review because the schema and dataset are substantive contributions that deserve checking, even if the evaluation needs tighter validation on the annotation side.

Referee Report

3 major / 2 minor

Summary. The paper introduces a hierarchical taxonomy and argument definitions for legal relations in Chinese civil cases, formulates a relation extraction task, and releases LexRel, an expert-annotated benchmark dataset. It evaluates state-of-the-art LLMs on this task, reports significant limitations in their performance, and shows that explicitly incorporating legal-relation information yields performance gains on other downstream legal AI tasks.

Significance. If the benchmark proves reliable, the work provides the first comprehensive schema and gold-standard dataset for legal relation extraction in Chinese civil law, a previously underexplored area. Demonstrating both LLM shortcomings and downstream gains from relation-aware modeling would supply a concrete foundation for improving legal NLP systems and could influence schema design in other low-resource legal domains.

major comments (3)

[Annotation Process] The annotation section provides no inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on how disagreements were resolved. Without these quantitative measures, the claim that current LLMs exhibit “significant limitations” cannot be confidently attributed to model shortcomings rather than annotation noise or schema ambiguity.
[Benchmark Construction and Experiments] The benchmark construction and evaluation sections omit coverage statistics across civil-case subtypes, data-split descriptions, and the precise definition of the evaluation metrics (e.g., exact matching criteria for arguments). These omissions prevent assessment of whether the reported LLM failures generalize or are artifacts of incomplete schema coverage.
[Downstream Task Evaluation] In the downstream-task experiments, the manuscript does not specify how legal-relation information is injected (e.g., prompt template, fine-tuning objective) or report statistical significance tests against strong baselines. Consequently, the “promising performance gains” cannot be evaluated as robust evidence for the utility of the schema.
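
For readers unfamiliar with the agreement statistic named in the first major comment, the following is a minimal sketch of Cohen's kappa for two annotators who each assign one label per item. The function name and the toy label lists are assumptions made for illustration; they are not data from LexRel, and the paper's actual annotation setup may call for a different statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items, one label each."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy labels, not LexRel annotations.
annotator_1 = ["contract", "tort", "contract", "tort"]
annotator_2 = ["contract", "tort", "tort", "tort"]
print(cohens_kappa(annotator_1, annotator_2))  # observed 0.75, chance 0.5, kappa 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few relation types dominate the label distribution.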

minor comments (2)

[Schema Definition] The hierarchical taxonomy diagram (Figure 1) would benefit from explicit edge labels indicating parent–child relation types to improve readability.
[Related Work] A small number of citations to prior Chinese legal NLP benchmarks (e.g., CAIL, JEC-QA) are missing from the related-work section; adding them would better situate LexRel’s novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.

Referee: [Annotation Process] The annotation section provides no inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on how disagreements were resolved. Without these quantitative measures, the claim that current LLMs exhibit “significant limitations” cannot be confidently attributed to model shortcomings rather than annotation noise or schema ambiguity.

Authors: We agree that quantitative measures of annotation quality are necessary. The LexRel annotations were performed by two legal experts with extensive experience in Chinese civil cases; all disagreements were resolved through iterative discussion and consensus. We will add Cohen’s kappa statistics along with a description of the resolution process in the revised annotation section. This will strengthen the attribution of LLM limitations to model capabilities rather than annotation issues. revision: yes
Referee: [Benchmark Construction and Experiments] The benchmark construction and evaluation sections omit coverage statistics across civil-case subtypes, data-split descriptions, and the precise definition of the evaluation metrics (e.g., exact matching criteria for arguments). These omissions prevent assessment of whether the reported LLM failures generalize or are artifacts of incomplete schema coverage.

Authors: We acknowledge these omissions limit reproducibility. In the revision we will add: (1) coverage statistics showing relation distribution across civil-case subtypes (contract, tort, family, etc.), (2) explicit train/validation/test split descriptions including sizes and any stratification, and (3) precise metric definitions with exact matching criteria for arguments and relations. These additions will enable readers to assess generalization of the reported LLM shortcomings. revision: yes
Referee: [Downstream Task Evaluation] In the downstream-task experiments, the manuscript does not specify how legal-relation information is injected (e.g., prompt template, fine-tuning objective) or report statistical significance tests against strong baselines. Consequently, the “promising performance gains” cannot be evaluated as robust evidence for the utility of the schema.

Authors: We will clarify the injection methods by including the exact prompt templates and any fine-tuning objectives used to incorporate legal-relation information. We will also add statistical significance tests (paired t-tests or McNemar’s test) against the strong baselines to demonstrate that the observed gains are robust rather than due to variance. revision: yes
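
One of the significance tests mentioned in the last response, McNemar's test, compares two systems on the same test items using only the items where they disagree. The sketch below implements the exact (binomial) two-sided version with the standard library. The function name and the toy correctness vectors are assumptions for illustration and do not come from the paper's experiments.

```python
from math import comb


def mcnemar_exact(baseline_correct, system_correct):
    """Exact two-sided McNemar test on paired per-item correctness flags.

    Returns (b, c, p_value), where b counts items only the baseline got right
    and c counts items only the compared system got right.
    """
    if len(baseline_correct) != len(system_correct):
        raise ValueError("both systems must be scored on the same items")
    pairs = list(zip(baseline_correct, system_correct))
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the systems never disagree, so there is no evidence either way
    # Under the null hypothesis each disagreement is a fair coin flip.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Invented toy outcomes: True means the system answered that item correctly.
baseline = [True, False, False, False, False, True]
with_relations = [True, True, True, True, True, False]
print(mcnemar_exact(baseline, with_relations))  # b=1, c=4, p=0.375
```

The exact form is preferable to the chi-squared approximation when the number of disagreements is small, as it would be on a small evaluation set.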

Circularity Check

0 steps flagged

No circularity: empirical benchmark creation with independent annotation

The paper's core contribution is the creation of a new hierarchical schema for Chinese civil legal relations followed by expert annotation of the LexRel benchmark and subsequent LLM evaluation plus downstream-task experiments. No equations, fitted parameters, or derivations are present that reduce any claimed result to the inputs by construction. The schema and annotations are presented as novel expert work rather than outputs of a prior model or self-referential fit; LLM performance numbers and downstream gains are measured against this externally annotated gold standard. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims. The work is therefore self-contained as an empirical resource paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that legal relations admit a stable hierarchical taxonomy that experts can annotate consistently; no free parameters or invented physical entities are involved.

axioms (1)

domain assumption: Legal relations in Chinese civil cases can be organized into a stable hierarchical taxonomy with well-defined arguments.
This underpins the schema creation and is presented as comprehensive without external validation cited.

invented entities (1)

LexRel benchmark dataset (no independent evidence)
purpose: To evaluate LLMs on legal relation extraction and demonstrate downstream benefits
Newly created expert-annotated resource introduced by the authors.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Chatlaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model, 2024
ChatLaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model. arXiv preprint arXiv:2306.16092. Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, and Hao Wang. 2023. LAiW: A Chinese legal large language models benchmark. arXiv preprint arXiv:23...

[2] Disc-lawllm: Fine-tuning large language models for intelligent legal services. arXiv preprint arXiv:2309.11325, 2023
Disc-LawLLM: Fine-tuning large language models for intelligent legal services. arXiv preprint arXiv:2309.11325. Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, and Yongqiang Ma. 2024. Llamafactory: Unified efficient fine-tuning of 100+ language models. arXiv preprint arXiv:2403.13372. A Taxonomy The complete content of ta...

[3] 4 Our schema, dataset and code will be released upon acceptance
framework, and the SFT configuration files are also included. 4 Our schema, dataset and code will be released upon acceptance. Types Prompt Legal Relations of Personality Rights Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: The r...

[7] Legal Relations of Status Rights Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Legal Relations of Status Rights Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: The right holder themselves and all other subjects are the obligors (generally, multip...

[11] Tortious Legal Relations Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Tortious Legal Relations Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: The infringer is the subject that implements the infringement act, and the infringed party is ...

[15]
Only one set needs to be output for the same subject and object. Table 5: Definitions and prompt templates used in argument extraction for Legal Relations of Personality Rights, Legal Relations of Status Rights and Tortious Legal Relations are shown (English translation for the original Chinese text). Types Prompt Legal Relations of Intellectual Property Please...

[17]
Required fields: {‘subject’: ‘’, ‘object’: ‘’}

[19] Legal Relations of Real Rights Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Legal Relations of Real Rights Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: The right holder himself/herself and all subjects related to the rights and obligations ...

[23] Legal Relations of Success Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Legal Relations of Success Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: Heir and decedent. (Generally, both need to be output simultaneously). Object definition: Sp...

[27]
Only one set needs to be output for the same subject and object. Table 6: Definitions and prompt templates used in argument extraction for Legal Relations of Intellectual Property, Legal Relations of Real Rights and Legal Relations of Success are shown (English translation for the original Chinese text). Types Prompt Contractual Legal Relations Please extract t...

[31] Legal Relation of Unjust Enrichment Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Legal Relation of Unjust Enrichment Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: A beneficiary is the subject that gains benefits without legal basis, while a victi...

[35] Legal Relation of Negotiorum Gestio Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description
Only one set needs to be output for the same subject and object. Legal Relation of Negotiorum Gestio Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: A manager (creditor) refers to a person who, without being entrusted or bound by le...

[39]
Only one set needs to be output for the same subject and object. Table 7: Definitions and prompt templates used in argument extraction for Contractual Legal Relations, Legal Relation of Unjust Enrichment and Legal Relation of Negotiorum Gestio are shown (English translation for the original Chinese text). Types Prompt Legal Relation of Letters of Credit Please e...

[43] document
Only one set needs to be output for the same subject and object. Legal Relation of Independent Guarantees Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: Issuer (bank/financial institution), beneficiary, applicant (can be an instruc...

[47] underlying transaction contract
Only one set needs to be output for the same subject and object. Legal Relation of Bills Please extract the subject, object, and content of {relation type} from the given judgment documents without explanation or description. Factual text: {}. Subject definition: A subject is specific and mainly includes: the drawer, the payee, the payer, the holder of the ...

[48] Each set of results is presented as a standard JSON object

[49] Required fields: {‘subject’: ‘’, ‘object’: ‘’, ‘content’: ‘’}

[50] Each set is in a separate paragraph without numbering or sorting

[51]
Only one set needs to be output for the same subject and object. Table 8: Definitions and prompt templates used in argument extraction for Legal Relation of Letters of Credit, Legal Relation of Independent Guarantees and Legal Relation of Bills are shown (English translation for the original Chinese text)
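
The prompt fragments captured above ask for one JSON object per paragraph with the fields subject, object, and content. A minimal sketch of how such output might be parsed and scored is given here. Several things are assumptions, not the paper's method: that paragraphs are separated by blank lines, that the model emits valid JSON with double-quoted keys, and that scoring uses strict exact match on the whole triple (the paper's matching criterion is not stated in this summary). The function names are hypothetical.

```python
import json

REQUIRED_FIELDS = ("subject", "object", "content")


def parse_relations(model_output: str) -> set[tuple[str, str, str]]:
    """Parse one JSON object per blank-line-separated paragraph into triples."""
    relations = set()
    for paragraph in model_output.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        try:
            record = json.loads(paragraph)
        except json.JSONDecodeError:
            continue  # skip malformed paragraphs rather than guess at their content
        if isinstance(record, dict) and all(key in record for key in REQUIRED_FIELDS):
            relations.add(tuple(str(record[key]).strip() for key in REQUIRED_FIELDS))
    return relations


def exact_match_prf(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 where a triple counts only if all fields match exactly."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented model output and gold triples, for illustration only.
output = (
    '{"subject": "A", "object": "house", "content": "ownership"}\n\n'
    '{"subject": "B", "object": "car", "content": "lease"}\n\n'
    "not json"
)
gold = {("A", "house", "ownership"), ("C", "land", "use")}
print(exact_match_prf(parse_relations(output), gold))  # (0.5, 0.5, 0.5)
```

Strict matching penalizes paraphrased but legally equivalent arguments, so a softer criterion may be more appropriate for the free-text content field.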
