Civil Court Simulation with Large Language Models

Haitao Li; Kaiyuan Zhang; Qingyao Ai; Yifan Chen; Yiqun Liu; Yueyue Wu

arXiv: 2606.09632 · v1 · pith:IFIRI5KZnew · submitted 2026-06-08 · 💻 cs.CL

Yifan Chen, Haitao Li, Kaiyuan Zhang, Yueyue Wu, Qingyao Ai, Yiqun Liu

Pith reviewed 2026-06-27 16:43 UTC · model grok-4.3

classification 💻 cs.CL

keywords court simulation · large language models · civil litigation · multi-agent systems · Chinese civil cases · legal judgment · statute retrieval · memory module

The pith

A multi-agent LLM framework organizes Chinese civil trials into five stages and produces reliable judgments on liability and remedies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a multi-agent simulation system that lets large language models stand in for the various parties and the court in Chinese civil litigation. It structures their interactions around the standard five-stage trial sequence while adding memory storage and statute lookup to keep long proceedings coherent. Civil cases are harder to model than criminal ones because claims, fault shares, and remedies can vary widely, so the authors test whether the LLM setup can still generate consistent outcomes. Experiments indicate the system handles liability allocation and multi-item disputes particularly well, and that better memory improves the whole process. A separate five-layer analysis examines how legal knowledge, available information, judge-like capabilities, role pressures, and social factors shape the simulation's behavior.

Core claim

The central claim is that a multi-agent court simulation framework for Chinese civil cases, organized through a five-stage civil trial procedure and integrating a memory module and statute retrieval, produces reliable civil judgments with clear strengths in liability allocation and multi-item adjudication.

What carries the argument

Multi-agent LLM role interactions structured by a five-stage civil trial procedure, supported by a memory module and statute retrieval.
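
One way to picture this machinery is a stage-ordered loop in which each role speaks in turn, reads a shared memory, and receives retrieved statutes. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the stage labels, the `Memory` class, `retrieve_statutes`, `run_trial`, the keyword-overlap scoring, the toy corpus, and the stub agents are all hypothetical, and a real agent would call an LLM where the stubs return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Placeholder labels: the page says there are five stages but does not name them.
STAGES = ["stage_1", "stage_2", "stage_3", "stage_4", "stage_5"]


@dataclass
class Memory:
    """Append-only transcript shared by all roles (hypothetical design)."""
    entries: List[str] = field(default_factory=list)

    def add(self, stage: str, role: str, text: str) -> None:
        self.entries.append(f"[{stage}] {role}: {text}")

    def recall(self, last_n: int = 5) -> List[str]:
        # Return only the most recent entries to bound the prompt size.
        return self.entries[-last_n:]


def retrieve_statutes(query: str, corpus: Dict[str, str], top_k: int = 2) -> List[str]:
    """Toy keyword-overlap ranking standing in for a real statute retriever."""
    query_terms = set(query.lower().split())

    def overlap(statute_id: str) -> int:
        return len(query_terms & set(corpus[statute_id].lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:top_k]


# An agent maps (stage, recent memory, retrieved statute ids) to an utterance.
Agent = Callable[[str, List[str], List[str]], str]


def run_trial(agents: Dict[str, Agent], corpus: Dict[str, str], case_facts: str) -> Memory:
    memory = Memory()
    memory.add("intake", "clerk", case_facts)
    for stage in STAGES:
        for role, agent in agents.items():
            context = memory.recall()
            statutes = retrieve_statutes(" ".join(context), corpus)
            memory.add(stage, role, agent(stage, context, statutes))
    return memory


def make_stub(role: str) -> Agent:
    # Stub agent: a real system would prompt an LLM with the context instead.
    def agent(stage: str, context: List[str], statutes: List[str]) -> str:
        return f"{role} statement citing {statutes[0] if statutes else 'no statute'}"
    return agent


# Invented two-entry corpus, used only to exercise the loop.
corpus = {
    "statute_a": "compensation for damage caused by fault",
    "statute_b": "contract performance and breach of contract",
}
agents = {role: make_stub(role) for role in ("plaintiff", "defendant", "judge")}
transcript = run_trial(agents, corpus, "plaintiff claims compensation for damage")
print(len(transcript.entries))
```

Keeping memory and retrieval as separate, swappable pieces makes it straightforward to degrade one of them in isolation, which is the kind of intervention the memory-quality and statute-retrieval experiments describe.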

If this is right

The framework allows scalable simulation of civil litigation for training and practice where human participants are costly.
Memory quality directly determines the reliability of downstream judgments in long-running cases.
A five-layer factor model can diagnose how legal grounding, information access, judicial capability, role pressures, and social context influence simulation outcomes.
The approach extends court simulation from criminal to civil matters by accommodating variable claims and remedies.
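
The diagnostic use of the five-layer factor model can be sketched as a one-factor-at-a-time ablation grid: start from a baseline configuration and change exactly one layer per run. This is an illustrative sketch only. The layer keys paraphrase the five factors listed in the abstract, `no_statute_retrieval` reflects the one intervention named in the Figure 2 caption, and every other value, along with the function names, is a placeholder invented here.

```python
from typing import Dict, List

# Layer names paraphrase the five factors listed in the abstract.
BASELINE: Dict[str, str] = {
    "legal_grounding": "full",
    "information_conditions": "full",
    "judicial_capability_and_role_orientation": "default",
    "organizational_pressure": "none",
    "social_context": "none",
}

# Only "no_statute_retrieval" corresponds to an intervention named in the
# Figure 2 caption; every other value is an invented placeholder.
INTERVENTIONS: Dict[str, List[str]] = {
    "legal_grounding": ["no_statute_retrieval"],
    "information_conditions": ["placeholder_reduced_information"],
    "judicial_capability_and_role_orientation": ["placeholder_role_shift"],
    "organizational_pressure": ["placeholder_pressure"],
    "social_context": ["placeholder_context"],
}


def one_factor_at_a_time(
    baseline: Dict[str, str], interventions: Dict[str, List[str]]
) -> List[Dict[str, str]]:
    """Return the baseline plus one config per single-layer intervention."""
    configs = [dict(baseline)]
    for layer, values in interventions.items():
        if layer not in baseline:
            raise KeyError(f"unknown layer: {layer}")
        for value in values:
            config = dict(baseline)
            config[layer] = value
            configs.append(config)
    return configs


def changed_layers(config: Dict[str, str], baseline: Dict[str, str]) -> List[str]:
    """List the layers on which a config departs from the baseline."""
    return [layer for layer in baseline if config[layer] != baseline[layer]]


configs = one_factor_at_a_time(BASELINE, INTERVENTIONS)
for config in configs:
    print(changed_layers(config, BASELINE) or ["baseline"])
```

Changing a single layer per run keeps any difference in judgment quality attributable to that layer, at the cost of not observing interactions between layers.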

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be used to run many parallel simulations of the same facts to estimate outcome distributions under different judicial styles.
If the memory and retrieval components are strengthened, the system might serve as a testbed for how changes in evidence presentation affect final awards.
Similar role-based structures could be applied to other flexible decision domains such as regulatory negotiations or insurance disputes.
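
The first extension above, running many simulations of the same facts to estimate an outcome distribution, could look roughly like the following. Everything here is assumed for illustration: `simulate_liability_share` is a seeded random stand-in for a full simulation run, the style names and their centres are invented, and no numbers come from the paper.

```python
import random
import statistics
from typing import Dict


def simulate_liability_share(case_id: str, style: str, seed: int) -> float:
    """Stand-in for one full simulation run; returns a liability share in [0, 1].

    A real run would execute the multi-agent trial. Here a seeded random draw
    around an invented per-style centre mimics run-to-run variation.
    """
    centres = {"style_a": 0.6, "style_b": 0.4}  # invented, not from the paper
    rng = random.Random(f"{case_id}|{style}|{seed}")
    share = rng.gauss(centres[style], 0.1)
    return min(1.0, max(0.0, share))  # clamp to a valid share


def outcome_distribution(case_id: str, style: str, n_runs: int) -> Dict[str, float]:
    """Summarize the spread of liability shares over repeated seeded runs."""
    shares = [simulate_liability_share(case_id, style, seed) for seed in range(n_runs)]
    return {
        "mean": statistics.mean(shares),
        "stdev": statistics.stdev(shares),
        "min": min(shares),
        "max": max(shares),
    }


for style in ("style_a", "style_b"):
    print(style, outcome_distribution("case_001", style, n_runs=50))
```

Deriving each run's seed from the case, style, and run index makes any single run reproducible, so an unusual outcome can be replayed and inspected.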

Load-bearing premise

The five-stage civil trial procedure plus LLM role interactions sufficiently capture the flexibility of real civil claims, liability, and remedies.

What would settle it

Blind expert comparison of the simulated judgments against actual civil case records, measuring agreement on liability shares and remedy amounts.
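
Agreement on liability shares and remedy amounts can be made concrete with a few simple metrics. The sketch below is one possible choice, not a protocol taken from the paper: the metric selection, function names, tolerance value, and all sample numbers are assumptions for illustration.

```python
from typing import Sequence


def _check(simulated: Sequence[float], actual: Sequence[float]) -> None:
    if len(simulated) != len(actual) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap, suited to liability shares expressed in [0, 1]."""
    _check(simulated, actual)
    return sum(abs(s - a) for s, a in zip(simulated, actual)) / len(simulated)


def mean_relative_error(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Average gap as a fraction of the actual amount, suited to remedy sums."""
    _check(simulated, actual)
    if any(a == 0 for a in actual):
        raise ValueError("actual amounts must be non-zero for relative error")
    return sum(abs(s - a) / abs(a) for s, a in zip(simulated, actual)) / len(simulated)


def within_tolerance_rate(
    simulated: Sequence[float], actual: Sequence[float], tolerance: float
) -> float:
    """Fraction of cases whose simulated value is within `tolerance` of the actual one."""
    _check(simulated, actual)
    hits = sum(1 for s, a in zip(simulated, actual) if abs(s - a) <= tolerance)
    return hits / len(simulated)


# Invented sample data: three cases for shares, two cases for remedy amounts.
simulated_shares = [0.70, 0.50, 0.30]
actual_shares = [0.60, 0.50, 0.50]
simulated_amounts = [9000.0, 20000.0]
actual_amounts = [10000.0, 20000.0]

print(mean_absolute_error(simulated_shares, actual_shares))
print(within_tolerance_rate(simulated_shares, actual_shares, tolerance=0.15))
print(mean_relative_error(simulated_amounts, actual_amounts))
```

Relative error is used for amounts so that a fixed gap on a small award counts for more than the same gap on a large one. Blinded expert ratings would add a qualitative check that these numeric metrics cannot provide.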

Figures

Figures reproduced from arXiv: 2606.09632 by Haitao Li, Kaiyuan Zhang, Qingyao Ai, Yifan Chen, Yiqun Liu, Yueyue Wu.

**Figure 1.** Overview of the multi-agent civil court simulation framework. The plaintiff acts as the party initiating the civil action. This role presents claims, explains factual grounds, supports requested remedies, and responds to the defendant’s challenges. The defendant acts as the responding party. This role contests the plaintiff’s evidence, challenges liability, presents favorable facts, and seeks to reduce or …

**Figure 2.** Five-layer factor framework for controlled analysis of civil court simulation. Legal-Case Entity Layer concerns the direct legal and factual basis of adjudication, including facts, evidence, statutes, judicial interpretations, and other legal materials. Interventions at this layer examine whether weakening legal or factual grounding changes simulation quality, such as removing statute retrieval or removing…

**Figure 3.** Example of role-based interaction in the simulated civil trial.

Original abstract

Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and difficult to scale. Large language models (LLMs) offer a scalable alternative, but existing court-simulation research mainly focuses on criminal cases. Civil litigation is more common in practice and harder to simulate because its claims, liability, and remedies are more flexible. We present a multi-agent court simulation framework for Chinese civil cases. The framework organizes role-based interaction through a five-stage civil trial procedure and integrates memory module and statute retrieval to support long-process adjudication. Experiments show that the framework produces reliable civil judgments, with clear strengths in liability allocation and multi-item adjudication. Further experiments show that memory quality substantially affects downstream simulation quality. Through a five-layer factor framework, we analyze how legal grounding, information conditions, judicial capability and role orientation, organizational pressure, and social context affect the framework's reliability and behavior. These results support the effectiveness of the proposed framework for civil court simulation. The dataset and code are available at: https://github.com/foggpoy/Civil-Court.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends LLM court simulation to Chinese civil cases with a five-stage procedure plus memory and retrieval, but the reliability claims rest on experiments reported without metrics or baselines, and no external validation is shown.

The paper's core move is taking the multi-agent LLM court simulation idea that has been tried on criminal cases and adapting it to civil litigation, which is more common and more variable in claims, liability splits, and remedies. They lay out a five-stage trial flow, add a memory module to handle longer interactions, and wire in statute retrieval for Chinese law. That combination is not in the prior work they cite, and they release the code and dataset, which lowers the barrier for anyone who wants to test or extend it.

The experiments are described as showing reliable outputs especially on liability allocation and multi-item cases, plus an analysis of how legal grounding, information conditions, and other factors affect behavior. Those are reasonable things to measure in a simulation paper.

The main gap is that the abstract gives no numbers, no baselines, no sample sizes, and no comparison to real judgments or human expert ratings. The claim that the outputs are "reliable" therefore cannot be checked from what is presented. If the evaluation is only internal consistency or LLM self-scoring, it does not establish legal usefulness. The stress-test concern about missing external validation looks accurate based on the text.

This is aimed at the legal-AI and simulation-for-education niche rather than a broad methods advance. Someone already working on multi-agent legal systems or Chinese law tech might want the framework details and the released repo. It is coherent on its own terms and shows clear thinking about the civil procedure differences, so it is worth a serious referee even if the current evidence is thin.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a multi-agent LLM framework for simulating Chinese civil court cases. It structures interactions via a five-stage civil trial procedure, incorporates memory modules and statute retrieval, and reports that experiments demonstrate reliable civil judgments particularly in liability allocation and multi-item adjudication. A five-layer factor framework is used to analyze influences on reliability, and the dataset and code are made available.

Significance. Should the experimental claims hold under rigorous validation, the work offers a scalable alternative to human-based simulations for civil litigation, filling a gap since most prior work focuses on criminal cases. The public availability of code and data strengthens reproducibility.

major comments (3)

[Abstract] Abstract: the claim that 'experiments show that the framework produces reliable civil judgments, with clear strengths in liability allocation and multi-item adjudication' supplies no metrics, baselines, sample sizes, or validation against real judgments, so the central experimental claim cannot be assessed.
[Framework description] Framework description and experiments: no comparison to actual court records, blinded expert review, or inter-rater metrics with human judges is described, so the assertion that outputs are 'reliable' in a legal sense does not follow from internal consistency or statute retrieval alone.
[Framework description] Five-stage procedure: the assumption that this procedure plus LLM role interactions sufficiently captures the flexibility of real civil claims, liability, and remedies is not tested against external standards, which is load-bearing for the claim that the simulation is effective.

minor comments (1)

[Abstract] The abstract would benefit from a brief quantitative summary of the reported experimental outcomes (e.g., agreement rates or accuracy figures) to allow readers to gauge the strength of the results immediately.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below, clarifying the basis for our experimental claims while agreeing to revisions where the manuscript can be strengthened.

Referee: [Abstract] Abstract: the claim that 'experiments show that the framework produces reliable civil judgments, with clear strengths in liability allocation and multi-item adjudication' supplies no metrics, baselines, sample sizes, or validation against real judgments, so the central experimental claim cannot be assessed.

Authors: We agree the abstract would benefit from greater specificity. The manuscript's experiments consist of memory quality ablation studies and analysis through the five-layer factor framework (legal grounding, information conditions, judicial capability and role orientation, organizational pressure, and social context). We will revise the abstract to summarize these experimental components, including the number of simulated cases and the observed effects on judgment quality.

revision: yes

Referee: [Framework description] Framework description and experiments: no comparison to actual court records, blinded expert review, or inter-rater metrics with human judges is described, so the assertion that outputs are 'reliable' in a legal sense does not follow from internal consistency or statute retrieval alone.

Authors: The reliability claims rest on the internal five-layer factor analysis and the demonstrated impact of memory modules on simulation outcomes, together with statute retrieval. We acknowledge that this does not constitute external legal validation. We will add an explicit limitations paragraph stating that the evaluation is internal and that direct comparison to real court records or blinded expert review was not performed.

revision: partial

Referee: [Framework description] Five-stage procedure: the assumption that this procedure plus LLM role interactions sufficiently captures the flexibility of real civil claims, liability, and remedies is not tested against external standards, which is load-bearing for the claim that the simulation is effective.

Authors: The five-stage structure follows the standard Chinese civil procedure code, and the experiments show differential performance across liability allocation and multi-item decisions under varying information and memory conditions. We will revise the relevant sections to state that effectiveness is evidenced by the factor framework results rather than by direct external benchmarking, and we will note external validation as future work.

revision: partial

standing simulated objections not resolved

Direct comparison against actual court records or blinded expert review of judgments was not conducted.

Circularity Check

0 steps flagged

No circularity in framework description or experimental claims

The paper describes a multi-agent LLM-based simulation framework for Chinese civil cases structured around a five-stage trial procedure, memory module, and statute retrieval, then reports experimental outputs on judgment reliability. No equations, parameter fits, self-definitional reductions, or load-bearing self-citations appear in the abstract or framework description that would make any result equivalent to its inputs by construction. The central claims rest on direct experimental outputs rather than renamed priors or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that LLMs can faithfully role-play judicial actors and that the five-stage template plus memory/retrieval suffice to model civil litigation flexibility; no free parameters or invented entities are stated in the abstract.

axioms (1)

domain assumption: LLMs can simulate the flexible claims, liability determinations, and remedies of civil litigation through role-based multi-agent interaction.
Invoked as the basis for the entire simulation framework and reliability claims.

pith-pipeline@v0.9.1-grok · 5721 in / 1157 out tokens · 17717 ms · 2026-06-27T16:43:36.326543+00:00 · methodology
