TypedCSIP: Typed Counterfactual Pretraining for Chinese Legislative Conflict Classification

Yao Liu

arxiv: 2605.25474 · v1 · pith:76TILCFBnew · submitted 2026-05-25 · 💻 cs.CL


Pith reviewed 2026-06-29 22:04 UTC · model grok-4.3

classification 💻 cs.CL

keywords: counterfactual pretraining, legal conflict classification, Chinese legislation, LCR-CN benchmark, typed intervention, macro-F1 evaluation, pre-registered test


The pith

TypedCSIP pretrains on expert revisions of law pairs and improves conflict-type classification macro-F1 by 0.9 to 1.3 percentage points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TypedCSIP as a two-stage method for the LCR-CN conflict classification task. In Stage 1, a shared encoder is pretrained on triplets consisting of a superior provision, a subordinate provision, and an expert-written minimal revision; the typed factor head, which covers four legal-doctrine categories (Responsibility, Condition, Sanction, Definition), is trained to label the revised pair as carrying no conflict evidence. In Stage 2, the encoder is transferred to a five-way classification head that predicts conflict and doctrine type on unmodified pairs. On the 696-record test split, the resulting model improves macro-F1 over the strongest single-model baseline by +0.916 pp (chinese-roberta-wwm-ext) and +1.288 pp (SAILER). Both results satisfy the pre-registered rule: a mean per-seed difference of at least 0.8 pp, with seed-bootstrap and Student-t 95% lower bounds both above zero. The gain stays positive on a cold-start subset of 244 unseen records, while the same encoder shows no benefit on a separate retrieval task.

Core claim

TypedCSIP pretrains a shared encoder with a typed Counterfactual Selective Intervention Pretraining objective on (superior, subordinate, expert-revised) triplets, requiring the typed factor head to classify the expert revision as carrying no conflict evidence; the encoder is then transferred to a five-way classification head that reads only the original pair at test time.
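As a rough illustration of the Stage 1 supervision described above, the sketch below shows one way the typed targets could be set up. The summary does not give the loss formula, so everything here is an assumption: a four-output head with independent sigmoid outputs, a binary cross-entropy loss, a one-hot target for the original pair, and an all-zero target for the expert-revised pair. The function names are hypothetical and are not taken from the released code.

```python
import math

# The four doctrine types named in the abstract; one head output per type.
DOCTRINE_TYPES = ["Responsibility", "Condition", "Sanction", "Definition"]

def stage1_targets(doctrine_type):
    """Return (target for the original pair, target for the revised pair).

    Assumption: the original conflicting pair gets a one-hot target over the
    four doctrine types, and the expert revision gets all zeros, meaning
    "no conflict evidence" on every typed output.
    """
    original = [1.0 if t == doctrine_type else 0.0 for t in DOCTRINE_TYPES]
    revised = [0.0] * len(DOCTRINE_TYPES)
    return original, revised

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over the typed outputs (hypothetical loss)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

Under these assumptions, a head that outputs strongly negative logits on the revised pair incurs almost no loss, which is the behaviour the objective is described as rewarding.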

What carries the argument

The typed Counterfactual Selective Intervention Pretraining objective that treats expert minimal revisions as clean no-conflict counterfactuals for doctrine-type classification.

If this is right

The pretraining signal raises macro-F1 on unmodified test pairs by +0.916 pp on chinese-roberta-wwm-ext and +1.288 pp on SAILER.
Gains remain positive on the 244 Unseen-gB records.
The Stage-2 encoder specializes for conflict classification and does not improve superior-law retrieval.
Both cells pass the locked statistical rule, which requires a mean per-seed difference of at least 0.8 pp with both seed-bootstrap and Student-t 95% lower bounds above zero.
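For readers less familiar with the metric, macro-F1 averages per-class F1 with equal weight, so each of the five classes counts the same regardless of how often it occurs. A minimal plain-Python sketch follows. The label name `NoConflict` is an assumption (the abstract names only the four doctrine types), and the helper `gain_pp` is a hypothetical convenience for expressing a difference in percentage points.

```python
# Five-way label set: one assumed no-conflict label plus the four doctrine types.
LABELS = ["NoConflict", "Responsibility", "Condition", "Sanction", "Definition"]

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted instances contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def gain_pp(f1_model, f1_baseline):
    """Difference between two macro-F1 scores, in percentage points."""
    return 100.0 * (f1_model - f1_baseline)
```

Because every class carries equal weight, a gain of about one percentage point can come from a small number of corrected predictions in a rare class.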

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same revision-based signal might be tested on other legal corpora where minimal expert edits are available.
If the revisions are not fully minimal or doctrine-neutral, the pretraining objective could introduce label noise.
A direct comparison of typed versus untyped counterfactual heads would isolate the contribution of the doctrine-type supervision.

Load-bearing premise

Expert-written minimal revisions can be treated as clean counterfactuals that carry no conflict evidence, and this signal transfers to improve classification on unmodified pairs.

What would settle it

A mean per-seed macro-F1 difference below 0.8 pp or a 95% lower bound below zero on either backbone, under the pre-registered 18-seed protocol, would falsify the performance claim.
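The decision rule stated above can be sketched as a small check over the per-seed differences. This is an illustration under stated assumptions, not the registered analysis script: it assumes the "95% lower bound" is the lower end of a two-sided 95% interval, uses a percentile bootstrap over seeds, and hardcodes the Student-t critical value for 17 degrees of freedom (18 seeds), about 2.110 from standard tables. The function name and return keys are hypothetical.

```python
import math
import random
import statistics

# Two-sided 95% Student-t critical value for df = 17 (18 seeds), from t tables.
T_CRIT_DF17 = 2.110

def locked_rule(diffs_pp, threshold_pp=0.8, t_crit=T_CRIT_DF17,
                n_boot=10000, rng_seed=0):
    """Check per-seed macro-F1 differences (in pp) against the three conditions."""
    n = len(diffs_pp)
    mean = statistics.fmean(diffs_pp)
    sd = statistics.stdev(diffs_pp)  # sample standard deviation (n - 1)

    # Student-t lower bound on the mean difference.
    t_lower = mean - t_crit * sd / math.sqrt(n)

    # Percentile bootstrap: resample seeds with replacement, take the 2.5th percentile.
    rng = random.Random(rng_seed)
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs_pp, k=n)) for _ in range(n_boot)
    )
    boot_lower = boot_means[int(0.025 * n_boot)]

    passed = mean >= threshold_pp and t_lower > 0 and boot_lower > 0
    return {"mean": mean, "t_lower": t_lower, "boot_lower": boot_lower, "passed": passed}
```

Calling `locked_rule` on a list of 18 per-seed differences returns the mean, both lower bounds, and a pass flag; any of the three conditions failing is enough to fail the rule.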

Figures

Figures reproduced from arXiv: 2605.25474 by Yao Liu.

**Figure 1.** TypedCSIP architecture overview. Stage 1 pretrains the encoder with a typed CSIP loss whose head has four rows

Original abstract

TypedCSIP is a typed counterfactual pretraining method for the conflict-classification task of the LCR-CN benchmark (Zhao et al., 2026): given a (superior, subordinate) provision pair, predict whether the pair conflicts and which of four legal-doctrine types (Responsibility, Condition, Sanction, Definition) describes the inconsistency. We exploit LCR-CN's expert-written minimal revisions as training-time counterfactual supervision; at test time the classifier reads only the original pair. Stage 1 pretrains a shared encoder with a typed Counterfactual Selective Intervention Pretraining objective on (superior, subordinate, expert-revised) triplets, treating the expert revision as a counterfactual that the typed factor head must classify as carrying no conflict evidence. Stage 2 transfers the encoder to a five-way classification head. The confirmatory test was registered on the Open Science Framework before observing v6 measurements: 18 seeds, locked rule requiring mean per-seed difference at least 0.8 pp with both seed-bootstrap and Student-t 95% lower bounds above zero. On the 696-record test split, the v2 variant improves macro-F1 over the strongest single-model baseline by +0.916 pp on chinese-roberta-wwm-ext and +1.288 pp on the SAILER cross-backbone replication; both cells pass the rule. A cold-start stratified result on the 244 Unseen-gB records keeps the gain positive on both backbones. A cross-task diagnostic shows the Stage-2 encoder is classification-specialized and does not transfer to LCR-CN's superior-law retrieval task, so we scope the contribution to conflict classification. We release code, 72 pre-registered prediction files, matched-seed and MLM-control auxiliaries, and the OSF pre-registration record.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TypedCSIP shows a modest pre-registered macro-F1 gain on LCR-CN conflict classification by using expert revisions as typed no-conflict counterfactuals during pretraining.


The main takeaway is a small but locked-in improvement: on the 696-record test split the v2 model beats the strongest single-model baseline by 0.9 to 1.3 macro-F1 points across two backbones, clears the pre-registered 0.8 pp rule with both bootstrap and Student-t lower bounds above zero, and stays positive on the 244-record Unseen-gB cold-start subset.

What the paper actually contributes is the typed counterfactual selective intervention objective. It pretrains a shared encoder on (superior, subordinate, expert-revised) triplets so the typed factor head learns to treat the revision as carrying no conflict evidence, then transfers the encoder to five-way classification. The cross-task diagnostic, which shows that the encoder does not improve superior-law retrieval, is useful for scoping the claim. The matched-seed and MLM-control auxiliaries, together with the release of code and all 72 pre-registered prediction files, give independent ways to check the numbers.

The central assumption, that expert minimal revisions function as clean no-conflict counterfactuals, looks reasonable given how the benchmark was built, and the paper does not appear to over-claim beyond conflict classification. The main limitation is that the effect size remains incremental on a narrow legal-NLP task; nothing here reorganizes broader methods.

This is worth a serious referee for groups working on legal text classification or controlled counterfactual pretraining. The pre-registration, controls, and artifacts make it a clean incremental result even if the practical payoff stays modest.

Referee Report

0 major / 3 minor

Summary. TypedCSIP is a two-stage typed counterfactual pretraining method for the conflict-classification task on the LCR-CN benchmark. Stage 1 pretrains a shared encoder on (superior, subordinate, expert-revised) triplets using a typed Counterfactual Selective Intervention Pretraining objective that treats expert revisions as no-conflict counterfactuals; Stage 2 transfers the encoder to a five-way classification head. On the 696-record test split the v2 variant reports macro-F1 gains of +0.916 pp (chinese-roberta-wwm-ext) and +1.288 pp (SAILER replication) that pass a pre-registered rule (mean per-seed difference ≥0.8 pp with both seed-bootstrap and Student-t 95% lower bounds >0); gains remain positive on the 244-record Unseen-gB cold-start subset. A cross-task diagnostic shows the encoder is specialized to conflict classification and does not improve superior-law retrieval.

Significance. If the result holds, the work supplies a scoped, statistically controlled improvement to legislative conflict classification that exploits expert counterfactuals without claiming cross-task transfer. Strengths include the locked pre-registration, the multi-seed protocol with explicit thresholds, replication across two backbones, cold-start evaluation, matched-seed and MLM-control auxiliaries, and public release of code plus 72 pre-registered prediction files; these elements provide independent support for the narrow claim.

minor comments (3)

Abstract: references to the 'v2 variant' and 'v6 measurements' are undefined; a short methods paragraph or footnote should state what these versions denote and how they differ from the registered protocol.
The typed factor head and its loss formulation are described at a high level; adding a short pseudocode block or diagram in §3 would improve reproducibility without lengthening the paper.
Table or figure captions for the main results should explicitly restate the pre-registered decision rule and the two backbone identifiers for quick reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of our work, the positive assessment of its significance, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected


The paper's central claim is an empirical macro-F1 improvement on a locked 696-record test split, obtained by pretraining on expert minimal revisions from the external LCR-CN benchmark and transferring to a classification head. No equation or derivation reduces the reported gain to a quantity defined by parameters fitted on the test data itself. The method relies on an externally annotated benchmark and pre-registered statistical thresholds rather than on self-definitional or self-citing load-bearing steps. The matched-seed and MLM-control auxiliaries, together with the cross-task diagnostic, provide independent support outside the fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that expert revisions constitute valid counterfactuals free of conflict evidence; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Expert-written minimal revisions act as counterfactuals that carry no conflict evidence for the typed factor head.
This premise is invoked to define the Stage-1 pretraining objective on (superior, subordinate, expert-revised) triplets.



Reference graph

Works this paper leans on

29 extracted references · 19 canonical work pages · 2 internal anchors

[1] Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: International Conference on Machine Learning (ICML). Projection head discarded after pretraining.

[2] Cui, Y., Che, W., Liu, T., Qin, B., Wang, S., Hu, G., 2020. Revisiting pre-trained models for Chinese natural language processing, in: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 657–668. arXiv:2004.13922. Chinese RoBERTa-WWM-ext (primary backbone in our experiments).

[3] Deng, C., Mao, K., Dou, Z., 2024. Learning interpretable legal case retrieval via knowledge-guided case reformulation, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 1253–1265. URL: https://aclanthology.org/2024.emnlp-main.73/. Knowledge-guided legal case retrieval (KELLER); Chinese legal IR.

[4] Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26. doi:10.1214/aos/1176344552.

[5] Gao, T., Yao, X., Chen, D., 2021. SimCSE: Simple contrastive learning of sentence embeddings, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6894–6910. arXiv:2104.08821.

[6] Guha, N., Nyarko, J., Ho, D.E., Ré, C., et al., 2023. LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models, in: Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track. arXiv:2308.11462.

[7] Italiani, P., Moro, G., Ragazzi, L., 2026. Clash-of-Leges: A bilingual dataset for conflict detection and explanation in statutory law. Expert Systems with Applications 300, 130182. doi:10.1016/j.eswa.2025.130182. Closest international prior work; binary conflict detection between legal articles, Italian Constitutional Court.

[8] Jian, Z., Li, J., Wu, Q., Yao, J., 2024. Retrieval contrastive learning for aspect-level sentiment classification. Information Processing & Management 61. doi:10.1016/j.ipm.2023.103539. IP&M contrastive method precedent; ABSA SOTA.

[9] Kaushik, D., Hovy, E., Lipton, Z.C., 2020. Learning the difference that makes a difference with counterfactually-augmented data, in: International Conference on Learning Representations (ICLR). URL: https://openreview.net/forum?id=Sklgs0NFvr. Foundational counterfactually-augmented data (CAD) paper: human minimal revisions flip the gold label.

[10] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., Hadsell, R., 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114, 3521–3526. doi:10.1073/pnas.1611835114.

[11] Koehn, P., 2004. Statistical significance tests for machine translation evaluation, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 388–395.

[12] Li, H., Ai, Q., Chen, J., Dong, Q., Wu, Y., Liu, Y., Chen, C., Tian, Q., 2023. SAILER: Structure-aware pre-trained language model for legal case retrieval, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1035–1044. doi:10.1145/3539618.3591761. Chinese legal BERT-encoder; we use as ...

[13] Li, X., Liu, Z., Liu, S., 2025. Triple contrastive learning representation boosting for supervised multiclass tasks. Information Processing & Management 62, 104011. doi:10.1016/j.ipm.2024.104011. IP&M label-aware supervised contrastive multiclass precedent.

[14] Li, Y., Xu, C., Long, G., Shen, T., Tao, C., Jiang, J., 2024. CCPrefix: Counterfactual contrastive prefix-tuning for many-class classification, in: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2977–2988. URL: https://aclanthology.org/2024.eacl-long.181/, doi:10.18653/v1/2024.eacl-long.181 ...

[15] Li, Z., Hoiem, D., 2018. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2935–2947. doi:10.1109/TPAMI.2017.2773081.

[16] Ma, Y., Shao, Y., Wu, Y., Liu, Y., Zhang, R., Zhang, M., Ma, S., 2021. LeCaRD: A legal case retrieval dataset for Chinese law system, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2342–2348. doi:10.1145/3404835.3463250.

[17] Nosek, B.A., Ebersole, C.R., DeHaven, A.C., Mellor, D.T., 2018. The preregistration revolution. Proceedings of the National Academy of Sciences 115, 2600–2606. doi:10.1073/pnas.1708274114.

[18] Qiu, X., Wang, Y., Guo, X., Zeng, Z., Yue, Y., Feng, Y., Miao, C., 2024a. PairCFR: Enhancing model training on paired counterfactually augmented data through contrastive learning, in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL). URL: https://aclanthology.org/2024.acl-long.646/. Paired CAD + contrastive los...

[19] Qiu, Z., Duan, X., Cai, Z., 2024b. Evaluating grammatical well-formedness in large language models: A comparative study with human judgments, in: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL). URL: https://aclanthology.org/2024.cmcl-1.16/, doi:10.18653/v1/2024.cmcl-1.16. OSF pre-registration of three NLP experiment...

[20] Roschewitz, M., De Sousa Ribeiro, F., Xia, T., Khara, G., Glocker, B., 2024. Counterfactual contrastive learning: Robust representations via causal image synthesis, in: Data Engineering in Medical Imaging (DEMI) Workshop at MICCAI. arXiv:2403.09605. Counterfactual-as-positive contrastive; medical imaging not legal.

[21] Simmons, J.P., Nelson, L.D., Simonsohn, U., 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366. doi:10.1177/0956797611417632.

[22] Steegen, S., Tuerlinckx, F., Gelman, A., Vanpaemel, W., 2016. Increasing transparency through a multiverse analysis. Perspectives on Psychological Science 11, 702–712.

[23] Tong, S., Yuan, J., Zhang, P., Li, L., 2024. Legal judgment prediction via graph boosting with constraints. Information Processing & Management 61, 103663. doi:10.1016/j.ipm.2024.103663. IP&M Chinese LJP precedent; multi-task with constraints.

[24] Xiao, C., Hu, X., Liu, Z., Tu, C., Sun, M., 2021. Lawformer: A pre-trained language model for Chinese legal long documents. AI Open. Chinese legal long-document encoder, RoFormer-based.

[25] Xiao, C., Zhong, H., Guo, Z., Tu, C., Liu, Z., Sun, M., Feng, Y., Han, X., Hu, Z., Wang, H., Xu, J., 2018. CAIL2018: A large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478. CAIL benchmark for Chinese LJP; broadly used foundation for LJP method papers.

[26] Zhao, Q., Gao, T., Guo, N., 2023. LA-MGFM: A legal judgment prediction method via sememe-enhanced graph neural networks and multi-graph fusion mechanism. Information Processing & Management 60, 103455. doi:10.1016/j.ipm.2023.103455. IP&M legal NLP precedent; Chinese CAIL multi-task.

[27, 28] Zhao, S., Xu, Y., Chen, Z., Qiao, F., Chen, H., Li, X., Lin, S., Ji, Z., Li, Y., Wang, W., 2026. Bridging the gap in Chinese legal conflict review: A dataset, benchmark tasks, and framework. Scientific Data. URL: https://www.nature.com/articles/s41597-026-07195-2, doi:10.1038/s41597-026-07195-2. LCR-CN dataset, 6995 annotated provisions, 5-class conflict taxonomy.

[29] Zheng, Z., Wu, X., Liu, X., 2025. Enhancing pre-trained language models with Chinese character morphological knowledge. Information Processing & Management 62. doi:10.1016/j.ipm.2024.103945. IP&M 2-stage Chinese contrastive pretraining precedent (methodology twin).
