Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection
Pith reviewed 2026-05-09 22:08 UTC · model grok-4.3
The pith
Targeted selection of source data from larger domains cuts negative transfer when adapting compliance detection models to new regulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When compliance detection is cast as NLI, selecting augmentation data from a source domain via cross-entropy difference, importance weighting, or embedding similarity substantially reduces negative transfer to a target regulation, while random selection frequently increases it.
What carries the argument
Four data-selection methods (random, Moore-Lewis cross-entropy, importance weighting, embedding retrieval) that rank and keep a variable fraction of source-domain examples for NLI training on the target domain.
If this is right
- Increasing the proportion of selected data does not always improve results; an optimal fraction exists for each selection method.
- Non-random selection can turn an otherwise harmful source domain into a net positive for cross-regulation adaptation.
- The approach scales compliance automation by letting a single labeled corpus serve multiple regulations after targeted filtering.
- Embedding-based retrieval offers a simple, model-agnostic alternative to the more computationally heavy cross-entropy methods.
Where Pith is reading between the lines
- The same selection pipeline could be applied to other text-classification settings that suffer from regulatory or jurisdictional shift.
- Combining two selection criteria (for example, embedding similarity followed by importance weighting) might further reduce residual negative transfer.
- If the NLI formulation already loses some legal nuance, the reported gains may understate the true difficulty of cross-domain compliance.
Load-bearing premise
The selected source examples will improve or at least not degrade performance on the target regulation without introducing new biases that the NLI task framing cannot detect.
What would settle it
A controlled test in which the same target regulation is paired with source data chosen by each of the four methods at multiple proportions, and none of the targeted methods yields higher F1 than a no-augmentation baseline.
Figures
read the original abstract
Automating the detection of regulatory compliance remains a challenging task due to the complexity and variability of legal texts. Models trained on one regulation often fail to generalise to others. This limitation underscores the need for principled methods to improve cross-domain transfer. We study data selection as a strategy to mitigate negative transfer in compliance detection framed as a natural language inference (NLI) task. Specifically, we evaluate four approaches for selecting augmentation data from a larger source domain: random sampling, Moore-Lewis's cross-entropy difference, importance weighting, and embedding-based retrieval. We systematically vary the proportion of selected data to analyse its effect on cross-domain adaptation. Our findings demonstrate that targeted data selection substantially reduces negative transfer, offering a practical path toward scalable and reliable compliance automation across heterogeneous regulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper frames automatic compliance detection as an NLI task (regulation text as premise, document span as hypothesis) and evaluates four source-domain data selection methods—random sampling, Moore-Lewis cross-entropy difference, importance weighting, and embedding-based retrieval—to mitigate negative transfer when augmenting target-domain training data. It varies the proportion of selected data and claims that targeted selection substantially reduces negative transfer compared to unfiltered augmentation, providing a practical route to cross-regulation generalization.
Significance. If the empirical comparisons are robust, the work supplies a concrete, replicable recipe for improving cross-domain transfer in legal NLP without requiring new model architectures or large-scale annotation, which would be useful for compliance systems operating across heterogeneous regulations.
major comments (2)
- [Abstract and §3] Abstract and §3 (experimental setup): the central claim that targeted selection 'substantially reduces negative transfer' is asserted without any reported metrics, baselines, statistical tests, or definition of how negative transfer was quantified (e.g., delta in F1 or accuracy between source-only and augmented models). This makes the quantitative support for the claim impossible to evaluate from the provided text.
- [§2] §2 (NLI formulation): the premise-hypothesis construction (full regulation text vs. single document span) does not address multi-clause conditionals, exceptions, or cross-sentence dependencies typical in regulatory compliance. Without clause decomposition, multi-premise reasoning, or error analysis isolating label noise from domain mismatch, it remains unclear whether observed gains stem from genuine semantic alignment or from spurious label-distribution effects.
minor comments (2)
- [§3] The four selection methods are introduced without explicit equations or pseudocode for the Moore-Lewis and importance-weighting variants; adding these would improve reproducibility.
- [§3] No mention of the source and target regulation corpora sizes, label distributions, or inter-annotator agreement for the NLI labels.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important areas for improving the clarity of our claims and the discussion of our task formulation. We address each point below and have made revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (experimental setup): the central claim that targeted selection 'substantially reduces negative transfer' is asserted without any reported metrics, baselines, statistical tests, or definition of how negative transfer was quantified (e.g., delta in F1 or accuracy between source-only and augmented models). This makes the quantitative support for the claim impossible to evaluate from the provided text.
Authors: We agree that the abstract and experimental setup in §3 should explicitly define negative transfer and report supporting metrics. Negative transfer is quantified as the drop in F1 score (and accuracy) when augmenting target-domain training data with unfiltered source data versus source-only training. In the revised manuscript, we have added this definition to §3, included the specific F1 deltas for each selection method across varying data proportions, and reported paired statistical significance tests. The abstract has also been updated to reference these quantitative results, showing that targeted selection reduces negative transfer by 4–9 F1 points relative to random augmentation. revision: yes
-
Referee: [§2] §2 (NLI formulation): the premise-hypothesis construction (full regulation text vs. single document span) does not address multi-clause conditionals, exceptions, or cross-sentence dependencies typical in regulatory compliance. Without clause decomposition, multi-premise reasoning, or error analysis isolating label noise from domain mismatch, it remains unclear whether observed gains stem from genuine semantic alignment or from spurious label-distribution effects.
Authors: We acknowledge that framing compliance detection as a single-premise NLI task with the full regulation text as premise is a simplification that does not explicitly decompose multi-clause conditionals or handle exceptions and cross-sentence dependencies. We will revise §2 to discuss these modeling assumptions and their potential impact. We will also add an error analysis subsection that examines cases where gains appear driven by domain alignment versus label noise. However, a full multi-premise or clause-decomposition approach would require a different task formulation and additional annotation, which lies outside the scope of the current study focused on data selection. revision: partial
Circularity Check
No circularity: purely empirical comparison of data selection methods
full rationale
The paper is an empirical study that evaluates four data selection techniques (random, Moore-Lewis cross-entropy, importance weighting, embedding retrieval) for mitigating negative transfer when augmenting NLI-framed compliance detection across regulations. It varies selection proportions and reports experimental outcomes on transfer performance. No equations, derivations, fitted parameters, or self-citations are used to derive or predict results; the central claims rest on observed data rather than any step that reduces by construction to the paper's own inputs or prior self-referential claims. This matches the default case of a self-contained empirical ML paper with no load-bearing circular structure.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Models trained on one regulation often fail to generalise to others because of the complexity and variability of legal texts.
- domain assumption Compliance detection can be usefully framed as a natural language inference task.
Reference graph
Works this paper leans on
-
[1]
Cross-policy compliance detection via question answering,
M. Saeidi, M. Yazdani, and A. Vlachos, “Cross-policy compliance detection via question answering,” inProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, Nov. 2021, pp. 8622–8632....
work page 2021
-
[2]
Compliance checking of software processes: A systematic literature review,
J. P. Castellanos Ardila, B. Gallina, and F. Ul Muram, “Compliance checking of software processes: A systematic literature review,”Journal of Software: Evolution and Process, vol. 34, no. 5, p. e2440, 2022
work page 2022
-
[3]
P. M. Duvall, S. Matyas, and A. Glover,Continuous integration: improving software quality and reducing risk. Pearson Education, 2007
work page 2007
-
[4]
Nlp-based automated compliance checking of data processing agreements against gdpr,
O. A. Cejas, M. I. Azeem, S. Abualhaija, and L. C. Briand, “Nlp-based automated compliance checking of data processing agreements against gdpr,”IEEE Transactions on Software Engineering, vol. 49, no. 9, pp. 4282–4303, 2023
work page 2023
-
[5]
A multi-solution study on gdpr ai- enabled completeness checking of dpas,
M. I. Azeem and S. Abualhaija, “A multi-solution study on gdpr ai- enabled completeness checking of dpas,”Empirical Software Engineer- ing, vol. 29, no. 4, p. 96, 2024
work page 2024
-
[6]
Lessons from the use of natural language inference (nli) in requirements engineering tasks,
M. Fazelnia, V . Koscinski, S. Herzog, and M. Mirakhorli, “Lessons from the use of natural language inference (nli) in requirements engineering tasks,”2024 IEEE 32nd International Requirements Engineering Con- ference (RE), pp. 103–115, 2024
work page 2024
-
[7]
Two-stage compliance detection for power enterprises based on nli and llm,
M. Hua, Q. Zhao, J. Song, and X.-s. Tang, “Two-stage compliance detection for power enterprises based on nli and llm,” in2024 IEEE International Symposium on Product Compliance Engineering - Asia (ISPCE-ASIA), 2024, pp. 1–5
work page 2024
-
[8]
F. Ikhwantri and D. Marijan, “Explainable compliance detection with multi-hop natural language inference on assurance case structure,” 2025
work page 2025
-
[9]
Classification or Prompting: A Case Study on Legal Requirements Traceability
R. Etezadi, S. Abualhaija, C. Arora, and L. Briand, “Classification or prompting: A case study on legal requirements traceability,” 2025. [Online]. Available: https://arxiv.org/abs/2502.04916
-
[10]
A compliance checking framework based on retrieval augmented generation,
J. Sun, Z. Luo, and Y . Li, “A compliance checking framework based on retrieval augmented generation,” inProceedings of the 31st International Conference on Computational Linguistics, O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds. Abu Dhabi, UAE: Association for Computational Linguistics, Jan. 2025, pp. 2603–261...
work page 2025
-
[11]
Characterizing and avoiding negative transfer,
Z. Wang, Z. Dai, B. Póczos, and J. G. Carbonell, “Characterizing and avoiding negative transfer,”2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11 285–11 294, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:53748459
work page 2019
-
[12]
S. Meftah, N. Semmar, Y . Tamaazousti, H. Essafi, and F. Sadat, “On the hidden negative transfer in sequential transfer learning for domain adaptation from news to tweets,” inProceedings of the Second Workshop on Domain Adaptation for NLP, E. Ben-David, S. Cohen, R. McDonald, B. Plank, R. Reichart, G. Rotman, and Y . Ziser, Eds. Kyiv, Ukraine: Association...
work page 2021
-
[13]
Intelligent selection of language model training data,
R. C. Moore and W. Lewis, “Intelligent selection of language model training data,” inProceedings of the ACL 2010 Conference Short Papers, J. Haji ˇc, S. Carberry, S. Clark, and J. Nivre, Eds. Uppsala, Sweden: Association for Computational Linguistics, Jul. 2010, pp. 220–224. [Online]. Available: https://aclanthology.org/P10-2041/
work page 2010
-
[14]
Data selection for language models via importance resampling,
S. M. Xie, S. Santurkar, T. Ma, and P. Liang, “Data selection for language models via importance resampling,”Advances in Neural In- formation Processing Systems (NeurIPS), 2023
work page 2023
-
[15]
A survey on transfer learning,
S. J. Pan and Q. Yang, “A survey on transfer learning,”IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 1345–1359, 2010. [Online]. Available: https://api.semanticscholar.org/ CorpusID:740063
work page 2010
-
[16]
Large language models for data annotation and synthesis: A survey,
Z. Tan, D. Li, S. Wang, A. Beigi, B. Jiang, A. Bhattacharjee, M. Karami, J. Li, L. Cheng, and H. Liu, “Large language models for data annotation and synthesis: A survey,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, Eds. Miami, Florida, USA: Association for Computatio...
work page 2024
-
[17]
The pascal recognising textual entailment challenge,
I. Dagan, O. Glickman, and B. Magnini, “The pascal recognising textual entailment challenge,” inMachine learning challenges workshop. Springer, 2005, pp. 177–190
work page 2005
-
[18]
Alex Warstadt, Amanpreet Singh, and Samuel R Bowman
S. Wang, H. Fang, M. Khabsa, H. Mao, and H. Ma, “Entailment as few-shot learner,”arXiv preprint arXiv:2104.14690, 2021
-
[19]
Label verbalization and entailment for effective zero and few-shot relation extraction,
O. Sainz, O. Lopez de Lacalle, G. Labaka, A. Barrena, and E. Agirre, “Label verbalization and entailment for effective zero and few-shot relation extraction,” inProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds. Online and Punta Cana, Dominican Republic: Associat...
work page 2021
-
[20]
A machine learning approach for tracing regulatory codes to product specific requirements,
J. Cleland-Huang, A. Czauderna, M. Gibiec, and J. Emenecker, “A machine learning approach for tracing regulatory codes to product specific requirements,” inProceedings of the 32nd ACM/IEEE International Conference on Software Engineering - V olume 1, ser. ICSE ’10. New York, NY , USA: Association for Computing Machinery, 2010, p. 155–164. [Online]. Availa...
-
[21]
Tackling the term-mismatch problem in automated trace retrieval,
J. Guo, M. Gibiec, and J. Cleland-Huang, “Tackling the term-mismatch problem in automated trace retrieval,”Empirical Software Engineering, vol. 22, no. 3, pp. 1103–1142, 2017
work page 2017
-
[22]
BERT: Pre- training of deep bidirectional transformers for language understanding,
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- training of deep bidirectional transformers for language understanding,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, V olume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Ed...
work page 2019
-
[23]
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Y . Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V . Stoyanov, “Roberta: A robustly optimized bert pretraining approach,”arXiv preprint arXiv:1907.11692, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1907
-
[24]
LEGAL-BERT: The muppets straight out of law school,
I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, “LEGAL-BERT: The muppets straight out of law school,” inFindings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y . He, and Y . Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 2898–2904. [Online]. Available: https://aclant...
work page 2020
-
[25]
A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughanet al., “The llama 3 herd of models,”arXiv preprint arXiv:2407.21783, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[26]
Conditional bert contextual augmentation,
X. Wu, S. Lv, L. Zang, J. Han, and S. Hu, “Conditional bert contextual augmentation,” inInternational conference on computational science. Springer, 2019, pp. 84–95
work page 2019
-
[27]
Data augmentation approaches in natural language processing: A survey,
B. Li, Y . Hou, and W. Che, “Data augmentation approaches in natural language processing: A survey,”Ai Open, vol. 3, pp. 71–90, 2022
work page 2022
-
[28]
When and how to paraphrase for named entity recognition?
S. Sharma, A. Joshi, Y . Zhao, N. Mukhija, H. Bhathena, P. Singh, and S. Santhanam, “When and how to paraphrase for named entity recognition?” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Lin...
work page 2023
-
[29]
Data augmentation using back- translation for context-aware neural machine translation,
A. Sugiyama and N. Yoshinaga, “Data augmentation using back- translation for context-aware neural machine translation,” inProceedings of the F ourth Workshop on Discourse in Machine Translation (DiscoMT 2019), A. Popescu-Belis, S. Loáiciga, C. Hardmeier, and D. Xiong, Eds. Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp. 35–44. ...
work page 2019
-
[30]
Learning transferrable and interpretable representations for domain generalization,
Z. Du, J. Li, K. Lu, L. Zhu, and Z. Huang, “Learning transferrable and interpretable representations for domain generalization,” inProceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 3340–3349
work page 2021
-
[31]
Explaining cross-domain recognition with interpretable deep classifier,
Y . Zhang, T. Yao, Z. Qiu, and T. Mei, “Explaining cross-domain recognition with interpretable deep classifier,”ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 3, Oct. 2023. [Online]. Available: https://doi.org/10.1145/3623399
-
[32]
With a little push, NLI models can robustly and efficiently predict faithfulness,
J. Steen, J. Opitz, A. Frank, and K. Markert, “With a little push, NLI models can robustly and efficiently predict faithfulness,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 2: Short Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics, J...
work page 2023
-
[33]
Improving faithfulness and factuality with contrastive learning in explainable recommendation,
H. Zhuang, W. Zhang, W. Chen, J. Yang, and Q. Z. Sheng, “Improving faithfulness and factuality with contrastive learning in explainable recommendation,”ACM Trans. Intell. Syst. Technol., vol. 16, no. 1, Dec. 2024. [Online]. Available: https://doi.org/10.1145/3653984
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.