arxiv: 2604.22098 · v1 · submitted 2026-04-23 · 💻 cs.CL

Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation

Weisi Liu, Guangzeng Han, Xiaolei Huang

Pith reviewed 2026-05-09 20:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords temporal adaptation · knowledge integration · data augmentation · retrieval-augmented learning · domain shift · MeSH ontology · classification tasks · language models

The pith

For models facing temporal data shifts, integrating external knowledge can be more effective for augmentation and learning than methods that overlook or underuse it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops KARITA to address how models trained on historical data encounter evolving semantic distributions and domain knowledge when deployed on future data. It constructs and integrates external knowledge sources such as medical ontologies to capture uncertainty and feature shifts, then applies retrieval-augmented learning based on those shifts. Evaluations on classification tasks across clinical, legal, and scientific corpora show consistent gains. The central result is that knowledge integration outperforms approaches that overlook or underuse such sources during temporal adaptation.

Core claim

We develop Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation (KARITA) to capture diverse temporal shifts (e.g., uncertainty and feature shift), construct and integrate rich knowledge sources (e.g., medical ontology like MeSH), and leverage shifting insights for selecting-retrieval augmented learning. We evaluate KARITA on classification tasks across multiple domains, clinical, legal, and scientific corpora, demonstrating consistent improvements across multiple domains with temporal adaptation. Our results show that knowledge integration can be more critical and effective in temporal augmentation and learning.

What carries the argument

KARITA, the framework that combines knowledge-driven data augmentation with retrieval-augmented learning to detect and adapt to temporal domain shifts using external sources like MeSH.

Load-bearing premise

External knowledge sources such as MeSH are sufficiently available, relevant, and free of noise or bias to guide augmentation and retrieval for observed temporal shifts.

What would settle it

A controlled test on temporal shift data where adding the knowledge integration and retrieval steps produces no accuracy gain or causes a drop compared to non-knowledge baselines.
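
One way to read this test concretely is to score a knowledge-free baseline and a knowledge-integrated variant on the same future-period test set and check the sign of the accuracy difference. The following is a minimal sketch under that assumption; the function names `accuracy` and `knowledge_gain` and the toy labels are hypothetical and do not come from the paper.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def knowledge_gain(
    gold: Sequence[str],
    baseline_pred: Sequence[str],
    knowledge_pred: Sequence[str],
) -> float:
    """Accuracy of the knowledge-integrated variant minus the baseline.

    A value at or below zero on held-out future data would count against
    the claim that knowledge integration helps temporal adaptation.
    """
    return accuracy(gold, knowledge_pred) - accuracy(gold, baseline_pred)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    gold = ["a", "b", "a", "b"]
    baseline = ["a", "a", "a", "a"]
    with_knowledge = ["a", "b", "a", "a"]
    print(knowledge_gain(gold, baseline, with_knowledge))
```

A real comparison would also need multiple seeds or a significance test, since a single accuracy difference says little about whether the gain is stable.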

Figures

Figures reproduced from arXiv: 2604.22098 by Guangzeng Han, Weisi Liu, Xiaolei Huang.

**Figure 2.** t-SNE visualization of target-domain representations on MIMIC-IV-Notes, EurLex, and arXiv-CS.

Original abstract

Time introduces fundamental challenges in model development and deployment: models are usually trained on historical data while deployed on future data where semantic distributions and domain knowledge may evolve. Unfortunately, existing studies either overlook temporal shifts or hardly capture rich shifting patterns of both semantic and knowledge. We develop Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation (KARITA) to capture diverse temporal shifts (e.g., uncertainty and feature shift), construct and integrate rich knowledge sources (e.g., medical ontology like MeSH), and leverage shifting insights for selecting-retrieval augmented learning. We evaluate KARITA on classification tasks across multiple domains, clinical, legal, and scientific corpora, demonstrating consistent improvements across multiple domains with temporal adaptation. Our results show that knowledge integration can be more critical and effective in temporal augmentation and learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

KARITA integrates external knowledge like MeSH into augmentation and retrieval to handle temporal shifts in NLP, but the setup leaves open whether gains come from genuine adaptation or future knowledge leakage.

The core idea is to use knowledge sources to better capture both semantic and knowledge changes over time, rather than treating temporal adaptation as just data augmentation or retrieval alone. The authors frame KARITA as combining knowledge-driven augmentation, shift-aware retrieval, and integrative learning, then test it on classification across clinical, legal, and scientific corpora with reported consistent gains. That multi-domain evaluation is a plus for showing the method is not tied to one narrow setting. The integrative framing does appear to go beyond prior lines that handled temporal robustness or knowledge injection in isolation. The experiments give a concrete sense of where the approach might help in applied settings where models face evolving terminology and concepts.

The main soft spot is the temporal integrity of the knowledge base. MeSH and similar ontologies are updated over time, so if the paper uses a current snapshot instead of period-specific frozen versions, some improvements could reflect leakage of post-training information rather than adaptation to distribution shift. The abstract does not mention versioning or filtering, and if the full paper lacks explicit checks on this, the central claim that knowledge integration is more critical than augmentation or retrieval alone rests on weaker ground. Ablation details and error analysis would also help confirm the contribution of each piece.

This work is aimed at applied NLP researchers dealing with long-term model stability in domains like medicine or law. Readers looking for practical methods to mitigate temporal drift could extract useful implementation ideas from the setup, even if they need to add their own safeguards around knowledge freshness.

It deserves a serious referee because it targets a real deployment problem with a specific proposal and cross-domain tests. I would send it to peer review but flag the need for temporal versioning experiments and component ablations.
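
The leakage concern can be made concrete with a small sketch: before any augmentation or retrieval, drop knowledge entries that did not yet exist at the training cutoff. The code below is an illustration only, not the paper's method. The `KnowledgeEntry` schema, the `introduced` field, and the placeholder terms are all assumptions, and a real pipeline would have to derive the dates from versioned ontology releases.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class KnowledgeEntry:
    """One ontology term plus the date it entered the knowledge base.

    Hypothetical schema for illustration; not the paper's data model.
    """
    term: str
    introduced: date


def filter_by_cutoff(entries: Iterable[KnowledgeEntry], cutoff: date) -> List[KnowledgeEntry]:
    """Keep only entries that existed on or before the training cutoff."""
    return [e for e in entries if e.introduced <= cutoff]


def leaked_terms(entries: Iterable[KnowledgeEntry], cutoff: date) -> List[str]:
    """Terms that would expose post-cutoff knowledge if left unfiltered."""
    return sorted(e.term for e in entries if e.introduced > cutoff)


if __name__ == "__main__":
    # Placeholder terms and dates, invented for the example.
    kb = [
        KnowledgeEntry("term_old", date(2015, 1, 1)),
        KnowledgeEntry("term_new", date(2022, 1, 1)),
    ]
    cutoff = date(2019, 12, 31)
    print([e.term for e in filter_by_cutoff(kb, cutoff)])
    print(leaked_terms(kb, cutoff))
```

Reporting the output of a check like `leaked_terms` for each training cutoff would be one simple way to show that only pre-cutoff knowledge was used.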

Referee Report

1 major / 2 minor

Summary. The manuscript proposes KARITA, a framework for integrative temporal adaptation that integrates external knowledge sources (e.g., MeSH ontology) into augmentation and retrieval mechanisms to handle semantic, uncertainty, and feature shifts over time. It evaluates the method on classification tasks across clinical, legal, and scientific corpora and concludes that knowledge integration is more critical and effective than augmentation or retrieval alone for temporal adaptation.

Significance. If the reported gains are shown to hold under strict temporal constraints on knowledge access, the work could offer a practical direction for improving model robustness in domains with evolving terminology and concepts, where structured external knowledge is routinely available.

major comments (1)

[Method] Method section on knowledge integration: the description of retrieving from MeSH and similar sources does not indicate any temporal versioning or filtering to ensure only information available up to each training cutoff is used. This is load-bearing for the central claim, because without it the observed improvements could partly reflect future-knowledge leakage rather than genuine adaptation to distribution shift.

minor comments (2)

[Abstract and Experiments] The abstract and evaluation sections would benefit from an explicit statement of the exact baselines (e.g., standard retrieval-augmented generation without knowledge) and the precise temporal splits used in each domain.
[Framework] Notation for the shifting insights and selection-retrieval components could be clarified with a single running example across sections.
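
To illustrate what a precisely stated temporal split could look like, the sketch below separates documents into a historical training portion and a future evaluation portion at an explicit cutoff year. It is a hypothetical example: the `(year, text)` record layout, the `temporal_split` name, and the sample years are assumptions, not details from the manuscript.

```python
from typing import List, Sequence, Tuple

Document = Tuple[int, str]  # (year, text); hypothetical record layout


def temporal_split(
    docs: Sequence[Document], cutoff_year: int
) -> Tuple[List[Document], List[Document]]:
    """Split documents into history (year <= cutoff) and future (year > cutoff)."""
    history = [d for d in docs if d[0] <= cutoff_year]
    future = [d for d in docs if d[0] > cutoff_year]
    return history, future


if __name__ == "__main__":
    # Sample records invented for the example.
    docs = [(2018, "note a"), (2019, "note b"), (2020, "note c")]
    train, test = temporal_split(docs, cutoff_year=2019)
    print(len(train), len(test))
```

Stating the cutoff and which side of the boundary is inclusive, as the docstring does here, is the kind of detail the comment asks the authors to report for each domain.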

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and describe the planned revisions.

Referee: [Method] Method section on knowledge integration: the description of retrieving from MeSH and similar sources does not indicate any temporal versioning or filtering to ensure only information available up to each training cutoff is used. This is load-bearing for the central claim, because without it the observed improvements could partly reflect future-knowledge leakage rather than genuine adaptation to distribution shift.

Authors: We agree that explicit temporal constraints on knowledge access are essential to support the central claim of genuine adaptation rather than leakage. The current Method section does not describe versioning or cutoff-based filtering for sources such as MeSH. We will revise the manuscript to add a dedicated subsection detailing the knowledge versions employed, the exact filtering rules applied at each training cutoff, and verification steps confirming that only pre-cutoff information was used. These changes will be included in the revised version.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The abstract and provided description present KARITA as a proposed method that integrates external knowledge sources for handling temporal shifts, followed by empirical evaluation on classification tasks across domains. No derivation chain, equations, or steps are shown that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The central claim rests on reported empirical improvements rather than tautological renaming or imported uniqueness. This is the common case of an independent empirical proposal with no load-bearing circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the method is presented at the level of a named framework without mathematical or implementation specifics.
