CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation
Pith reviewed 2026-05-10 02:07 UTC · model grok-4.3
The pith
A lightweight adapter aligns latent reasoning patterns across domains in LLMs by distilling CoT-enriched representations and matching them with MMD.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoDA employs a lightweight adapter to directly intervene in the intermediate hidden states of LLMs. By combining feature-based distillation of CoT-enriched reference representations with Maximum Mean Discrepancy (MMD) for kernelized distribution matching, the method aligns the latent reasoning representations of the source and target domains, enabling more effective cross-domain knowledge transfer for logical reasoning.
What carries the argument
Lightweight adapter that intervenes in intermediate hidden states through CoT-enriched feature distillation plus MMD-based distribution matching to align source and target reasoning representations.
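The two loss terms Pith describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the RBF kernel, the fixed bandwidth `sigma`, the weighting `lam`, and all function names are assumptions; the paper's adapter architecture and exact objective may differ.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between samples."""
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

def coda_style_loss(h_target, h_source, h_reference, lam=0.5, sigma=1.0):
    """Hypothetical combined objective over hidden states: feature-based
    distillation toward CoT-enriched reference representations, plus an
    MMD term matching the target distribution to the source distribution."""
    distill = np.mean((h_target - h_reference) ** 2)  # feature distillation
    match = mmd2(h_target, h_source, sigma)           # distribution matching
    return distill + lam * match
```

In this reading, the adapter's parameters would be updated to minimize `coda_style_loss` over batches of hidden states, pulling target-domain activations toward both the CoT-enriched references and the source-domain distribution.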
If this is right
- Cross-domain samples become usable as surrogate in-context demonstrations for logical reasoning tasks.
- Performance improves substantially in expertise-scarce domains such as certain scientific or legal fields.
- The gains hold across different LLM families on multiple reasoning benchmarks.
- The method outperforms previous state-of-the-art cross-domain baselines by a large margin.
Where Pith is reading between the lines
- Reasoning patterns may be partially separable from surface domain content at the level of hidden states.
- Similar hidden-state adapters could be tested on non-reasoning transfer tasks such as generation or classification.
- The approach raises the possibility that targeted interventions inside the model can substitute for ever-larger context windows or retrieval systems.
Load-bearing premise
Pronounced domain shift between source and target can be overcome by aligning latent reasoning patterns that exist in intermediate hidden states and can be captured via CoT distillation and MMD.
What would settle it
If applying the CoDA adapter produces no measurable reduction in domain discrepancy metrics or no accuracy gains over raw cross-domain prompting on held-out logical reasoning benchmarks, the alignment approach would be shown ineffective.
Original abstract
Large language models (LLMs) have achieved substantial advances in logical reasoning, yet they continue to lag behind human-level performance. In-context learning provides a viable solution that boosts the model's performance via prompting its input with expert-curated, in-domain exemplars. However, in many real-world, expertise-scarce domains, such as low-resource scientific disciplines, emerging biomedical subfields, or niche legal jurisdictions, such high-quality in-domain demonstrations are inherently limited or entirely unavailable, thereby constraining the general applicability of these approaches. To mitigate this limitation, recent efforts have explored the retrieval of cross-domain samples as surrogate in-context demonstrations. Nevertheless, the resulting gains remain modest. This is largely attributable to the pronounced domain shift between source and target distributions, which impedes the model's ability to effectively identify and exploit underlying shared structures or latent reasoning patterns. Consequently, when relying solely on raw textual prompting, LLMs struggle to abstract and transfer such cross-domain knowledge in a robust and systematic manner. To address these issues, we propose CoDA, which employs a lightweight adapter to directly intervene in the intermediate hidden states. By combining feature-based distillation of CoT-enriched reference representations with Maximum Mean Discrepancy (MMD) for kernelized distribution matching, our method aligns the latent reasoning representation of the source and target domains. Extensive experimental results on multiple logical reasoning tasks across various model families validate the efficacy of CoDA by significantly outperforming the previous state-of-the-art baselines by a large margin.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CoDA, a cross-domain adaptation technique for LLMs on logical reasoning tasks. It introduces a lightweight adapter that intervenes directly on intermediate hidden states, combining feature-based distillation of Chain-of-Thought (CoT) enriched source representations with Maximum Mean Discrepancy (MMD) kernel matching to align latent reasoning distributions between source and target domains. The central claim is that this alignment enables robust transfer of underlying reasoning patterns, yielding large-margin gains over prior SOTA baselines across multiple tasks and model families in low-resource domains lacking in-domain exemplars.
Significance. If the alignment mechanism demonstrably transfers step-by-step logical structures rather than merely reducing surface-level distributional mismatch, the work could meaningfully extend in-context learning to expertise-scarce domains such as emerging biomedical or niche legal fields, moving beyond retrieval-based prompting.
major comments (2)
- [Abstract] Abstract: The assertion that CoDA 'significantly outperform[s] the previous state-of-the-art baselines by a large margin' supplies no quantitative metrics, baseline names, task-specific results, ablation controls, or statistical significance tests, making the central empirical claim impossible to evaluate from the summary alone and requiring full experimental tables for assessment.
- [Method] Method description (abstract and implied §3): The claim that MMD on CoT-distilled hidden states 'aligns the latent reasoning representation' rests on the untested assumption that shared reasoning structures exist and are capturable at the level of kernelized activation matching; without ablations isolating the CoT distillation component, error analysis on reasoning-step fidelity, or probing for logical transfer (vs. generic regularization), it remains unclear whether gains reflect reasoning alignment or superficial statistic matching.
minor comments (2)
- [Abstract] Abstract: The phrase 'CoT-enriched reference representations' is used without an inline definition or citation to the specific Chain-of-Thought prompting protocol employed for the source references.
- [Abstract] Abstract: 'Large margin' is repeated without anchoring to concrete effect sizes or comparison tables that would appear in the experimental section.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We address each major comment below, clarifying the manuscript's contributions and outlining targeted revisions where appropriate.
point-by-point responses
-
Referee: [Abstract] Abstract: The assertion that CoDA 'significantly outperform[s] the previous state-of-the-art baselines by a large margin' supplies no quantitative metrics, baseline names, task-specific results, ablation controls, or statistical significance tests, making the central empirical claim impossible to evaluate from the summary alone and requiring full experimental tables for assessment.
Authors: We agree that the abstract would be strengthened by including concrete quantitative support. In the revised version, we will incorporate specific metrics (e.g., average accuracy gains and baseline names across the main tasks) while referencing the full results tables and statistical tests in Section 4. This change will make the central claim directly evaluable from the abstract without exceeding length constraints. revision: yes
-
Referee: [Method] Method description (abstract and implied §3): The claim that MMD on CoT-distilled hidden states 'aligns the latent reasoning representation' rests on the untested assumption that shared reasoning structures exist and are capturable at the level of kernelized activation matching; without ablations isolating the CoT distillation component, error analysis on reasoning-step fidelity, or probing for logical transfer (vs. generic regularization), it remains unclear whether gains reflect reasoning alignment or superficial statistic matching.
Authors: The manuscript already contains ablations in Section 4.3 that isolate the CoT distillation component, demonstrating that its removal leads to substantial performance drops relative to the full CoDA model. Qualitative examples of transferred reasoning chains are also provided in the appendix. We acknowledge, however, that additional quantitative probing for step-level logical fidelity versus generic regularization would provide stronger evidence. We will add a new analysis subsection with error categorization on reasoning steps and a controlled comparison against a non-CoT MMD baseline. revision: partial
Circularity Check
No circularity: empirical architecture with no self-referential derivations or load-bearing self-citations
full rationale
The paper presents CoDA as a practical method combining a lightweight adapter, feature-based distillation of CoT-enriched representations, and MMD for distribution alignment. No equations, derivations, or parameter-fitting steps are described that reduce the claimed cross-domain transfer gains to inputs by construction. The abstract and method description treat the components as independent design choices validated empirically, without invoking uniqueness theorems, renaming known results, or self-citation chains for the core premise. This matches the reader's assessment of an empirical architecture lacking self-referential structure. The central claim remains open to external validation via experiments rather than being tautological.
Axiom & Free-Parameter Ledger
free parameters (2)
- adapter parameters
- MMD kernel hyperparameters
axioms (1)
- domain assumption: Source and target domains share latent reasoning patterns that can be aligned in hidden states
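Since the ledger lists the MMD kernel hyperparameters as free parameters, it is worth noting how they are typically set in practice. A common default for an RBF kernel bandwidth, shown here as a sketch of standard practice rather than the paper's documented choice, is the median heuristic: the median pairwise distance over the pooled samples.

```python
import numpy as np

def median_heuristic_sigma(x, y):
    """Median-heuristic bandwidth for an RBF kernel: the median of the
    pairwise Euclidean distances over the pooled samples from both domains."""
    z = np.vstack([x, y])
    sq = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2.0 * z @ z.T
    d = np.sqrt(np.maximum(sq, 0.0))  # clamp small negative values from rounding
    # exclude the zero diagonal: median over strictly upper-triangular entries
    return np.median(d[np.triu_indices_from(d, k=1)])
```

A heuristic like this removes one free parameter from the ledger, leaving only the adapter weights and the loss weighting to tune.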
Reference graph
Works this paper leans on
-
[1]
A Survey of Large Language Models
W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, vol. 1, no. 2, pp. 1–124, 2023
2023
-
[2]
Mathprompter: Mathematical reasoning using large language models
S. Imani, L. Du, and H. Shrivastava, “Mathprompter: Mathematical reasoning using large language models,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), 2023, pp. 37–42
2023
-
[3]
Towards reasoning in large language models: A survey
J. Huang and K. C.-C. Chang, “Towards reasoning in large language models: A survey,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 1049–1065
2023
-
[4]
Chain-of-thought prompting elicits reasoning in large language models
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022
2022
-
[5]
Towards robust deep reinforcement learning-based quantitative trading with neuro-symbolic trend analysis
J. Jiang, Z. Li, A. Cui, X. Zhou, B. Li, and D. Sun, “Towards robust deep reinforcement learning-based quantitative trading with neuro-symbolic trend analysis,” Neural Networks, vol. 201, p. 108924
-
[6]
[Online]. Available: https://www.sciencedirect.com/science/article/pii/S0893608026003850
-
[7]
Investigating training data detection in ai coders
T. Li, Y. Wei, Z. Li, A. Liu, Q. Guo, X. Liu, D. Sun, and Y. Liu, “Investigating training data detection in ai coders,” 2025. [Online]. Available: https://arxiv.org/abs/2507.17389
2025
-
[8]
Logic-q: Improving deep reinforcement learning-based quantitative trading via program sketch-based tuning
Z. Li, J. Jiang, Y. Cao, A. Cui, B. Wu, B. Li, Y. Liu, and D. D. Sun, “Logic-q: Improving deep reinforcement learning-based quantitative trading via program sketch-based tuning,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 17, pp. 18584–18592, Apr. 2025. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/v...
2025
-
[9]
Language models are few-shot learners
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020
2020
-
[10]
In-context learning with retrieved demonstrations for language models: A survey
M. Luo, X. Xu, Y. Liu, P. Pasupat, and M. Kazemi, “In-context learning with retrieved demonstrations for language models: A survey,” arXiv preprint arXiv:2401.11624, 2024
2024
-
[11]
Learning to retrieve prompts for in-context learning
O. Rubin, J. Herzig, and J. Berant, “Learning to retrieve prompts for in-context learning,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022, pp. 2655–2671
2022
-
[12]
Learning to retrieve in-context examples for large language models
L. Wang, N. Yang, and F. Wei, “Learning to retrieve in-context examples for large language models,” in Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 1752–1767
2024
-
[13]
Uncertainty quantification for in-context learning of large language models
C. Ling, X. Zhao, X. Zhang, W. Cheng, Y. Liu, Y. Sun, M. Oishi, T. Osaki, K. Matsuda, J. Ji et al., “Uncertainty quantification for in-context learning of large language models,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024, p...
2024
-
[14]
Reasoning graph enhanced exemplars retrieval for in-context learning
Y. Lin, B. Zhong, S. Jiang, J. Siebert, and Q. Chen, “Reasoning graph enhanced exemplars retrieval for in-context learning,” in Proceedings of the 31st International Conference on Computational Linguistics, 2025, pp. 9737–9759
2025
-
[15]
Improving neural logic machines via failure reflection
Z. Li, Y. Cao, Y. Zheng, X. Liu, B. Wu, T. Li, X. Xu, J. Jiang, Y. S. Teo, S.-W. Lin, and Y. Liu, “Improving neural logic machines via failure reflection,” in Forty-first International Conference on Machine Learning
-
[16]
[Online]. Available: https://openreview.net/forum?id=JObct1zyTb
-
[17]
Fedmut: Generalized federated learning via stochastic mutation
M. Hu, Y. Cao, A. Li, Z. Li, C. Liu, T. Li, M. Chen, and Y. Liu, “Fedmut: Generalized federated learning via stochastic mutation,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 11, pp. 12528–12537, Mar. 2024. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/29146
2024
-
[18]
Learning program semantics for vulnerability detection via vulnerability-specific inter-procedural slicing
B. Wu, S. Liu, Y. Xiao, Z. Li, J. Sun, and S.-W. Lin, “Learning program semantics for vulnerability detection via vulnerability-specific inter-procedural slicing,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for C...
2023
-
[19]
FAIRER: Fairness as decision rationale alignment
T. Li, Q. Guo, A. Liu, M. Du, Z. Li, and Y. Liu, “FAIRER: Fairness as decision rationale alignment,” in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, Eds., vol. 202. PMLR, 23–29 Jul 2023, pp. 19471–19489. [On...
2023
-
[20]
Reason analogically via cross-domain prior knowledge: An empirical study of cross-domain knowledge transfer for in-context learning
L. Liu, Z. Li, J. Yan, Z. Yuan, S. Chen, Y. Pan, B. Tang, Q. Chen, Y. Xiang, and D. D. Sun, “Reason analogically via cross-domain prior knowledge: An empirical study of cross-domain knowledge transfer for in-context learning,” 2026. [Online]. Available: https://arxiv.org/abs/2604.05396
2026
-
[21]
The shape of reasoning: Topological analysis of reasoning traces in large language models
X. W. Tan, N. Tan, G. Lee, and S. Kok, “The shape of reasoning: Topological analysis of reasoning traces in large language models,” arXiv preprint arXiv:2510.20665, 2025
2025
-
[22]
Towards effective in-context cross-domain knowledge transfer via domain-invariant-neurons-based retrieval
J. Yan, Z. Li, L. Liu, Z. Yuan, S. Chen, Y. Pan, B. Tang, Y. Xiang, and D. D. Sun, “Towards effective in-context cross-domain knowledge transfer via domain-invariant-neurons-based retrieval,” 2026. [Online]. Available: https://arxiv.org/abs/2604.05383
2026
-
[23]
Large language models can be lazy learners: Analyze shortcuts in in-context learning
R. Tang, D. Kong, L. Huang et al., “Large language models can be lazy learners: Analyze shortcuts in in-context learning,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 4645–4657
2023
-
[24]
Exploring and mitigating shortcut learning for generative large language models
Z. Sun, Y. Xiao, J. Li, Y. Ji, W. Chen, and M. Zhang, “Exploring and mitigating shortcut learning for generative large language models,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 6883–6893
2024
-
[25]
Examining the robustness of llm evaluation to the distributional assumptions of benchmarks
C. Siska, K. Marazopoulou, M. Ailem, and J. Bono, “Examining the robustness of llm evaluation to the distributional assumptions of benchmarks,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 10406–10421
2024
-
[26]
Graph of thoughts: Solving elaborate problems with large language models
M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690
2024
-
[27]
Demystifying chains, trees, and graphs of thoughts
M. Besta, F. Memedi, Z. Zhang, R. Gerstenberger, G. Piao, N. Blach, P. Nyczyk, M. Copik, G. Kwaśniewski, J. Müller et al., “Demystifying chains, trees, and graphs of thoughts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[28]
Enhanced data synthesis for llm through reasoning structures generated by hierarchical gflownet
T. Bu, M. Zhang, H. Duan, S. Li, L. Hu, and Y. Li, “Enhanced data synthesis for llm through reasoning structures generated by hierarchical gflownet,” in Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 15931–15958
2025
-
[29]
Unlocking the power of llm uncertainty for active in-context example selection
H.-Y. Huang, Z. Wu, Y. Yang, J. Zhang, and Y. Wu, “Unlocking the power of llm uncertainty for active in-context example selection,” arXiv preprint arXiv:2408.09172, 2024
2024
-
[30]
Using natural language explanations to improve robustness of in-context learning
X. He, Y. Wu, O.-M. Camburu, P. Minervini, and P. Stenetorp, “Using natural language explanations to improve robustness of in-context learning,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 13477–13499
2024
-
[31]
A kernel two-sample test
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 723–773, 2012
2012
-
[32]
A survey on transfer learning
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009
2009
-
[33]
A survey of multi-source domain adaptation
S. Sun, H. Shi, and Y. Wu, “A survey of multi-source domain adaptation,” Information Fusion, vol. 24, pp. 84–92, 2015
2015
-
[34]
A brief review of domain adaptation
A. Farahani, S. Voghoei, K. Rasheed, and H. R. Arabnia, “A brief review of domain adaptation,” Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020, pp. 877–894, 2021
2021
-
[35]
CoT vectors: Transferring and probing the reasoning mechanisms of llms
L. Li, Z. Wang, Y. Wu, J. Cai, and X. Yang, “CoT vectors: Transferring and probing the reasoning mechanisms of llms,” arXiv preprint arXiv:2510.00579, 2025
2025
-
[36]
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168, 2021
2021
-
[37]
Non-interactive symbolic-aided chain-of-thought for logical reasoning
P. M. Nguyen, T. Dang, and N. Inoue, “Non-interactive symbolic-aided chain-of-thought for logical reasoning,” in Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation, 2025, pp. 329–340
2025
-
[38]
Folio: Natural language reasoning with first-order logic
S. Han, H. Schoelkopf, Y. Zhao, Z. Qi, M. Riddell, W. Zhou, J. Coady, D. Peng, Y. Qiao, L. Benson et al., “Folio: Natural language reasoning with first-order logic,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 22017–22031
2024
-
[39]
Proofwriter: Generating implications, proofs, and abductive statements over natural language
O. Tafjord, B. Dalvi, and P. Clark, “Proofwriter: Generating implications, proofs, and abductive statements over natural language,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 3621–3634
2021
-
[40]
Commonsenseqa: A question answering challenge targeting commonsense knowledge
A. Talmor, J. Herzig, N. Lourie, and J. Berant, “Commonsenseqa: A question answering challenge targeting commonsense knowledge,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4149–4158
2019
-
[41]
Gemma 3 technical report
G. Team, A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, S. Perrin, T. Matejovicova, A. Ramé, M. Rivière, L. Rouillard, T. Mesnard, G. Cideron, J.-B. Grill, S. Ramos, E. Yvinec, M. Casbon, E. Pot, I. Penchev, G. Liu, F. Visin, K. Kenealy, L. Beyer, X. Zhai, A. Tsitsulin, R. Busa-Fekete, A. Feng, N. Sachdeva, B. Coleman, Y. Gao, B. Mustafa,...
-
[42]
[Online]. Available: https://arxiv.org/abs/2503.19786
-
[43]
Qwen2.5: A party of foundation models
Q. Team, “Qwen2.5: A party of foundation models,” September 2024. [Online]. Available: https://qwenlm.github.io/blog/qwen2.5/
2024
-
[44]
The probabilistic relevance framework: BM25 and beyond
S. Robertson and H. Zaragoza, The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc, 2009, vol. 4
2009
-
[45]
Retrieval-augmented generation for knowledge-intensive nlp tasks
P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020
2020
-
[46]
Revisiting demonstration selection strategies in in-context learning
K. Peng, L. Ding, Y. Yuan, X. Liu, M. Zhang, Y. Ouyang, and D. Tao, “Revisiting demonstration selection strategies in in-context learning,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 9090–9101
2024
-
[47]
Lora: Low-rank adaptation of large language models
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “Lora: Low-rank adaptation of large language models,” ICLR, 2022
2022
-
[48]
P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks
X. Liu, K. Ji, Y. Fu, W. Tam, Z. Du, Z. Yang, and J. Tang, “P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022, pp. 61–68
2022
-
[49]
Steering Llama 2 via Contrastive Activation Addition
N. Panickssery, N. Gabrieli, J. Schulz, M. Tong, E. Hubinger, and A. M. Turner, “Steering llama 2 via contrastive activation addition,” arXiv preprint arXiv:2312.06681, 2023
2023
-
[50]
Adam: A method for stochastic optimization
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014
2014
-
[51]
C-pack: Packed resources for general chinese embeddings
S. Xiao, Z. Liu, P. Zhang, N. Muennighoff, D. Lian, and J.-Y. Nie, “C-pack: Packed resources for general chinese embeddings,” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 641–649
2024
-
[52]
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Y. Zheng, R. Zhang, J. Zhang, Y. Ye, Z. Luo, Z. Feng, and Y. Ma, “Llamafactory: Unified efficient fine-tuning of 100+ language models,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). Bangkok, Thailand: Association f...
2024
-
[53]
Visualizing data using t-sne
L. Van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. 11, 2008
2008
-
[54]
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987
1987
-
[55]
A test metric for assessing single-cell rna-seq batch correction
M. Büttner, Z. Miao, F. A. Wolf, S. A. Teichmann, and F. J. Theis, “A test metric for assessing single-cell rna-seq batch correction,” Nature Methods, vol. 16, no. 1, pp. 43–49, 2019
2019
-
[56]
Adaptation odyssey in llms: Why does additional pretraining sometimes fail to improve?
F. Öncel, M. Bethge, B. Ermis, M. Ravanelli, C. Subakan, and Ç. Yıldız, “Adaptation odyssey in llms: Why does additional pretraining sometimes fail to improve?” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 19834–19843
2024
-
[57]
Understanding multimodal llms under distribution shifts: An information-theoretic approach
C. Oh, Z. Fang, S. Im, X. Du, and Y. Li, “Understanding multimodal llms under distribution shifts: An information-theoretic approach,” arXiv preprint arXiv:2502.00577, 2025
2025
-
[58]
Dawin: Training-free dynamic weight interpolation for robust adaptation
C. Oh, Y. Li, K. Song, S. Yun, and D. Han, “Dawin: Training-free dynamic weight interpolation for robust adaptation,” arXiv preprint arXiv:2410.03782, 2024
2024
-
[59]
Spine: Token-selective test-time reinforcement learning with entropy-band regularization
J. Wu, Y. George, J. Ye, Y. Wu, D. F. Schmidt, and J. Cai, “Spine: Token-selective test-time reinforcement learning with entropy-band regularization,” arXiv preprint arXiv:2511.17938, 2025
2025
-
[60]
Sharing matters: Analysing neurons across languages and tasks in llms
W. Wang, B. Haddow, M. Wu, W. Peng, and A. Birch, “Sharing matters: Analysing neurons across languages and tasks in llms,” arXiv preprint arXiv:2406.09265, 2024
2024
-
[61]
How programming concepts and neurons are shared in code language models
A. H. Kargaran, Y. Liu, F. Yvon, and H. Schütze, “How programming concepts and neurons are shared in code language models,” in Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 26905–26917
2025
-
[62]
Calibrate before use: Improving few-shot performance of language models
Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh, “Calibrate before use: Improving few-shot performance of language models,” in International Conference on Machine Learning. PMLR, 2021, pp. 12697–12706
2021
-
[63]
Exploring explanations improves the robustness of in-context learning
U. Honda and T. Oka, “Exploring explanations improves the robustness of in-context learning,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 23693–23714
2025
-
[64]
Intrinsic dimensionality explains the effectiveness of language model fine-tuning
A. Aghajanyan, S. Gupta, and L. Zettlemoyer, “Intrinsic dimensionality explains the effectiveness of language model fine-tuning,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 7319–7328
2021
-
[65]
Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment
L. Xu, H. Xie, S. J. Qin, X. Tao, and F. L. Wang, “Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026
2026
-
[66]
From neurons to semantics: Evaluating cross-linguistic alignment capabilities of large language models via neurons alignment
C. Huang, Y. Ye, B. Fu, Q. Su, and X. Shi, “From neurons to semantics: Evaluating cross-linguistic alignment capabilities of large language models via neurons alignment,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 28956–28974
2025
-
[67]
On relation-specific neurons in large language models
Y. Liu, R. Chen, L. Hirlimann, A. D. Hakimi, M. Wang, A. H. Kargaran, S. Rothe, F. Yvon, and H. Schütze, “On relation-specific neurons in large language models,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 992–1022
2025
-
[68]
Teaching small language models to reason
L. C. Magister, J. Mallinson, J. Adamek, E. Malmi, and A. Severyn, “Teaching small language models to reason,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023, pp. 1773–1781
2023
-
[69]
Large language models are reasoning teachers
N. Ho, L. Schmid, and S.-Y. Yun, “Large language models are reasoning teachers,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 14852–14882
2023
-
[70]
Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes
C.-Y. Hsieh, C.-L. Li, C.-K. Yeh, H. Nakhost, Y. Fujii, A. Ratner, R. Krishna, C.-Y. Lee, and T. Pfister, “Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 8003–8017
2023
-
[71]
Distilling the Knowledge in a Neural Network
G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015
2015
-
[72]
Knowledge distillation: A survey
J. Gou, B. Yu, S. J. Maybank, and D. Tao, “Knowledge distillation: A survey,” International Journal of Computer Vision, vol. 129, no. 6, pp. 1789–1819, 2021
2021
-
[73]
Specializing smaller language models towards multi-step reasoning
Y. Fu, H. Peng, L. Ou, A. Sabharwal, and T. Khot, “Specializing smaller language models towards multi-step reasoning,” in International Conference on Machine Learning. PMLR, 2023, pp. 10421–10430
2023
-
[74]
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
S. Mukherjee, A. Mitra, G. Jawahar, S. Agarwal, H. Palangi, and A. Awadallah, “Orca: Progressive learning from complex explanation traces of gpt-4,” arXiv preprint arXiv:2306.02707, 2023
2023