pith. sign in

arxiv: 2606.29407 · v1 · pith:E5I7C44Ynew · submitted 2026-06-28 · 💻 cs.CL · cs.AI

LC-ICL: Label-Guided Contrastive In-Context Learning for Robust Information Extraction

Pith reviewed 2026-06-30 07:30 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords in-context learninginformation extractionnamed entity recognitionrelation extractionfew-shot learningnegative sampleserror labelscontrastive demonstrations
0
0 comments X

The pith

LC-ICL improves few-shot information extraction by adding error-cause labels to negative examples in LLM demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LC-ICL, a few-shot in-context learning method for named entity recognition and relation extraction that builds demonstrations from both correct positive samples and incorrect negative samples. Each negative sample receives an explicit error-cause label so the model can see why a similar prediction would fail. The approach then selects hard negatives and nearest positive neighbors to the test instance and feeds the full set to the LLM. Experiments across multiple datasets show higher extraction accuracy than prior positive-only in-context methods. A reader would care because the technique turns the model's own mistakes into usable training signals without extra model training.

Core claim

LC-ICL creates in-context learning demonstrations by pairing positive samples with negative samples annotated by error-cause labels; these labels expose detailed error features so the LLM understands why similar predictions fail and avoids repeating the errors at inference time on entity and relation extraction tasks.

What carries the argument

Label-guided contrastive in-context learning that combines positive samples with hard negative samples carrying error-cause annotations.

If this is right

  • LLMs can reach higher accuracy on NER and RE by learning from both correct answers and explicitly labeled mistakes in the same prompt.
  • Demonstration selection that includes nearest positives and hard negatives supplies contextual information that standard random or positive-only selection misses.
  • The method works across multiple IE datasets without task-specific fine-tuning.
  • Error features learned from negatives help the model avoid repeating particular failure modes on unseen inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same error-labeling idea could be tested on other sequence labeling or classification tasks where failure modes can be categorized in advance.
  • If error-cause labels prove costly to create by hand, an automated labeling step could be added and its effect measured.
  • Scaling the method to larger LLMs might show whether the benefit grows or saturates with model size.
  • The technique might reduce the number of positive examples needed by making each negative example more informative.

Load-bearing premise

Error-cause labels on negative samples supply information that positive samples alone cannot provide and that this information transfers to better performance on new test instances.

What would settle it

Running the same few-shot IE experiments with and without the error-cause labels on the negative samples and finding no accuracy gain or a drop would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.29407 by Shan Zhao, Tianwei Yan, Xiao You.

Figure 1
Figure 1. Figure 1: Using the relation extraction (RE) task as an example, this figure [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The figure illustrates an overview of the LC-ICL framework for information extraction tasks, using the Named Entity Recognition (NER) task as an [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparative experimental results based on NER and RE tasks, eval [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of RE and NER task performance of various methods [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of experimental results under two settings: using only [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: A comparison of prompts between LC-ICL and ICL methods on the [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Label Categorization Illustration. In the Relation Extraction (RE) task, labels are categorized into six classes, while in the Named Entity Recognition [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: A supplementary case study on label-guided contrastive in-context learning is presented. The left side shows the results of our method, while the [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
read the original abstract

There has been increasing interest in exploring the capabilities of advanced large language models (LLMs) in the field of information extraction (IE), specifically focusing on tasks related to named entity recognition (NER) and relation extraction (RE).Although researchers are exploring the use of few-shot information extraction through in-context learning with LLMs, they tend to focus only on using correct or positive examples for demonstration, neglecting the potential value of incorporating incorrect or negative examples into the learning process.In this paper, we present LC-ICL a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. This approach enhances the ability of LLMs to extract entities and relations by combining positive samples with negative samples annotated by error-cause labels. These labels expose more detailed error features in erroneous examples, enabling the model to understand why similar predictions fail and avoid repeating such errors during inference.Specifically, our proposed method taps into the inherent contextual information and valuable information in hard negative samples and the nearest positive neighbors to the test and then applies the in-context learning demonstrations based on LLMs. Our experiments on various datasets indicate that LC-ICL outperforms previous few-shot in-context learning methods, delivering substantial enhancements in performance across a broad spectrum of related tasks. These improvements are noteworthy, showcasing the versatility of our approach in diverse scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes LC-ICL, a few-shot in-context learning method for information extraction (NER and RE) that augments positive demonstrations with negative samples annotated by error-cause labels. These labels are intended to expose detailed error features so that LLMs can avoid repeating similar mistakes at inference time. The central empirical claim is that this contrastive construction yields substantial gains over prior few-shot ICL baselines across multiple datasets.

Significance. If the performance gains are shown to be robust and the label-generation procedure is shown to be no more expensive than standard few-shot example selection, the method could meaningfully improve the reliability of LLM-based IE without requiring additional model training. The approach is notable for attempting to exploit hard negatives in a contrastive ICL setting, but its practical value hinges on the cost and generality of the error-cause annotations.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (method description): the procedure for producing error-cause labels on negative samples is never specified. Because the central claim rests on these labels supplying transferable information that cannot be recovered from positives alone, the absence of any account of how the labels are obtained (manual authoring, model-based derivation on held-out data, etc.) makes it impossible to evaluate whether the method remains a fair few-shot technique or inadvertently leaks test-distribution information.
  2. [Abstract / Experiments] Abstract and experimental section: the manuscript asserts outperformance on “various datasets” yet supplies no description of the datasets, shot counts, baseline implementations, statistical significance tests, or ablations that isolate the contribution of the error-cause labels versus simply adding unlabeled negatives. Without these elements the empirical claim cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract contains several run-on sentences and missing punctuation that impair readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that key details are missing from the current manuscript and will revise accordingly to address both major comments.

read point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (method description): the procedure for producing error-cause labels on negative samples is never specified. Because the central claim rests on these labels supplying transferable information that cannot be recovered from positives alone, the absence of any account of how the labels are obtained (manual authoring, model-based derivation on held-out data, etc.) makes it impossible to evaluate whether the method remains a fair few-shot technique or inadvertently leaks test-distribution information.

    Authors: We agree that the manuscript does not specify how error-cause labels are produced. This omission prevents proper evaluation of the method's fairness as few-shot ICL. In the revision we will add a subsection to §3 that explicitly describes the label-generation procedure (including whether it is manual, model-based on held-out data, or otherwise) and confirms that no test-distribution information is used. revision: yes

  2. Referee: [Abstract / Experiments] Abstract and experimental section: the manuscript asserts outperformance on “various datasets” yet supplies no description of the datasets, shot counts, baseline implementations, statistical significance tests, or ablations that isolate the contribution of the error-cause labels versus simply adding unlabeled negatives. Without these elements the empirical claim cannot be assessed.

    Authors: We acknowledge that the experimental section lacks the requested details. The revision will expand the experiments section to describe all datasets, shot counts, baseline implementations, statistical significance testing, and ablations that isolate the contribution of error-cause labels versus unlabeled negatives alone. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical method with no derivations or self-citation chains

full rationale

The paper presents LC-ICL as an empirical few-shot ICL technique that augments demonstrations with error-cause labeled negatives. No equations, derivations, or mathematical claims appear in the provided text. The central performance claim rests on experimental results rather than any reduction of outputs to fitted parameters or self-citations. No load-bearing steps match the enumerated circularity patterns; the method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the premise that error-cause labels on negative examples add usable signal beyond positive examples alone; this premise is a domain assumption rather than a derived quantity.

axioms (2)
  • domain assumption Error-cause labels can be assigned to negative samples in a way that reveals actionable features for the LLM
    The method description explicitly relies on annotating incorrect examples with error causes.
  • domain assumption LLMs can extract and apply the contrast between positive and labeled negative demonstrations during inference
    Implicit in the claim that the model learns to avoid repeating errors.

pith-pipeline@v0.9.1-grok · 5770 in / 1283 out tokens · 48292 ms · 2026-06-30T07:30:32.903082+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

55 extracted references · 33 canonical work pages · 4 internal anchors

  1. [1]

    A comprehensive survey on automatic knowledge graph construction,

    L. Zhong, J. Wu, Q. Li, H. Peng, and X. Wu, “A comprehensive survey on automatic knowledge graph construction,”CoRR, vol. abs/2302.05019, 2023. [Online]. Available: https://doi.org/10.48550/ arXiv.2302.05019

  2. [2]

    Named entity recognition for question answering,

    D. M. Aliod, M. van Zaanen, and D. Smith, “Named entity recognition for question answering,” inProceedings of the Australasian Language Technology Workshop, ALTA 2006, Sydney, Australia, November 30-December 1, 2006, L. Cavedon and I. Zukerman, Eds. Australasian Language Technology Association, 2006, pp. 51–58. [Online]. Available: https://aclanthology.or...

  3. [3]

    Language models are few-shot learners,

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert- V oss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Am...

  4. [4]

    Rethinking the role of demonstrations: What makes in-context learning work? In Goldberg, Y., Kozareva, Z., and Zhang, Y

    S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer, “Rethinking the role of demonstrations: What makes in-context learning work?” inProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Y . Goldberg, Z. Kozareva, and Y . Zh...

  5. [5]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosaleet al., “Llama 2: Open foundation and fine-tuned chat models,”arXiv preprint arXiv:2307.09288, 2023

  6. [6]

    GPT-4 Technical Report

    J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkatet al., “Gpt-4 technical report,”arXiv preprint arXiv:2303.08774, 2023

  7. [7]

    Evaluating chatgpt’s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness,

    B. Li, G. Fang, Y . Yang, Q. Wang, W. Ye, W. Zhao, and S. Zhang, “Evaluating chatgpt’s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness,”CoRR, vol. abs/2304.11633, 2023. [Online]. Available: https://doi.org/10. 48550/arXiv.2304.11633

  8. [8]

    Large language models for generative information extraction: A survey,

    D. Xu, W. Chen, W. Peng, C. Zhang, T. Xu, X. Zhao, X. Wu, Y . Zheng, and E. Chen, “Large language models for generative information extraction: A survey,”CoRR, vol. abs/2312.17617, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2312.17617

  9. [9]

    Learning in-context learning for named entity recognition,

    J. Chen, Y . Lu, H. Lin, J. Lou, W. Jia, D. Dai, H. Wu, B. Cao, X. Han, and L. Sun, “Learning in-context learning for named entity recognition,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds. As...

  10. [10]

    Z-ICL: zero-shot in-context learning with pseudo-demonstrations,

    X. Lyu, S. Min, I. Beltagy, L. Zettlemoyer, and H. Hajishirzi, “Z-ICL: zero-shot in-context learning with pseudo-demonstrations,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds. Association for C...

  11. [11]

    Zero-shot information extraction via chatting with ChatGPT,

    X. Wei, X. Cui, N. Cheng, X. Wang, X. Zhang, S. Huang, P. Xie, J. Xu, Y . Chen, M. Zhanget al., “Zero-shot information extraction via chatting with chatgpt,”arXiv preprint arXiv:2302.10205, 2023

  12. [12]

    Chain of thought with explicit evidence reasoning for few-shot relation extraction,

    X. Ma, J. Li, and M. Zhang, “Chain of thought with explicit evidence reasoning for few-shot relation extraction,” inFindings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, H. Bouamor, J. Pino, and K. Bali, Eds. Association for Computational Linguistics, 2023, pp. 2334–2352. [Online]. Available: https://aclant...

  13. [13]

    Revisiting relation extraction in the era of large language models,

    S. Wadhwa, S. Amir, and B. C. Wallace, “Revisiting relation extraction in the era of large language models,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds. Association 11 for Computational Lingu...

  14. [14]

    Proceedings of the 61st

    P. Li, T. Sun, Q. Tang, H. Yan, Y . Wu, X. Huang, and X. Qiu, “Codeie: Large code generation models are better few- shot information extractors,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds. A...

  15. [15]

    Gollie: Annotation guidelines improve zero-shot information-extraction,

    O. Sainz, I. García-Ferrero, R. Agerri, O. L. de Lacalle, G. Rigau, and E. Agirre, “Gollie: Annotation guidelines improve zero-shot information-extraction,”CoRR, vol. abs/2310.03668, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2310.03668

  16. [16]

    arXiv preprint arXiv:2304.08085 , year=

    X. Wang, W. Zhou, C. Zu, H. Xia, T. Chen, Y . Zhang, R. Zheng, J. Ye, Q. Zhang, T. Guiet al., “Instructuie: Multi-task instruction tuning for unified information extraction,”arXiv preprint arXiv:2304.08085, 2023

  17. [17]

    Code4struct: Code generation for few-shot event structure prediction,

    X. Wang, S. Li, and H. Ji, “Code4struct: Code generation for few-shot event structure prediction,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds. Association for Computational Linguistics, 2023,...

  18. [18]

    GPT-RE: in-context learning for relation extraction using large language models,

    Z. Wan, F. Cheng, Z. Mao, Q. Liu, H. Song, J. Li, and S. Kurohashi, “GPT-RE: in-context learning for relation extraction using large language models,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, H. Bouamor, J. Pino, and K. Bali, Eds. Association for Computational Lin...

  19. [19]

    What makes good in-context examples for gpt-3?

    J. Liu, D. Shen, Y . Zhang, B. Dolan, L. Carin, and W. Chen, “What makes good in-context examples for gpt-3?” inProceedings of Deep Learning Inside Out: The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, DeeLIO@ACL 2022, Dublin, Ireland and Online, May 27, 2022, E. Agirre, M. Apidianaki, and I. Vulic, Eds. Associatio...

  20. [20]

    What Makes Good In-Context Examples for

    [Online]. Available: https://doi.org/10.18653/v1/2022.deelio-1.10

  21. [21]

    Thinking about GPT-3 in-context learning for biomedical ie? think again,

    B. J. Gutierrez, N. McNeal, C. Washington, Y . Chen, L. Li, H. Sun, and Y . Su, “Thinking about GPT-3 in-context learning for biomedical ie? think again,” inFindings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Y . Goldberg, Z. Kozareva, and Y . Zhang, Eds. Association for Computationa...

  22. [22]

    Retrieval-

    Y . Guo, Z. Li, X. Jin, Y . Liu, Y . Zeng, W. Liu, X. Li, P. Yang, L. Bai, J. Guo, and X. Cheng, “Retrieval-augmented code generation for universal information extraction,”CoRR, vol. abs/2311.02962, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2311.02962

  23. [23]

    VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

    F. Faghri, D. J. Fleet, J. R. Kiros, and S. Fidler, “Vse++: Improv- ing visual-semantic embeddings with hard negatives,”arXiv preprint arXiv:1707.05612, 2017

  24. [24]

    C- icl: contrastive in-context learning for information extraction,

    Y . Mo, J. Liu, J. Yang, Q. Wang, S. Zhang, J. Wang, and Z. Li, “C- icl: contrastive in-context learning for information extraction,”arXiv preprint arXiv:2402.11254, 2024

  25. [25]

    A linear programming formulation for global inference in natural language tasks,

    D. Roth and W. Yih, “A linear programming formulation for global inference in natural language tasks,” inProceedings of the Eighth Conference on Computational Natural Language Learning, CoNLL 2004, Held in cooperation with HLT-NAACL 2004, Boston, Massachusetts, USA, May 6-7, 2004, H. T. Ng and E. Riloff, Eds. ACL, 2004, pp. 1–8. [Online]. Available: https...

  26. [26]

    Modeling relations and their mentions without labeled text,

    S. Riedel, L. Yao, and A. McCallum, “Modeling relations and their mentions without labeled text,” inMachine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part III 21. Springer, 2010, pp. 148–163

  27. [27]

    A hierarchical framework for relation extraction with reinforcement learning,

    R. Takanobu, T. Zhang, J. Liu, and M. Huang, “A hierarchical framework for relation extraction with reinforcement learning,” inProceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 7072–7079

  28. [28]

    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , address=

    Y . Luan, L. He, M. Ostendorf, and H. Hajishirzi, “Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction,” inProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, E...

  29. [29]

    Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports,

    H. Gurulingappa, A. M. Rajput, A. Roberts, J. Fluck, M. Hofmann- Apitius, and L. Toldo, “Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports,”Journal of biomedical informatics, vol. 45, no. 5, pp. 885–892, 2012

  30. [30]

    Unified structure generation for universal information extraction,

    Y . Lu, Q. Liu, D. Dai, X. Xiao, H. Lin, X. Han, L. Sun, and H. Wu, “Unified structure generation for universal information extraction,” inProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, S. Muresan, P. Nakov, and A. Villavicencio, Eds. Association ...

  31. [31]

    The automatic content extraction (ACE) program - tasks, data, and evaluation,

    G. R. Doddington, A. Mitchell, M. A. Przybocki, L. A. Ramshaw, S. M. Strassel, and R. M. Weischedel, “The automatic content extraction (ACE) program - tasks, data, and evaluation,” inProceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004, May 26-28, 2004, Lisbon, Portugal. European Language Resources Association...

  32. [32]

    Walker and L

    C. Walker and L. D. Consortium,ACE 2005 Multilingual Training Corpus, ser. LDC corpora, 2005

  33. [33]

    Ncbi disease corpus: a resource for disease name recognition and concept normalization,

    R. I. Dogan, R. Leaman, and Z. Lu, “Ncbi disease corpus: a resource for disease name recognition and concept normalization,”Journal of biomedical informatics, vol. 47, pp. 1–10, 2014

  34. [34]

    A unified MRC framework for named entity recognition,

    X. Li, J. Feng, Y . Meng, Q. Han, F. Wu, and J. Li, “A unified MRC framework for named entity recognition,” inProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, Eds. Association for Computational Linguistics, 2020, pp. 5849–5859...

  35. [35]

    Warm, comforting recollection

    Y . Mo, H. Tang, J. Liu, Q. Wang, Z. Xu, J. Wang, W. Wu, and Z. Li, “Multi-task transformer with relation-attention and type-attention for named entity recognition,” inIEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023. IEEE, 2023, pp. 1–5. [Online]. Available: https://doi.org/10.11...

  36. [36]

    mcl- ner: Cross-lingual named entity recognition via multi-view contrastive learning,

    Y . Mo, J. Yang, J. Liu, Q. Wang, R. Chen, J. Wang, and Z. Li, “mcl- ner: Cross-lingual named entity recognition via multi-view contrastive learning,”CoRR, vol. abs/2308.09073, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2308.09073

  37. [37]

    The llama 3 herd of models,

    A. Grattafiori, A. Dubey, A. Jauhriet al., “The llama 3 herd of models,” 2024

  38. [38]

    DeepSeek-V3 Technical Report

    A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruanet al., “Deepseek-v3 technical report,”arXiv preprint arXiv:2412.19437, 2024

  39. [39]

    Neural architectures for named entity recognition,

    G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” inNAACL HLT 2016, 2016, pp. 260–270

  40. [40]

    Recognizing continuous and discontinuous adverse drug reaction mentions from social media using lstm-crf,

    B. Tang, J. Hu, X. Wang, and Q. Chen, “Recognizing continuous and discontinuous adverse drug reaction mentions from social media using lstm-crf,”Wireless Communications and Mobile Computing, vol. 2018, 2018

  41. [41]

    Dynamic modeling cross-modal interactions in two-phase prediction for entity-relation extraction,

    S. Zhao, M. Hu, Z. Cai, and F. Liu, “Dynamic modeling cross-modal interactions in two-phase prediction for entity-relation extraction,”IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 3, pp. 1122–1131, 2023

  42. [42]

    CROP: zero-shot cross-lingual named entity recognition with multilingual labeled sequence translation,

    J. Yang, S. Huang, S. Ma, Y . Yin, L. Dong, D. Zhang, H. Guo, Z. Li, and F. Wei, “CROP: zero-shot cross-lingual named entity recognition with multilingual labeled sequence translation,” inFindings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Y . Goldberg, Z. Kozareva, and Y . Zhang, Ed...

  43. [43]

    Zero-shot information extraction via chatting with ChatGPT,

    X. Wei, X. Cui, N. Cheng, X. Wang, X. Zhang, S. Huang, P. Xie, J. Xu, Y . Chen, M. Zhang, Y . Jiang, and W. Han, “Zero-shot information extraction via chatting with chatgpt,”CoRR, vol. abs/2302.10205, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2302.10205

  44. [44]

    Learning to select relevant knowledge for neural machine translation,

    J. Yang, J. Wan, S. Ma, H. Huang, D. Zhang, Y . Yu, Z. Li, and F. Wei, “Learning to select relevant knowledge for neural machine translation,” inNatural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I, ser. Lecture Notes in Computer Science, L. Wang, Y . Fe...

  45. [45]

    Learning To Retrieve Prompts for In-Context Learning , url =

    O. Rubin, J. Herzig, and J. Berant, “Learning to retrieve prompts for in-context learning,” inProceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, M. Carpuat, M. de Marneffe, and I. V . M. Ruíz, Eds. Associat...

  46. [46]

    A survey on in-context learning,

    Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu, B. Chang, X. Sun, J. Xu, L. Li, and Z. Sui, “A survey on in-context learning,” 2023

  47. [47]

    1 M MX µ=1 wµv⊤ µ #

    Y . Ge, S. Liu, Y . Wang, L. Mei, L. Chen, B. Bi, and X. Cheng, “Innate reasoning is not enough: In-context learning enhances reasoning large language models with less overthinking,” 2025. [Online]. Available: https://arxiv.org/abs/2503.19602

  48. [48]

    InProceedings of the 2021 Conference on Empiri- cal Methods in Natural Language Processing, pages 6030–6040

    B. Y . Lin, A. Ravichander, X. Lu, N. Dziri, M. Sclar, K. Chandu, C. Bha- gavatula, and Y . Choi, “The unlocking spell on base llms: Rethinking alignment via in-context learning,”arXiv preprint arXiv:2312.01552, 2023

  49. [49]

    Is in- context learning sufficient for instruction following in llms?

    H. Zhao, M. Andriushchenko, F. Croce, and N. Flammarion, “Is in- context learning sufficient for instruction following in llms?”arXiv preprint arXiv:2405.19874, 2024

  50. [50]

    Distributed representations of words and phrases and their composi- tionality,

    T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their composi- tionality,”Advances in neural information processing systems, vol. 26, 2013

  51. [51]

    Bert: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” inPro- ceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

  52. [52]

    Distinguish- ing ignorance from error in llm hallucinations,

    A. Simhi, J. Herzig, I. Szpektor, and Y . Belinkov, “Distinguish- ing ignorance from error in llm hallucinations,”arXiv preprint arXiv:2410.22071, 2024

  53. [53]

    Error analysis prompting enables human-like translation evaluation in large language models,

    Q. Lu, B. Qiu, L. Ding, K. Zhang, T. Kocmi, and D. Tao, “Error analysis prompting enables human-like translation evaluation in large language models,”arXiv preprint arXiv:2303.13809, 2023

  54. [54]

    Hierarchical label-enhanced contrastive learning for chinese ner,

    C. Wang, S. Zhao, T. Yan, S. Song, W. Ma, K. Liu, and M. Wang, “Hierarchical label-enhanced contrastive learning for chinese ner,”IEEE Transactions on Neural Networks and Learning Systems, pp. 1–11, 2025

  55. [55]

    Hcl: A hierarchical contrastive learning framework for zero-shot rela- tion extraction,

    T. Yan, S. Zhao, M. Hu, M. Wang, X. Zhang, Z. Luo, and M. Wang, “Hcl: A hierarchical contrastive learning framework for zero-shot rela- tion extraction,”IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 3, pp. 5694–5705, 2025. 13 APPENDIX A. Dataset Statistics To facilitate a thorough evaluation, we incorporate a diverse collection o...