pith. machine review for the scientific record.

arxiv: 2604.17574 · v1 · submitted 2026-04-19 · 💻 cs.CL

Recognition: unknown

Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:48 UTC · model grok-4.3

classification 💻 cs.CL
keywords distractor generation · in-context learning · chain-of-thought prompting · large language models · multiple-choice questions · reasoned distractors · few-shot examples · semantic retrieval

The pith

Large language models prompted with few-shot examples and chain-of-thought generate superior reasoned distractors compared to fine-tuned models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that in-context learning with large language models can improve distractor generation for multiple-choice questions by selecting relevant examples through semantic retrieval and producing both distractors and their reasoning rationales. This approach is important because creating good distractors currently requires significant expert effort, and better automation could enhance testing and learning materials. Experiments across six different benchmarks show that this prompting method outperforms recent fine-tuning techniques and reaches state-of-the-art alignment with human-created distractors.

Core claim

By applying in-context learning to LLMs for distractor generation, the authors demonstrate that few-shot prompting with retrieved examples combined with chain-of-thought rationale generation produces distractors that are more plausible and better aligned with human benchmarks than those from fine-tuned encoder-decoder models with contrastive learning.

What carries the argument

The rationale-augmented distractor generation framework, which retrieves few-shot examples using unsupervised semantic similarity and prompts LLMs to output both distractors and step-by-step rationales for their selection.
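
Read operationally, that framework is two steps: retrieve the most semantically similar question-answer pairs from an unlabeled pool, then prompt the LLM with those shots and ask for a rationale before the distractors. The sketch below is a minimal illustration of what such a pipeline could look like, not the paper's implementation; the embedding model, shot count, prompt wording, and the assumption that retrieved shots already carry rationales are all stand-ins.

# Sketch of retrieval-augmented, chain-of-thought distractor prompting.
# Assumed: sentence-transformers "all-MiniLM-L6-v2" as the encoder, 3 shots,
# and shot rationales being available; none of these are the paper's settings.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_examples(query, pool, k=3):
    """Unsupervised semantic retrieval: the k pool items closest to the query."""
    texts = [ex["question"] + " " + ex["answer"] for ex in pool]
    q_emb = encoder.encode(query["question"] + " " + query["answer"], convert_to_tensor=True)
    p_emb = encoder.encode(texts, convert_to_tensor=True)
    top_k = util.cos_sim(q_emb, p_emb)[0].topk(k).indices.tolist()
    return [pool[i] for i in top_k]

def build_prompt(query, shots):
    """Few-shot chain-of-thought prompt: each shot pairs a rationale with its distractors."""
    parts = ["Generate three plausible but incorrect options (distractors) for the question.",
             "First explain, step by step, why each option could mislead an examinee.", ""]
    for ex in shots:
        parts += ["Question: " + ex["question"],
                  "Answer: " + ex["answer"],
                  "Rationale: " + ex["rationale"],
                  "Distractors: " + "; ".join(ex["distractors"]),
                  ""]
    parts += ["Question: " + query["question"],
              "Answer: " + query["answer"],
              "Rationale:"]  # the model continues with its reasoning, then the distractors
    return "\n".join(parts)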

If this is right

  • Prompted LLMs can replace or augment fine-tuning for this task without additional training data or compute for model updates.
  • The inclusion of rationales makes the generated distractors more interpretable and closer to expert reasoning.
  • Performance gains hold across domains with different question types and distractor lengths.
  • The method achieves state-of-the-art results on all six evaluated benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could extend to generating other types of educational content that require plausible incorrect options, such as in adaptive testing systems.
  • If the approach generalizes, it might lower barriers for creating high-quality assessments in specialized fields where experts are scarce.
  • Future work could test whether the same framework improves performance on related tasks like generating explanations for correct answers.

Load-bearing premise

The assumption that the chain-of-thought rationales produced by the LLM will consistently mirror the hidden reasoning steps that human experts use to choose effective distractors on the benchmarks.

What would settle it

A human evaluation study on a held-out set of questions, in which experts rate the generated distractors and rationales as less plausible or less aligned than those from previous fine-tuned models, would disprove the claimed performance advantage.

Figures

Figures reproduced from arXiv: 2604.17574 by Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang.

Figure 1: Generated distractors by human reasoning and …
Figure 2: The in-context learning framework by large language model and chain-of-thought rationale generation for …
Figure 4: T5 – Question Answering Accuracy
Figure 5: F1@3 of Mistral (k-NN) with recent models.
Figure 6: F1@3 score comparison with varying number of few-shot …
read the original abstract

Distractor generation (DG) remains a labor-intensive task that still significantly depends on domain experts. The task focuses on generating plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning when identifying the correct answer. While a recent method integrates fine-tuning pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer, it often fails to capture the underlying reasoning process that experts utilize when selecting distractors in benchmarks. In this paper, we explore large language models (LLMs) reasoning for DG through in-context learning with unsupervised semantic retrieval for selecting few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer. Extensive experiments on six benchmarks, with varying average distractor lengths and domains, demonstrate that prompting LLMs with few-shot examples substantially improves the performance compared to recent DG models. It outperforms recent approaches and achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a rationale-augmented distractor generation (DG) framework for multiple-choice questions that uses large language models via in-context learning. Few-shot examples are selected through unsupervised semantic retrieval, and chain-of-thought prompting is employed to jointly produce distractors along with their rationales. The central claim is that this prompting-based approach substantially outperforms recent fine-tuned encoder-decoder models with contrastive learning, achieving state-of-the-art results on six benchmarks with varying domains and distractor lengths by generating reasoned distractors that align with human-labeled data.

Significance. If the empirical claims hold under rigorous validation, the work could meaningfully shift distractor generation away from resource-intensive fine-tuning toward more flexible LLM prompting strategies, lowering barriers for creating high-quality educational assessments. The explicit inclusion of rationale generation addresses a noted gap in prior DG methods regarding implicit reasoning capture. However, the absence of detailed metrics and rationale-specific validation in the provided description limits assessment of whether this represents a genuine advance over existing approaches.

major comments (3)
  1. [Abstract] The abstract asserts SOTA results on six benchmarks yet supplies no metrics, baselines, error bars, statistical tests, or ablation details; the full evaluation protocol is absent, which is load-bearing for the central empirical claim of outperforming recent DG models.
  2. [§4 (Experiments)] The reported results appear to rely on automatic metrics (e.g., BLEU, ROUGE, semantic similarity) applied only to the generated distractors, without separate human or expert evaluation to verify that the accompanying rationales match the implicit reasoning experts used when labeling the human benchmarks (a minimal F1@k sketch follows the comment lists). This directly undermines the 'reasoned distractors that align with human-labeled benchmarks' component of the SOTA claim.
  3. [§3 (Method)] The unsupervised semantic retrieval mechanism for selecting few-shot examples lacks sufficient implementation details (embedding model, similarity function, number of shots, and any filtering criteria), preventing assessment of its contribution or reproducibility; no ablation is described comparing it to random or other selection strategies.
minor comments (2)
  1. [§2 (Related Work)] The related work section could benefit from explicit comparison tables summarizing prior DG methods' performance on the same six benchmarks to contextualize the claimed improvements.
  2. [§3 (Method)] Notation for the rationale-augmented prompt template is introduced without a clear formal definition or example in the main text, making the framework description harder to follow.
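
For context on what those automatic metrics measure (the F1@k sketch referenced in major comment 2): alignment with the human-labeled benchmarks is typically scored as set overlap between the top-k generated distractors and the gold distractors, which is what the F1@3 in the figure captions reports. The exact-match normalization below is an assumption; the paper's matching rule may differ.

def f1_at_k(generated, gold, k=3):
    """F1 overlap between the top-k generated distractors and the human-labeled set.

    Assumes exact match after lowercasing and stripping; token-level or
    semantic matching would be a straightforward substitution.
    """
    def norm(s):
        return s.strip().lower()
    pred = {norm(d) for d in generated[:k]}
    ref = {norm(d) for d in gold}
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)

# Two of three generated distractors match the human-labeled set -> F1@3 = 0.67
print(round(f1_at_k(["photosynthesis", "osmosis", "diffusion"],
                    ["osmosis", "diffusion", "respiration"]), 2))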

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where appropriate and outlining planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The abstract asserts SOTA results on six benchmarks yet supplies no metrics, baselines, error bars, statistical tests, or ablation details; the full evaluation protocol is absent, which is load-bearing for the central empirical claim of outperforming recent DG models.

    Authors: We agree that the abstract would benefit from greater specificity to support the SOTA claim. In the revised version, we will incorporate key quantitative highlights, including average performance improvements over the strongest baselines across the six benchmarks and the primary metrics employed. The full details on baselines, error bars, statistical tests, and ablations are already reported in Section 4; we will ensure the abstract more clearly references the evaluation protocol. revision: yes

  2. Referee: [§4 (Experiments)] The reported results appear to rely on automatic metrics (e.g., BLEU, ROUGE, semantic similarity) applied only to the generated distractors, without separate human or expert evaluation to verify that the accompanying rationales match the implicit reasoning experts used when labeling the human benchmarks. This directly undermines the 'reasoned distractors that align with human-labeled benchmarks' component of the SOTA claim.

    Authors: The evaluation in the manuscript centers on automatic metrics that quantify how closely the generated distractors align with those in the human-labeled benchmarks; superior performance under these metrics is presented as evidence that the rationale-augmented prompting better captures the implicit reasoning used by experts. We did not conduct separate human evaluation specifically on the rationales themselves. To address the concern, we will add a targeted discussion of this limitation and include a small-scale expert assessment of rationale quality in the revised manuscript. revision: partial

  3. Referee: [§3 (Method)] The unsupervised semantic retrieval mechanism for selecting few-shot examples lacks sufficient implementation details (embedding model, similarity function, number of shots, and any filtering criteria), preventing assessment of its contribution or reproducibility; no ablation is described comparing it to random or other selection strategies.

    Authors: We appreciate the referee's emphasis on reproducibility. In the revised manuscript, Section 3 will be expanded to specify the embedding model, similarity function (cosine similarity), number of shots, and any filtering criteria. We will also add an ablation study comparing semantic retrieval against random selection and alternative strategies to quantify its contribution. revision: yes
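
If the promised ablation materializes, it could take the shape below: run one fixed generation-and-scoring pipeline under each example-selection strategy and compare the averaged metric. The data format and the generate and score callables are hypothetical stand-ins for the paper's prompting step and an automatic metric such as F1@3.

import random

def random_select(query, pool, k=3, seed=0):
    """Ablation baseline: k in-context examples drawn uniformly at random."""
    return random.Random(seed).sample(pool, k)

def run_ablation(test_set, pool, selectors, generate, score, k=3):
    """Average one automatic metric over test_set for each selection strategy.

    selectors maps a name to a function (query, pool, k) -> examples;
    generate(query, examples) returns distractors; score(pred, gold) is the
    metric. All three are placeholders for the actual pipeline components.
    """
    results = {}
    for name, select in selectors.items():
        per_item = []
        for query in test_set:
            examples = select(query, pool, k)
            pred = generate(query, examples)
            per_item.append(score(pred, query["distractors"]))
        results[name] = sum(per_item) / len(per_item)
    return results

# e.g. run_ablation(dev_set, train_pool,
#                   {"random": random_select, "semantic k-NN": select_examples},
#                   generate=prompt_llm, score=f1_at_k)   # prompt_llm is hypothetical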

Circularity Check

0 steps flagged

No circularity: purely empirical prompting evaluation

full rationale

The paper describes an empirical study using in-context learning and chain-of-thought prompting on LLMs for distractor generation, with experiments on six benchmarks comparing against prior DG models. No equations, derivations, fitted parameters, or self-citations are used to derive claims; results are reported via standard metrics on held-out benchmarks. The central claim reduces to observed performance improvements rather than any self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested premise that LLMs can be prompted to replicate expert-level reasoning for distractor selection; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Large language models can perform reasoning for distractor generation when provided with few-shot examples and chain-of-thought prompts.
    Core premise of the rationale-augmented DG framework described in the abstract.

pith-pipeline@v0.9.0 · 5509 in / 1199 out tokens · 41423 ms · 2026-05-10T05:48:39.902940+00:00 · methodology

discussion (0)

