Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation
Pith reviewed 2026-05-10 05:48 UTC · model grok-4.3
The pith
Large language models prompted with few-shot examples and chain-of-thought rationales generate more plausible reasoned distractors than fine-tuned models do.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying in-context learning to LLMs for distractor generation, the authors show that few-shot prompting with retrieved examples, combined with chain-of-thought rationale generation, produces distractors that are more plausible and better aligned with human-labeled benchmarks than those from encoder-decoder models fine-tuned with contrastive learning.
What carries the argument
The rationale-augmented distractor generation framework, which retrieves few-shot examples using unsupervised semantic similarity and prompts LLMs to output both distractors and step-by-step rationales for their selection.
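As a rough sketch of how such a rationale-augmented prompt could be assembled (the template and field names here are illustrative assumptions, not the authors' implementation):

```python
def build_prompt(examples, question, answer):
    """Assemble a rationale-augmented few-shot prompt (illustrative template)."""
    parts = []
    for ex in examples:  # retrieved few-shot examples, most similar first
        parts.append(
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
            f"Rationale: {ex['rationale']}\n"
            f"Distractors: {'; '.join(ex['distractors'])}\n"
        )
    # The target instance ends at "Rationale:", so the model is steered to
    # justify its choices before emitting the distractors themselves.
    parts.append(f"Question: {question}\nAnswer: {answer}\nRationale:")
    return "\n".join(parts)

demo = [{
    "question": "Which gas do plants absorb during photosynthesis?",
    "answer": "Carbon dioxide",
    "rationale": "Plausible distractors name other gases from related processes.",
    "distractors": ["Oxygen", "Nitrogen", "Hydrogen"],
}]
prompt = build_prompt(demo, "Which organelle produces most of a cell's ATP?",
                      "Mitochondria")
```

The design point the framework relies on is the ordering: rationale before distractors, so generation is conditioned on an explicit reasoning step.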
If this is right
- Prompted LLMs can replace or augment fine-tuning for this task without additional training data or compute for model updates.
- The inclusion of rationales makes the generated distractors more interpretable and closer to expert reasoning.
- Performance gains hold across domains with different question types and distractor lengths.
- The method achieves state-of-the-art results on all six evaluated benchmarks.
Where Pith is reading between the lines
- This could extend to generating other types of educational content that require plausible incorrect options, such as in adaptive testing systems.
- If the approach generalizes, it might lower barriers for creating high-quality assessments in specialized fields where experts are scarce.
- Future work could test whether the same framework improves performance on related tasks like generating explanations for correct answers.
Load-bearing premise
The assumption that the chain-of-thought rationales produced by the LLM will consistently mirror the hidden reasoning steps that human experts use to choose effective distractors on the benchmarks.
What would settle it
A human evaluation study on a held-out set of questions in which experts rate the generated distractors and rationales: if experts judged them less plausible or less aligned with expert reasoning than the outputs of previous fine-tuned models, the claimed performance advantage would be disproved.
Original abstract
Distractor generation (DG) remains a labor-intensive task that still significantly depends on domain experts. The task focuses on generating plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning when identifying the correct answer. While a recent method integrates fine-tuning pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer, it often fails to capture the underlying reasoning process that experts utilize when selecting distractors in benchmarks. In this paper, we explore large language models (LLMs) reasoning for DG through in-context learning with unsupervised semantic retrieval for selecting few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer. Extensive experiments on six benchmarks, with varying average distractor lengths and domains, demonstrate that prompting LLMs with few-shot examples substantially improves the performance compared to recent DG models. It outperforms recent approaches and achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a rationale-augmented distractor generation (DG) framework for multiple-choice questions that uses large language models via in-context learning. Few-shot examples are selected through unsupervised semantic retrieval, and chain-of-thought prompting is employed to jointly produce distractors along with their rationales. The central claim is that this prompting-based approach substantially outperforms recent fine-tuned encoder-decoder models with contrastive learning, achieving state-of-the-art results on six benchmarks with varying domains and distractor lengths by generating reasoned distractors that align with human-labeled data.
Significance. If the empirical claims hold under rigorous validation, the work could meaningfully shift distractor generation away from resource-intensive fine-tuning toward more flexible LLM prompting strategies, lowering barriers for creating high-quality educational assessments. The explicit inclusion of rationale generation addresses a noted gap in prior DG methods regarding implicit reasoning capture. However, the absence of detailed metrics and rationale-specific validation in the provided description limits assessment of whether this represents a genuine advance over existing approaches.
major comments (3)
- [Abstract] The abstract asserts SOTA results on six benchmarks yet supplies no metrics, baselines, error bars, statistical tests, or ablation details; the full evaluation protocol is absent, which is load-bearing for the central empirical claim of outperforming recent DG models.
- [§4 (Experiments)] The reported results appear to rely on automatic metrics (e.g., BLEU, ROUGE, semantic similarity) applied only to the generated distractors, without a separate human or expert evaluation to verify that the accompanying rationales match the implicit reasoning experts used when labeling the human benchmarks. This directly undermines the 'reasoned distractors that align with human-labeled benchmarks' component of the SOTA claim.
- [§3 (Method)] The unsupervised semantic retrieval mechanism for selecting few-shot examples lacks sufficient implementation details (embedding model, similarity function, number of shots, and any filtering criteria), preventing assessment of its contribution or reproducibility; no ablation comparing it to random or other selection strategies is described.
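The retrieval step in question could be as simple as the following sketch, which uses a toy bag-of-words embedding in place of the paper's unspecified embedding model (all names here are assumptions for illustration):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the paper's actual embedding model is unspecified.
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def retrieve_few_shot(query, pool, k=2):
    """Rank candidate examples by cosine similarity to the query; keep the top k."""
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(q, embed(ex)), reverse=True)[:k]

pool = [
    "what gas do plants absorb",
    "capital of france",
    "which gas fuels combustion",
]
shots = retrieve_few_shot("which gas do animals exhale", pool)
```

Even this minimal version makes the referee's point concrete: the choice of embedding, similarity function, and k all change which examples land in the prompt, so they need to be reported and ablated.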
minor comments (2)
- [§2 (Related Work)] The related work section could benefit from explicit comparison tables summarizing prior DG methods' performance on the same six benchmarks to contextualize the claimed improvements.
- [§3 (Method)] Notation for the rationale-augmented prompt template is introduced without a clear formal definition or example in the main text, making the framework description harder to follow.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where appropriate and outlining planned revisions to strengthen the manuscript.
point-by-point responses
- Referee: [Abstract] The abstract asserts SOTA results on six benchmarks yet supplies no metrics, baselines, error bars, statistical tests, or ablation details; the full evaluation protocol is absent, which is load-bearing for the central empirical claim of outperforming recent DG models.
Authors: We agree that the abstract would benefit from greater specificity to support the SOTA claim. In the revised version, we will incorporate key quantitative highlights, including average performance improvements over the strongest baselines across the six benchmarks and the primary metrics employed. The full details on baselines, error bars, statistical tests, and ablations are already reported in Section 4; we will ensure the abstract more clearly references the evaluation protocol. revision: yes
- Referee: [§4 (Experiments)] The reported results appear to rely on automatic metrics (e.g., BLEU, ROUGE, semantic similarity) applied only to the generated distractors, without a separate human or expert evaluation to verify that the accompanying rationales match the implicit reasoning experts used when labeling the human benchmarks. This directly undermines the 'reasoned distractors that align with human-labeled benchmarks' component of the SOTA claim.
Authors: The evaluation in the manuscript centers on automatic metrics that quantify how closely the generated distractors align with those in the human-labeled benchmarks; superior performance under these metrics is presented as evidence that the rationale-augmented prompting better captures the implicit reasoning used by experts. We did not conduct separate human evaluation specifically on the rationales themselves. To address the concern, we will add a targeted discussion of this limitation and include a small-scale expert assessment of rationale quality in the revised manuscript. revision: partial
- Referee: [§3 (Method)] The unsupervised semantic retrieval mechanism for selecting few-shot examples lacks sufficient implementation details (embedding model, similarity function, number of shots, and any filtering criteria), preventing assessment of its contribution or reproducibility; no ablation comparing it to random or other selection strategies is described.
Authors: We appreciate the referee's emphasis on reproducibility. In the revised manuscript, Section 3 will be expanded to specify the embedding model, similarity function (cosine similarity), number of shots, and any filtering criteria. We will also add an ablation study comparing semantic retrieval against random selection and alternative strategies to quantify its contribution. revision: yes
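For concreteness, here is a simplified stand-in for the kind of automatic overlap metric at issue in this exchange: token-level F1 rather than the paper's actual BLEU/ROUGE setup, so treat it purely as an illustration.

```python
from collections import Counter

def unigram_f1(generated, reference):
    """Token-overlap F1 between a generated and a reference distractor:
    a simplified stand-in for BLEU/ROUGE-style automatic metrics."""
    g = Counter(generated.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((g & r).values())  # clipped token matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(g.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = unigram_f1("the green house effect", "the greenhouse effect")
```

A metric like this rewards surface overlap with the human-labeled distractors but says nothing about whether the accompanying rationale matches expert reasoning, which is precisely the gap the referee identifies.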
Circularity Check
No circularity: purely empirical prompting evaluation
full rationale
The paper describes an empirical study using in-context learning and chain-of-thought prompting on LLMs for distractor generation, with experiments on six benchmarks comparing against prior DG models. No equations, derivations, fitted parameters, or self-citations are used to derive claims; results are reported via standard metrics on held-out benchmarks. The central claim reduces to observed performance improvements rather than any self-referential construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models can perform reasoning for distractor generation when provided with few-shot examples and chain-of-thought prompts.