iPOE: Interpretable Prompt Optimization via Explanations
Pith reviewed 2026-05-20 10:49 UTC · model grok-4.3
The pith
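Prompts built from guidelines derived from annotation explanations improve LLM performance by up to 31 percent over prompts without guidelines and by up to 35 percent over prompts with randomly selected guidelines.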
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that prompt optimization guided by guidelines automatically derived from explanations of annotation decisions, and refined by removing, adding, shuffling, and merging those guidelines, yields prompts that are both interpretable and higher-performing for LLMs on classification tasks.
What carries the argument
The iPOE method, which generates a set of guidelines from explanations of annotation decisions and optimizes them via remove, add, shuffle, and merge operations to produce transparent annotation instructions for the LLM.
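To make the four operations concrete, here is a minimal Python sketch that treats a guideline set as a list of strings. All function names, the concatenation-based merge, and the prompt layout are illustrative assumptions, not the paper's implementation; the paper may well perform adding and merging with an LLM.

```python
import random
from typing import List

Guidelines = List[str]


def remove_guideline(guidelines: Guidelines, index: int) -> Guidelines:
    """Return a copy without the guideline at `index`."""
    return guidelines[:index] + guidelines[index + 1:]


def add_guideline(guidelines: Guidelines, new: str) -> Guidelines:
    """Return a copy with `new` appended; exact duplicates are not added."""
    if new in guidelines:
        return list(guidelines)
    return guidelines + [new]


def shuffle_guidelines(guidelines: Guidelines, rng: random.Random) -> Guidelines:
    """Return a reordered copy; the input list is left unchanged."""
    shuffled = list(guidelines)
    rng.shuffle(shuffled)
    return shuffled


def merge_guidelines(guidelines: Guidelines, i: int, j: int) -> Guidelines:
    """Replace guidelines i and j with one combined guideline.

    Assumption: merging is plain concatenation here, as a placeholder.
    """
    if i == j:
        raise ValueError("merge needs two different guidelines")
    merged = f"{guidelines[i]} {guidelines[j]}"
    kept = [g for k, g in enumerate(guidelines) if k not in (i, j)]
    return kept + [merged]


def build_prompt(task: str, guidelines: Guidelines) -> str:
    """Assemble a prompt with a numbered guideline list (hypothetical layout)."""
    lines = [task, "Guidelines:"]
    lines += [f"{n}. {g}" for n, g in enumerate(guidelines, start=1)]
    return "\n".join(lines)
```

Because every operation returns a new list, each intermediate guideline set stays available for inspection, which is the kind of transparency the method is aiming for.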
Load-bearing premise
That guidelines automatically derived from explanations of annotation decisions, when refined by the listed operations, will produce prompts that are both more transparent and measurably higher-performing for LLMs on the target tasks.
What would settle it
A direct comparison on the same four datasets where iPOE guidelines yield no accuracy gain or no increase in transparency relative to prompts without guidelines or with random guidelines.
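One way to decide whether an observed gain is more than noise is a paired test on per-item correctness; the simulated rebuttal further down names McNemar's test as one option. The sketch below computes the exact two-sided McNemar p-value from two lists of per-item correctness flags. The function names are hypothetical and no data from the paper is used.

```python
from math import comb
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of items answered correctly."""
    return sum(correct) / len(correct)


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for two prompts scored on the same items."""
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have equal length")
    # Discordant pairs: items that exactly one of the two prompts gets right.
    only_a = sum(a and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(b and not a for a, b in zip(correct_a, correct_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the prompts never disagree, so there is no evidence of a difference
    k = min(only_a, only_b)
    # Binomial tail under the null hypothesis that each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For instance, if one prompt is right on five items where the other is wrong, and never the reverse, the function returns 0.0625. Only the discordant items influence the result, so large shared accuracy does not mask or inflate a difference.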
Original abstract
Prompt optimization has often been framed as a discrete search problem to find high-performing and robust instructions for an LLM. However, the search result might not make it transparent why and where specific prompt changes lead to performance gains. This is in contrast to how humans are instructed for annotation tasks. Here, researchers carefully design annotation guidelines, leading to enhanced annotation consistency. Our paper aims at joining these two approaches and introduces iPOE, a novel interpretable prompt optimization strategy via explanations. We guide the prompt optimization process by automatically created guidelines from explanations of annotation decisions (either automatically generated or from humans). This set of guidelines is furthermore optimized by as series of operations, including removing, adding, shuffling, and merging. The resulting prompt includes guidelines that instruct the annotation, making the decision process of the LLM and the optimization transparent. It therefore supports also laypeople in the area of prompt optimization, particularly in challenging domains requiring expertise. In our experiments on four datasets, we find that iPOE can improves over prompts without guidelines and with random selected guidelines by up to $31\%$ and $35\%$, respectively. Moreover, LLM explanations can replace human explanations in the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces iPOE, a method for interpretable prompt optimization that automatically derives annotation guidelines from explanations of decisions (human or LLM-generated), refines this set via operations including removal, addition, shuffling, and merging, and incorporates the guidelines into LLM prompts. Experiments across four datasets demonstrate that iPOE prompts outperform those without guidelines by up to 31% and those with randomly selected guidelines by up to 35%, while also showing that LLM explanations can effectively replace human explanations.
Significance. If the empirical results hold after addressing potential confounds in the experimental design, this work would meaningfully advance prompt optimization research by aligning it with human annotation guideline practices. It offers a route to more transparent and accessible prompt engineering, with practical benefits for non-expert users in specialized domains and the demonstrated feasibility of substituting LLM-generated explanations for human ones.
major comments (2)
- [Experimental section] Experimental section (and abstract): The reported performance gains of up to 31% and 35% are presented without details on dataset sizes, baseline prompt construction procedures, statistical testing, or any controls that match total instruction length, number of guidelines, or structural complexity between the iPOE condition and the no-guideline/random-guideline baselines. This omission leaves open the possibility that gains arise from added prompt elaboration rather than from the explanation-derived guideline content or the listed refinement operations, which is load-bearing for the central claim that the proposed derivation process drives the improvements.
- [§3 (Method)] §3 (Method): The description of the guideline refinement operations (remove, add, shuffle, merge) does not specify selection criteria, ordering, or iteration limits, nor does it include an ablation isolating their contribution from the baseline effect of simply providing more detailed instructions. Without this, it is difficult to confirm that the transparency and performance benefits are attributable to the iPOE process rather than generic prompt expansion.
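A length-matched control of the kind the first major comment asks for could be built along the following lines. The sketch keeps the number of items and the per-item word count identical to a given guideline set while drawing the words from generic filler sentences. The function names, the whitespace word count as a length measure, and the filler pool are assumptions for illustration; a real control would more likely match tokenizer-level length and use fluent text.

```python
import random
from typing import List, Sequence


def word_count(text: str) -> int:
    """Length measure used here: number of whitespace-separated words."""
    return len(text.split())


def length_matched_control(
    guidelines: Sequence[str],
    filler_pool: Sequence[str],
    rng: random.Random,
) -> List[str]:
    """Build one filler item per guideline with exactly the same word count."""
    usable = [f for f in filler_pool if f.split()]
    if not usable:
        raise ValueError("filler_pool must contain at least one non-empty sentence")
    control = []
    for guideline in guidelines:
        target = word_count(guideline)
        words: List[str] = []
        # Keep drawing filler sentences until enough words are available.
        while len(words) < target:
            words.extend(rng.choice(usable).split())
        # Truncate to the target length; this may cut a filler sentence short.
        control.append(" ".join(words[:target]))
    return control
```

Comparing the iPOE prompt against such a control would separate the effect of guideline content from the effect of simply adding more text to the prompt.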
minor comments (1)
- [Abstract] Abstract: 'as series of operations' should read 'a series of operations'; 'can improves' should read 'can improve'.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where possible and committing to revisions that strengthen the experimental rigor and methodological transparency of the manuscript.
Point-by-point responses
Referee: [Experimental section] Experimental section (and abstract): The reported performance gains of up to 31% and 35% are presented without details on dataset sizes, baseline prompt construction procedures, statistical testing, or any controls that match total instruction length, number of guidelines, or structural complexity between the iPOE condition and the no-guideline/random-guideline baselines. This omission leaves open the possibility that gains arise from added prompt elaboration rather than from the explanation-derived guideline content or the listed refinement operations, which is load-bearing for the central claim that the proposed derivation process drives the improvements.
Authors: We agree that additional experimental details are necessary to rule out confounds from prompt length or elaboration. In the revised manuscript we will report exact dataset sizes and splits for all four datasets, provide a precise description of baseline prompt construction (including how no-guideline and random-guideline prompts were generated), include statistical significance tests (paired t-tests or McNemar’s test across multiple seeds), and add length- and complexity-matched controls. While the existing random-guideline baseline already holds the number of guidelines constant, we will introduce an explicit length-matched baseline that adds generic elaboration without explanation-derived content. These changes will directly address whether the observed gains are attributable to the iPOE derivation and refinement process. revision: yes
Referee: [§3 (Method)] §3 (Method): The description of the guideline refinement operations (remove, add, shuffle, merge) does not specify selection criteria, ordering, or iteration limits, nor does it include an ablation isolating their contribution from the baseline effect of simply providing more detailed instructions. Without this, it is difficult to confirm that the transparency and performance benefits are attributable to the iPOE process rather than generic prompt expansion.
Authors: We acknowledge that §3 requires greater specificity. The revised manuscript will expand the description of each operation with explicit selection criteria (e.g., removal of redundant or low-impact guidelines based on validation-set performance, addition of new guidelines derived from remaining explanations, shuffling to test robustness, and merging for conciseness), the sequence in which operations are applied, and iteration limits (e.g., until validation performance stabilizes). We will also add an ablation that compares the full iPOE pipeline against a control condition receiving an equivalent volume of additional instructions generated without the explanation-derived guidelines or the listed refinement operations. This ablation will help isolate the contribution of the iPOE process from generic prompt expansion. revision: yes
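The selection criteria and stopping rule described in this response can be pictured as a simple accept-if-better search loop. The sketch below is one hypothetical reading, not the authors' algorithm: `propose`, `refine`, `patience`, `max_iters`, and the concatenation-based merge are invented names and choices, and `score` stands in for whatever validation-set metric the revised paper specifies.

```python
import random
from typing import Callable, List, Sequence, Tuple


def propose(current: List[str], pool: Sequence[str], rng: random.Random) -> List[str]:
    """Apply one randomly chosen operation and return a new candidate list."""
    candidate = list(current)
    unused = [g for g in pool if g not in candidate]
    ops = []
    if candidate:
        ops.append("remove")
    if unused:
        ops.append("add")
    if len(candidate) >= 2:
        ops += ["shuffle", "merge"]
    if not ops:
        return candidate
    op = rng.choice(ops)
    if op == "remove":
        candidate.pop(rng.randrange(len(candidate)))
    elif op == "add":
        candidate.append(rng.choice(unused))
    elif op == "shuffle":
        rng.shuffle(candidate)
    else:  # merge: placeholder concatenation of two guidelines
        i, j = sorted(rng.sample(range(len(candidate)), 2))
        second = candidate.pop(j)
        candidate[i] = f"{candidate[i]} {second}"
    return candidate


def refine(
    initial: Sequence[str],
    pool: Sequence[str],
    score: Callable[[List[str]], float],
    rng: random.Random,
    max_iters: int = 50,
    patience: int = 10,
) -> Tuple[List[str], float]:
    """Keep a candidate only if it beats the best validation score so far."""
    best = list(initial)
    best_score = score(best)
    stale = 0  # consecutive proposals that did not improve the score
    for _ in range(max_iters):
        if stale >= patience:
            break  # treat the validation score as stabilized
        candidate = propose(best, pool, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score, stale = candidate, candidate_score, 0
        else:
            stale += 1
    return best, best_score
```

In practice `score` would run the LLM with the candidate guidelines on a held-out validation split, so each iteration costs one full validation pass; `max_iters` and `patience` bound that cost.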
Circularity Check
No significant circularity; empirical method evaluated against external baselines
Full rationale
The paper describes a procedural algorithm: guidelines are automatically derived from annotation explanations (human or LLM-generated), then refined via explicit operations (remove, add, shuffle, merge) before insertion into the prompt. Performance is assessed via direct empirical comparisons on four datasets against two external baselines (prompts without guidelines; prompts with randomly selected guidelines), reporting relative gains of up to 31% and 35%. No equations, fitted parameters, or self-referential definitions appear in the provided text. The central claims rest on measurable task accuracy rather than reducing to construction by definition, self-citation chains, or renaming of prior results. The derivation chain is therefore self-contained as an engineering procedure whose validity is tested externally.