Evaluating Chinese Ambiguity Understanding in Large Language Models
Pith reviewed 2026-05-20 19:42 UTC · model grok-4.3
The pith
Large language models often fail to detect linguistic ambiguity in Chinese.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs struggle with ambiguity detection in Chinese. Analysis of Qwen3-32B CoT rationales reveals three common failure modes (ambiguity blindness, misattribution, and premature resolution). Instruction tuning induces overconfidence while base models better capture semantic diversity. Models exhibit a bias toward dominant interpretations. Uncertainty quantification with semantic entropy shows higher uncertainty for ambiguous sentences.
What carries the argument
Semi-automatic pipeline guided by Potential Ambiguity (PA) Theory to generate and label 5,712 Chinese sentences across 18 ambiguous structures as either ambiguous or unambiguous.
If this is right
- Chain-of-thought prompting raises accuracy on Chinese ambiguity detection compared with direct answers.
- Semantic entropy is measurably higher on ambiguous sentences than on unambiguous ones.
- Instruction tuning produces overconfidence and reduces the models' ability to represent multiple meanings.
- Base models preserve greater semantic diversity than their instruction-tuned versions.
- Models systematically favor the dominant interpretation when facing ambiguity.
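The semantic-entropy claim in the list above can be illustrated with a minimal sketch. This is the generic discrete form (sample several outputs, group them by meaning, take the entropy of the group frequencies), not the paper's exact procedure, which this page does not specify. The `same_meaning` callback and the toy sample strings are hypothetical stand-ins; in practice an entailment model or similar check would play that role.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    samples: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group samples; each sample joins the first cluster whose
    representative (its first member) it is judged equivalent to."""
    clusters: List[List[str]] = []
    for sample in samples:
        for cluster in clusters:
            if same_meaning(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def semantic_entropy(
    samples: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not samples:
        raise ValueError("need at least one sample")
    total = len(samples)
    clusters = cluster_by_meaning(samples, same_meaning)
    # p * log(1/p) summed over clusters, with p = cluster size / total
    return sum((len(c) / total) * math.log(total / len(c)) for c in clusters)


# Hypothetical toy data: hand-assigned meaning IDs stand in for a real
# equivalence check between sampled translations.
TOY_MEANING = {"A1": "reading_a", "A2": "reading_a", "B1": "reading_b"}


def toy_same_meaning(x: str, y: str) -> bool:
    return TOY_MEANING[x] == TOY_MEANING[y]


unambiguous_samples = ["A1", "A2", "A1", "A2"]  # every sample shares one reading
ambiguous_samples = ["A1", "B1", "A2", "B1"]    # samples split across two readings

low = semantic_entropy(unambiguous_samples, toy_same_meaning)
high = semantic_entropy(ambiguous_samples, toy_same_meaning)
```

With these toy inputs the single-reading set yields zero entropy and the evenly split set yields log 2, which is the direction of the effect the paper reports for ambiguous versus unambiguous sentences.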
Where Pith is reading between the lines
- Training procedures that reward explicit listing of alternative meanings could reduce the observed overconfidence in future Chinese-capable models.
- The same pipeline approach could generate comparable ambiguity test sets for other languages that share structural features with Chinese.
- Real-world tasks such as Chinese machine translation or legal text analysis may improve if models are explicitly trained to flag and resolve ambiguity.
- Comparing failure rates across model sizes and families on the same dataset would clarify which architectures handle Chinese ambiguity best.
Load-bearing premise
The semi-automatic pipeline guided by Potential Ambiguity Theory produces a dataset whose ambiguous and unambiguous labels accurately reflect genuine linguistic ambiguity in Chinese and are not artifacts of the generation process.
What would settle it
Independent human linguists labeling a random sample of CHA-Gen sentences and agreeing with the pipeline labels on fewer than 70 percent of cases would show the dataset does not capture real ambiguity.
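As a rough sketch of how this check could be run, the snippet below computes raw agreement between pipeline labels and human labels on a sample and compares it with the 70 percent threshold named above. The function names and the toy label lists are hypothetical; no actual CHA-Gen annotations are used.

```python
from typing import Sequence

AGREEMENT_THRESHOLD = 0.70  # threshold stated in the text above


def agreement_rate(pipeline_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of sampled sentences where the human label matches the pipeline label."""
    if len(pipeline_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not pipeline_labels:
        raise ValueError("need at least one labeled sentence")
    matches = sum(p == h for p, h in zip(pipeline_labels, human_labels))
    return matches / len(pipeline_labels)


def labels_hold_up(pipeline_labels: Sequence[str], human_labels: Sequence[str]) -> bool:
    """True when agreement reaches the threshold, False when it falls below it."""
    return agreement_rate(pipeline_labels, human_labels) >= AGREEMENT_THRESHOLD


# Hypothetical toy sample of 10 sentences ("amb" = ambiguous, "unamb" = unambiguous).
pipeline = ["amb", "amb", "unamb", "unamb", "amb", "unamb", "amb", "unamb", "unamb", "amb"]
human = ["amb", "unamb", "unamb", "unamb", "amb", "unamb", "amb", "amb", "unamb", "amb"]

rate = agreement_rate(pipeline, human)  # 8 of 10 labels match
```

Raw agreement ignores agreement expected by chance, so a chance-corrected statistic is usually reported alongside it.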
Original abstract
Linguistic ambiguity is critical to the robustness of Large Language Models (LLMs), yet existing research focuses mostly on English, with limited attention devoted to Chinese. Existing Chinese ambiguity datasets (e.g., CHAmbi) suffer from poor scalability. Guided by Potential Ambiguity (PA) Theory, we design a semi-automatic pipeline to construct CHA-Gen. It is the first PA Theory-grounded Chinese ambiguity dataset, which comprises 5,712 sentences (2,414 ambiguous, 3,298 unambiguous) across 18 potential ambiguous structures. Evaluating LLMs (e.g. Gemma 3, Qwen 2.5/3 series) via direct querying and machine translation, we find that LLMs struggle with ambiguity detection (improved by CoT prompting). Analysis of Qwen3-32B's CoT rationales reveals three common failure modes: ambiguity blindness, misattribution, and premature resolution. Uncertainty quantification with semantic entropy metric shows higher uncertainty for ambiguous sentences. Moreover, instruction tuning induces overconfidence, whereas Base models better capture semantic diversity. We further observe that models exhibit a bias toward dominant interpretations. Our work provides a scalable approach for Chinese ambiguity corpus and insights into LLMs' ambiguity handling, laying a foundation for enhancing Chinese ambiguity research in LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CHA-Gen, the first PA Theory-grounded Chinese ambiguity dataset containing 5,712 sentences (2,414 ambiguous, 3,298 unambiguous) across 18 structures, built via a semi-automatic pipeline. It evaluates LLMs (Gemma 3, Qwen 2.5/3 series) on ambiguity detection using direct prompting and machine translation, analyzes three failure modes in Qwen3-32B CoT rationales (ambiguity blindness, misattribution, premature resolution), reports higher semantic entropy on ambiguous items, and finds that instruction tuning produces overconfidence while base models better preserve semantic diversity, with an additional bias toward dominant interpretations.
Significance. If the dataset labels are shown to reflect genuine linguistic ambiguity rather than pipeline artifacts, the work addresses a clear gap in Chinese-focused LLM evaluation and supplies concrete failure-mode diagnostics plus a scalable construction method that could support larger corpora. The empirical comparison of base versus instruction-tuned models and the semantic-entropy uncertainty analysis are useful contributions that could guide robustness improvements in Chinese NLP.
major comments (1)
- [§3 and §4] §3 (Dataset Construction) and §4 (Evaluation): The central claims—that LLMs struggle with Chinese ambiguity detection, exhibit the three listed failure modes, show elevated semantic entropy on ambiguous items, and that instruction tuning induces overconfidence—rest on the accuracy of the CHA-Gen ambiguous/unambiguous labels. The manuscript describes a PA-Theory-guided semi-automatic pipeline but reports no independent human validation, inter-annotator agreement, or error analysis on the generated labels. Without such validation it remains possible that the 2,414/3,298 split reflects generation heuristics rather than Chinese linguistic reality, which would confound all reported performance numbers, rationale analyses, and base-vs-tuned comparisons.
minor comments (2)
- [Abstract] Abstract: The abstract states directional findings and failure modes but omits quantitative effect sizes, statistical tests, or explicit baseline comparisons, making it difficult to gauge the practical magnitude of the reported difficulties.
- [Results] Results section: Provide more detail on how the machine-translation evaluation protocol was implemented and on the exact computation of semantic entropy, including any hyperparameters or sampling settings.
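One baseline that follows directly from the class counts in the abstract is the majority-class predictor. The sketch below derives it from the reported split (2,414 ambiguous, 3,298 unambiguous); the function name is hypothetical and the calculation assumes plain accuracy over the full dataset, which may not be the metric the paper reports.

```python
from typing import Dict, Tuple

# Class counts as stated in the abstract.
CLASS_COUNTS = {"ambiguous": 2414, "unambiguous": 3298}
REPORTED_TOTAL = 5712


def majority_baseline(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent label and the accuracy of always predicting it."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not be empty")
    label = max(counts, key=counts.get)
    return label, counts[label] / total


# The two class counts add up to the reported dataset size.
assert sum(CLASS_COUNTS.values()) == REPORTED_TOTAL

baseline_label, baseline_accuracy = majority_baseline(CLASS_COUNTS)
```

Always answering "unambiguous" would score roughly 57.7 percent accuracy on this split, so detection accuracies near that value indicate little more than label-prior matching.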
Simulated Author's Rebuttal
We are grateful to the referee for their thorough review and valuable suggestions. The feedback has helped us identify areas where the manuscript can be improved. Below, we provide a point-by-point response to the major comment.
Referee: [§3 and §4] §3 (Dataset Construction) and §4 (Evaluation): The central claims—that LLMs struggle with Chinese ambiguity detection, exhibit the three listed failure modes, show elevated semantic entropy on ambiguous items, and that instruction tuning induces overconfidence—rest on the accuracy of the CHA-Gen ambiguous/unambiguous labels. The manuscript describes a PA-Theory-guided semi-automatic pipeline but reports no independent human validation, inter-annotator agreement, or error analysis on the generated labels. Without such validation it remains possible that the 2,414/3,298 split reflects generation heuristics rather than Chinese linguistic reality, which would confound all reported performance numbers, rationale analyses, and base-vs-tuned comparisons.
Authors: We acknowledge the referee's concern regarding the lack of human validation for the CHA-Gen labels. The dataset construction follows a PA-Theory-guided semi-automatic pipeline, where ambiguous structures are derived from established linguistic theories on potential ambiguity in Chinese. However, to address this point directly, we will incorporate an independent human validation study in the revised manuscript. This will include recruiting native Chinese speakers to annotate a representative sample of the sentences, computing inter-annotator agreement (e.g., Cohen's kappa or Fleiss' kappa), and performing error analysis to identify any discrepancies between the pipeline labels and human judgments. We believe this addition will substantiate that the labels reflect genuine linguistic ambiguity and strengthen the validity of our empirical findings on LLM performance, failure modes, and the effects of instruction tuning.
Revision: yes
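For reference, the two-annotator agreement statistic named in the response, Cohen's kappa, can be computed as in the sketch below. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance from each annotator's label frequencies. The annotator label lists are hypothetical toy data, not results from the planned study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(annotator_a: Sequence[Hashable], annotator_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("both annotators must label the same items")
    n = len(annotator_a)
    if n == 0:
        raise ValueError("need at least one labeled item")

    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

    # Expected agreement: chance that both pick the same label independently,
    # given each annotator's own label frequencies.
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)

    if p_expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined (0/0), and this sketch reports full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels for 8 sentences (1 = ambiguous, 0 = unambiguous).
annotator_a = [1, 1, 1, 0, 0, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 1, 1]

kappa = cohens_kappa(annotator_a, annotator_b)
```

In the toy data the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, giving a kappa of 0.5. Fleiss' kappa would be the analogous choice when more than two annotators label each sentence.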
Circularity Check
No significant circularity: empirical evaluation against externally constructed dataset
The paper is an empirical evaluation study that constructs the CHA-Gen dataset via a PA-Theory-guided semi-automatic pipeline and measures LLM performance, CoT failure modes, semantic entropy, and base-vs-tuned differences directly against those labels. No equations, derivations, or first-principles results are present that reduce any reported observation or claim to quantities defined by the authors' own fitted parameters or self-referential definitions. The central findings are observational measurements on an independently generated test set rather than predictions forced by construction from the same inputs; the dataset serves as an external benchmark for the evaluation, making the work self-contained against its own measurements.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Potential Ambiguity (PA) Theory supplies a reliable and complete set of 18 structures for identifying potential ambiguity in Chinese sentences.