ChatGPT outperforms zero-shot LLMs on most tasks and improves with interaction but scores only 63.41 percent on reasoning categories and generates extrinsic hallucinations from its training data.
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7890–7900
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2023 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
ChatGPT outperforms zero-shot LLMs on most tasks and improves with interaction but scores only 63.41 percent on reasoning categories and generates extrinsic hallucinations from its training data.