Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting the potential of code quality as a test-time heuristic for improving performance.
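To make the separation between reasoning and execution concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the model has already produced several candidate programs for one non-English question (hard-coded here as strings), and it uses a deliberately crude, hypothetical quality proxy: whether a candidate parses and whether it runs and defines `answer`. The names `run_program`, `quality_score`, and `select_answer` are illustrative, and the quality measure used in the paper may differ.

```python
import ast

# Question (German, illustrative): "Ein Karton enthält 12 Äpfel.
# Wie viele Äpfel sind in 3 Kartons?"
# Hypothetical model outputs: the "reasoning" step is the program text.
candidates = [
    "answer = 3 * (12 + ",                               # does not parse
    "answer = total / 0",                                # parses, fails at run time
    "apples = 12\nboxes = 3\nanswer = apples * boxes",   # parses and runs
]


def run_program(program):
    """Execution step: run the program and return `answer`, or None on failure."""
    namespace = {}
    try:
        # Builtins are emptied because the toy programs only need arithmetic.
        # This is not a real sandbox.
        exec(program, {"__builtins__": {}}, namespace)
    except Exception:
        return None
    return namespace.get("answer")


def quality_score(program):
    """Assumed proxy: 0 = syntax error, 1 = parses only, 2 = runs and sets `answer`."""
    try:
        ast.parse(program)
    except SyntaxError:
        return 0
    return 2 if run_program(program) is not None else 1


def select_answer(programs):
    """Test-time heuristic: execute the candidate with the highest quality score."""
    best = max(programs, key=quality_score)
    return run_program(best)


print(select_answer(candidates))  # 36
```

Because the interpreter performs the arithmetic, the model only has to map the question to a correct program, which is where the multilingual difficulty is concentrated under PoT.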
Forward citations
Cited by 1 Pith paper
- Sustainability Analysis of Prompt Strategies for SLM-based Automated Test Generation
Prompt strategies for SLM-based automated test generation vary widely in energy consumption and carbon emissions, with simpler strategies delivering competitive coverage at markedly lower environmental cost.