Code LLMs generate substantially worse comments outside English, and no tested automatic metric or LLM judge reliably matches human assessment of those outputs.
Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
citing papers explorer
-
Evaluating Non-English Developer Support in Machine Learning for Software Engineering
Code LLMs generate substantially worse comments outside English, and no tested automatic metric or LLM judge reliably matches human assessment of those outputs.
- REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak