Title resolution pending
Two Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.SE (2)
Years: 2025 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems
  Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.
- The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible's Challenges
  A mixed-methods study of Ansible identifies common user challenges from large-scale forum data and interviews, recommending improvements in debugging support, language clarity, documentation, and performance for IaC tools.