Two Pith papers, both published in 2025, cite this work.
Citing papers:
- Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems
  Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching a 63.19% success rate.
- Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems
  An exploit-heavy multi-agent LLM system with error-fixing agents delivers a 2.88x average speedup over PyTorch Eager and 1.85x over torch.compile on H100 GPUs across KernelBench tasks.