Elecbench: a power dispatch evaluation benchmark for large language models
2 Pith papers cite this work.
Citing papers by year: 2025 (2)
Representative citing papers
Hybrid pipeline using YOLO vision and ngspice verification raises circuit analysis accuracy from Gemini's 79.52% baseline to 97.59%, with similar gains on hand-drawn diagrams.
citing papers explorer
- EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
EngiBench shows that LLM accuracy drops as task complexity increases, degrades under perturbations, and stays below human performance on open-ended engineering problems.