Elecbench: a power dispatch evaluation benchmark for large language models

Xiyuan Zhou, Huan Zhao, Yuheng Cheng, Yuji Cao, Gaoqi Liang, Guolong Liu, Wenxuan Liu, Yan Xu, Junhua Zhao · 2024 · arXiv 2407.05365

Two Pith papers cite this work. Polarity classification is still being indexed.



representative citing papers

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

cs.AI · 2025-09-22 · unverdicted · novelty 7.0

EngiBench shows that LLM accuracy drops with task complexity, degrades under perturbations, and stays below human performance on open-ended engineering problems.

Enhancing Large Language Model-Based Systems for End-to-End Circuit Analysis Problem Solving

cs.CY · 2025-12-10 · conditional · novelty 5.0

Hybrid pipeline using YOLO vision and ngspice verification raises circuit analysis accuracy from Gemini's 79.52% baseline to 97.59%, with similar gains on hand-drawn diagrams.

citing papers explorer


EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

cs.AI · 2025-09-22 · unverdicted · none · ref 54

EngiBench shows that LLM accuracy drops with task complexity, degrades under perturbations, and stays below human performance on open-ended engineering problems.
