For each scenario, sample:
- Which ewes conceive (Bernoulli, 85%)
- Their gestation (G)
- Number of lambs (Ls), apply mortality
- Lactation length (Ld)
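The sampling steps listed here can be sketched as a small Monte Carlo routine. Only the 85% Bernoulli conception probability comes from the snippet; the function name `simulate_scenario`, the normal distributions for gestation and lactation, the litter-size weights, and the 10% lamb mortality are hypothetical placeholders chosen for illustration, not values from the source.

```python
import random


def simulate_scenario(n_ewes, rng, p_conceive=0.85,
                      gestation_mean=147.0, gestation_sd=2.0,
                      litter_sizes=(1, 2, 3), litter_weights=(0.4, 0.5, 0.1),
                      p_lamb_death=0.10,
                      lactation_mean=90.0, lactation_sd=10.0):
    """Sample one scenario; return one record per ewe that conceives.

    Every default except p_conceive is an assumed placeholder.
    """
    records = []
    for ewe in range(n_ewes):
        # Conception: Bernoulli trial with probability p_conceive (85% in the source).
        if rng.random() >= p_conceive:
            continue
        # Gestation length G in days (assumed normal distribution).
        G = rng.gauss(gestation_mean, gestation_sd)
        # Lambs born (assumed categorical), then per-lamb mortality (assumed Bernoulli).
        born = rng.choices(litter_sizes, weights=litter_weights)[0]
        Ls = sum(1 for _ in range(born) if rng.random() >= p_lamb_death)
        # Lactation length Ld in days (assumed normal, floored at zero).
        # It is sampled even when no lamb survives; the source does not say otherwise.
        Ld = max(0.0, rng.gauss(lactation_mean, lactation_sd))
        records.append({"ewe": ewe, "G": G, "Ls": Ls, "Ld": Ld})
    return records


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    scenario = simulate_scenario(100, rng)
    print(len(scenario), "ewes conceived;",
          sum(r["Ls"] for r in scenario), "lambs survived")
```

Passing the random generator in explicitly keeps each scenario reproducible and lets many scenarios be drawn from one seeded stream.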

Full Mark (Avg

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

cs.AI · 2025-09-22 · unverdicted · novelty 7.0

EngiBench shows that LLM accuracy drops with task complexity, degrades under perturbations, and stays below human performance on open-ended engineering problems.


EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving cs.AI · 2025-09-22 · unverdicted · none · ref 71