AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance, 2025

Dhaval Patel, Shuxin Lin, James Rayfield, Nianjun Zhou, Roman Vaculin, Natalia Martinez, Fearghal O’Donncha, Jayant Kalagnanam · 2025 · arXiv 2506.03828

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · extension 1

citation-polarity summary

background 3 · extend 1

representative citing papers

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deployment issue.

PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PHMForge benchmark shows LLM agents achieve 80.8% pass@1 on prognostic tasks with native MCP tools but performance collapses from 100% to 20% when using text RAG instead.

IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs

cs.AI · 2026-05-11 · conditional · novelty 6.0 · 3 refs

IndustryBench is a standards-grounded Chinese benchmark that exposes LLMs' persistent gaps in industrial terminology, safety compliance, and parameter accuracy, with safety checks reshuffling model rankings.

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

cs.AI · 2026-05-13 · conditional · novelty 5.0

SPIN enforces DAG-valid plans and prefix-based stopping for LLM agents, cutting executed tasks from 1061 to 623 and tool calls from 11.81 to 6.82 per run on AssetOpsBench while raising success from 0.638 to 0.706.

Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge

cs.AI · 2026-05-08 · unverdicted · novelty 4.0

Retrospective of a 2025 AI agent competition finds public-private score misalignment, an inert composite component, multi-account registrations, and guardrail fixes outperforming architectural novelty.

Foundation-Model-Based Agents in Industrial Automation: Purposes, Capabilities, and Open Challenges

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

A literature survey finds that 75% of foundation-model agents in industry remain at prototype stages, with gains in human interaction and uncertainty handling but deficits in negotiation, alongside limitations such as hallucinations and latency.

citing papers explorer

Showing 6 of 6 citing papers.

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
cs.AI · 2026-05-09 · unverdicted · none · ref 28

PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools
cs.AI · 2026-04-02 · unverdicted · none · ref 18

IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs
cs.AI · 2026-05-11 · conditional · none · ref 2 · 3 links

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks
cs.AI · 2026-05-13 · conditional · none · ref 12

Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge
cs.AI · 2026-05-08 · unverdicted · none · ref 22

Foundation-Model-Based Agents in Industrial Automation: Purposes, Capabilities, and Open Challenges
cs.AI · 2026-05-04 · unverdicted · none · ref 50
