PHM-Bench: A domain-specific benchmarking framework for systematic evaluation of large models in prognostics and health management

Puyu Yang, Laifa Tao, Zijian Huang, Haifei Liu, Wenyan Cao, Hao Ji, Jianan Qiu, Qixuan Huang, Xuanyuan Su, Yuhang Xie, Jun Zhang, Shangyu Li, Chen Lu, Zhixuan Lian · 2025 · arXiv 2508.02490

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline: 1 · method: 1

citation-polarity summary

baseline: 1 · use method: 1

representative citing papers

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deployment issue.

PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PHMForge benchmark shows LLM agents achieve 80.8% pass@1 on prognostic tasks with native MCP tools but performance collapses from 100% to 20% when using text RAG instead.

citing papers explorer

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
cs.AI · 2026-05-09 · unverdicted · none · ref 44
PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools
cs.AI · 2026-04-02 · unverdicted · none · ref 28
