{"paper":{"title":"CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A model-driven recovery policy inside a lightweight agent harness raises APDL automation completion rates above 92 percent.","cross_cats":["cs.CE"],"primary_cat":"cs.AI","authors_text":"Chenying Lin, Haiyan Qiang, Liang Yu, Ran Wang, Yichen Hai, Yi He","submitted_at":"2026-05-12T14:46:34Z","abstract_excerpt":"Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation. This paper presents the architecture of CAX-Agent, a lightweight agent harness purpose-built for MAPDL automation, and empirically evaluates one of its core components -- the recovery policy."},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Model_only achieves the best completion rate (0.9267), task score (3.59/4), total score (9.16/10), and zero-intervention rate (0.84), outperforming rule_only (0.7733, 3.17/4, 7.03/10, 0.00) and no_recovery (0.6933, 2.74/4, 5.60/10, 0.00) with large effect sizes (Cliff's delta = 0.81-0.87).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The benchmark uses deliberately simple geometries to isolate recovery-policy effects, and the observed performance differences will hold when the same recovery ladder is applied to more complex real-world geometries and loading conditions.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"CAX-Agent is a three-layer agent harness for 
MAPDL automation whose model-driven recovery policy reaches 0.93 task completion and 0.84 zero-intervention rate on 50 simple structural benchmarks, outperforming rule-only and no-recovery baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A model-driven recovery policy inside a lightweight agent harness raises APDL automation completion rates above 92 percent.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ea9f9323b240dc106826de0880d5ac5132d398ffb7d51c7d820bdb64dbbf9a83"},"source":{"id":"2605.15218","kind":"arxiv","version":1},"verdict":{"id":"3e475f8f-3a97-4c74-9554-ad605f8fbef5","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T17:35:23.801661Z","strongest_claim":"Model_only achieves the best completion rate (0.9267), task score (3.59/4), total score (9.16/10), and zero-intervention rate (0.84), outperforming rule_only (0.7733, 3.17/4, 7.03/10, 0.00) and no_recovery (0.6933, 2.74/4, 5.60/10, 0.00) with large effect sizes (Cliff's delta = 0.81-0.87).","one_line_summary":"CAX-Agent is a three-layer agent harness for MAPDL automation whose model-driven recovery policy reaches 0.93 task completion and 0.84 zero-intervention rate on 50 simple structural benchmarks, outperforming rule-only and no-recovery baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The benchmark uses deliberately simple geometries to isolate recovery-policy effects, and the observed performance differences will hold when the same recovery ladder is applied to more complex real-world geometries and loading conditions.","pith_extraction_headline":"A model-driven recovery policy inside a lightweight agent harness raises APDL automation completion rates above 92 
percent."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.15218/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-19T22:41:58.372997Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T18:01:18.634046Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T17:50:44.232531Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T13:33:22.838325Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"de625f5e90dda18b25d6e66bf361230bbb472354ed0c95bf509daad80425ada2"},"references":{"count":26,"sample":[{"doi":"","year":2017,"title":"Attention is all you need","work_id":"a479d910-ec22-4c4f-8745-0e478756ccba","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"BERT: Pre-training of deep bidirectional transformers for language understanding","work_id":"281d14ff-34d2-42df-9297-1358c352bfa1","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Language models are few-shot learners","work_id":"bac81291-4816-4ff3-ac72-60203570d359","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"ReAct: Synergizing reasoning and acting in language models","work_id":"dcede5c1-a91b-43d9-a097-8083603cb625","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent 
Execution","work_id":"6171cc48-d73f-46d7-8202-375ba39c6d1b","ref_index":5,"cited_arxiv_id":"2604.11378","is_internal_anchor":true}],"resolved_work":26,"snapshot_sha256":"71285e62d6b854f79167296532d6f9c9af1ffadec12626211f4cafd43ecf66c9","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7d67b0ff6b6a2fcfe339b225ed386ef4adf1ac65a679d6261dfe5765b0df22f8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}