Pith Number

pith:PUPEL7FF

pith:2026:PUPEL7FFCWNSFQ5E5UU7IHT2W3
not attested · not anchored · not stored · refs resolved

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

Dhaval Patel, Yusuke Ozaki

SPIN wraps LLM planners with DAG validation and prefix execution control to produce shorter, more reliable industrial workflows.

arxiv:2605.14051 v1 · 2026-05-13 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PUPEL7FFCWNSFQ5E5UU7IHT2W3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

On AssetOpsBench across 261 scenarios, SPIN reduces executed tasks from 1061 to 623 and improves Accomplished from 0.638 to 0.706, while reducing tool calls from 11.81 to 6.82 per run. On MCP Bench it improves planning, grounding, and dependency scores for GPT OSS1 and Llama 4 Maverick.

C2 weakest assumption

That LLM-based validation and repair prompting will consistently produce executable DAG plans without introducing new structural errors or missing invalid cases, and that the LLM can accurately judge when a prefix is sufficient.

C3 one-line summary

SPIN enforces DAG-valid plans and prefix-based stopping for LLM agents, cutting executed tasks from 1061 to 623 and tool calls from 11.81 to 6.82 per run on AssetOpsBench while raising success from 0.638 to 0.706.
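
The two ideas named in the claims, DAG validity and prefix-based stopping, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`topological_order`, `run_prefix`), the task names, and the plan representation (a dict mapping each task to the tasks it depends on) are all hypothetical, and the `is_sufficient` callable is a plain stand-in for whatever judgment SPIN obtains from the LLM.

```python
from collections import deque


def topological_order(deps):
    """Return an execution order for a plan, or None if the plan is not a valid DAG.

    `deps` maps each task name to the list of task names it depends on
    (a hypothetical plan format chosen for this sketch).
    """
    # Reject plans that reference tasks that were never defined.
    for parents in deps.values():
        if any(p not in deps for p in parents):
            return None

    indegree = {task: len(set(parents)) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in set(parents):
            children[parent].append(task)

    # Kahn's algorithm: repeatedly emit tasks whose dependencies are all done.
    ready = deque(sorted(t for t in deps if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(children[task]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any task left unemitted sits on a cycle, so the plan is invalid.
    return order if len(order) == len(deps) else None


def run_prefix(order, execute, is_sufficient):
    """Execute tasks in order and stop once the results so far are judged sufficient."""
    results = {}
    for task in order:
        results[task] = execute(task)
        if is_sufficient(results):
            break
    return results


# Hypothetical four-step plan; the task names are illustrative only.
plan = {
    "fetch_sensor_data": [],
    "detect_anomaly": ["fetch_sensor_data"],
    "draft_work_order": ["detect_anomaly"],
    "summarize": ["detect_anomaly"],
}
order = topological_order(plan)
cyclic = topological_order({"a": ["b"], "b": ["a"]})

# Stand-in stopping rule: the question is answered once the anomaly check has run.
executed = run_prefix(
    order,
    execute=lambda task: f"done:{task}",
    is_sufficient=lambda results: "detect_anomaly" in results,
)
print(order)
print(cyclic)
print(list(executed))
```

In this toy run the cyclic plan is rejected and only two of the four tasks execute, which is the kind of saving the reported drop in executed tasks refers to. How SPIN repairs an invalid plan and how it decides sufficiency are described in the paper, not here.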

References

21 extracted · 21 resolved · 2 Pith anchors

[1] Automating thought of search: A journey towards soundness and completeness 2024
[2] Chang and Longling Geng 2025 · doi:10.14778/3750601.3750611
[3] AssetOpsBench – Codabench competition 2025
[4] Grammar-constrained decoding for structured NLP tasks without finetuning 2023 · doi:10.18653/v1/2023.emnlp-main.674
[5] JSONSchemaBench: A rigorous benchmark of structured outputs for language models 2025

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-17T23:39:12.640635Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

7d1e45fca5159b22c3a4ed29f41e7ab6c26d7c97542fef59f0c8610628c62b9b

Aliases

arxiv: 2605.14051 · arxiv_version: 2605.14051v1 · doi: 10.48550/arxiv.2605.14051 · pith_short_12: PUPEL7FFCWNS · pith_short_16: PUPEL7FFCWNSFQ5E · pith_short_8: PUPEL7FF
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PUPEL7FFCWNSFQ5E5UU7IHT2W3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7d1e45fca5159b22c3a4ed29f41e7ab6c26d7c97542fef59f0c8610628c62b9b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2f06b8f498ca5d763c5da01469a189b29e97e7e93a8dff3937c8bf20e93635a8",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T19:12:24Z",
    "title_canon_sha256": "9851e0c3639448ad30710ec5fb12e2194b63b7d017dfa434d613e41cd263e002"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14051",
    "kind": "arxiv",
    "version": 1
  }
}
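
The same check can be run offline against the record printed above. The sketch below mirrors the canonicalization used in the curl one-liner (sorted keys, compact separators, UTF-8 without ASCII escaping). The function name `canonical_sha256` is an assumption made for illustration, and the sketch assumes the JSON shown here is the complete canonical record; compare the printed digest with the canonical hash listed on this page rather than treating a match as guaranteed.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it with sorted keys and no extra whitespace."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2f06b8f498ca5d763c5da01469a189b29e97e7e93a8dff3937c8bf20e93635a8",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T19:12:24Z",
        "title_canon_sha256": "9851e0c3639448ad30710ec5fb12e2194b63b7d017dfa434d613e41cd263e002",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14051", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed above
```

Because keys are sorted before hashing, the order in which the fields are typed or stored does not affect the digest, which is what lets a mirror recompute the same value from its own copy.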