Planning Task Shielding: Detecting and Repairing Flaws in Planning Tasks through Turning them Unsolvable

Alberto Pozanco; Daniel Borrajo; Marianela Morales; Pietro Totis

arxiv: 2604.07042 · v2 · submitted 2026-04-08 · 💻 cs.AI

Pith reviewed 2026-05-10 17:13 UTC · model grok-4.3

keywords: planning task shielding, unsolvable planning tasks, flaw detection, minimal action modification, AI planning, safety properties, allmin algorithm, task repair

The pith

Flaws in planning tasks can be detected by planning to bad states and repaired by minimally modifying actions to make the task unsolvable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces planning task shielding as a dual use of goal specifications: one to reach desired states and another to identify states that should never be reached. A planner finds traces leading to a flawed state, exposing the problem. The repair then applies minimal changes to action definitions so that no plan can reach the flaw, rendering the overall task unsolvable for that bad goal. This matters because many automated systems need guarantees against errors as much as they need goal achievement, and the method reuses standard planners for both verification and correction. The authors present an optimal algorithm called allmin for the repair step and test it on tasks of growing size.

Core claim

Planning task shielding treats a property that should never hold as a goal to let a planner discover traces to flawed states. The allmin algorithm then solves the shielding problem by computing the smallest set of modifications to the original actions that make the planning task unsolvable with respect to the flaw, while preserving as much of the intended behavior as possible.
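The detection step described above can be illustrated with a minimal sketch. It assumes a STRIPS-style encoding (facts as strings, actions with preconditions, add effects, and delete effects) and uses a plain breadth-first search in place of a real planner. The `Action` type, the `find_trace` helper, and the toy login task are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action applies
    add: frozenset      # facts the action makes true
    delete: frozenset   # facts the action makes false


def find_trace(init, goal, actions) -> Optional[list]:
    """Breadth-first search; returns action names reaching `goal`, or None."""
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, trace = frontier.popleft()
        if goal <= state:
            return trace
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + [a.name]))
    return None  # the goal is unreachable: the task is unsolvable


# Hypothetical toy task: deleting data should require admin rights but does not.
login = Action("login", frozenset(), frozenset({"logged_in"}), frozenset())
delete_data = Action("delete_data", frozenset({"logged_in"}),
                     frozenset({"data_deleted"}), frozenset())
bad_goal = {"data_deleted"}  # the property that should never hold

# Detection: the "plan" for the bad goal is the trace that exposes the flaw.
trace = find_trace(set(), bad_goal, [login, delete_data])

# Repair (chosen by hand here): one extra precondition on delete_data.
# No action grants "admin", so the bad goal becomes unreachable.
repaired = delete_data._replace(pre=delete_data.pre | {"admin"})
trace_after = find_trace(set(), bad_goal, [login, repaired])
```

In this toy task `trace` is `["login", "delete_data"]` and `trace_after` is `None`. The repair here is picked manually; choosing it automatically and minimally is the part the paper assigns to allmin.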

What carries the argument

The allmin algorithm, which computes minimal modifications to the original actions to render a given planning task unsolvable for a specified bad goal.

If this is right

The same planner can be used both to achieve goals and to verify that bad states remain unreachable.
Minimal action edits fix flaws without requiring a complete redesign of the planning model.
Shielded tasks stay solvable for proper goals while blocking the identified flaws.
The method applies to planning tasks of increasing size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The idea of turning unsolvability into a safety feature could transfer to other formal verification settings where proving unreachability is the goal.
Iterative application of shielding might serve as an automated debugging loop for refining planning models against edge cases.
Integration into planning software could let users supply bad-state goals and receive suggested action repairs automatically.

Load-bearing premise

Any flaw can be detected by planning to a bad state and then repaired by some minimal set of action modifications that leaves the task solvable for its intended goals.

What would settle it

A planning task with a flaw for which no minimal action modification exists that makes the bad goal unreachable while the task remains solvable for its original goals.

Figures

Figures reproduced from arXiv: 2604.07042 by Alberto Pozanco, Daniel Borrajo, Marianela Morales, Pietro Totis.

**Figure 1.** ALLMIN100 succeeds on more instances, with a better balance between available information over the plans and the time dedicated to plan computation.

Paper text captured alongside the caption (fragment): "…ing that ALLMIN effectively identifies actions shared across multiple plans and modifies them so that several plans become invalid simultaneously. The execution time of ALLMIN increases exponentially as the size of the planning task and the number of plans th…"

**Figure 2.** Execution time split into the time to compute the set of plans with …
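The text extracted with Figure 1 says that ALLMIN identifies actions shared across multiple plans, so that one modification invalidates several plans at once. A rough way to picture this is as a minimum hitting set over a collection of plans. The brute-force sketch below is an illustration of that idea only: the function name `min_hitting_set` and the example plans are hypothetical, and the source does not state how ALLMIN actually selects actions.

```python
from itertools import combinations


def min_hitting_set(plans):
    """Smallest set of action names that intersects every plan.

    Brute force by increasing size, so the first hit has minimum cardinality.
    Exponential in the number of distinct actions; for illustration only.
    """
    actions = sorted({a for plan in plans for a in plan})
    for k in range(len(actions) + 1):
        for subset in combinations(actions, k):
            chosen = set(subset)
            if all(chosen & set(plan) for plan in plans):
                return chosen
    return None  # only reached if some plan is empty and cannot be hit


# Hypothetical plans that all reach the bad goal and all start with action "a".
shared = min_hitting_set([["a", "b", "d"], ["a", "c", "e"], ["a", "b", "e"]])

# Without a shared action, at least one action per disjoint plan is needed.
disjoint = min_hitting_set([["b", "d"], ["c", "e"]])
```

Here `shared` is `{"a"}` and `disjoint` is `{"b", "c"}`. Hitting every known plan does not by itself prove unsolvability, because plans outside the computed set may remain; a final unsolvability check is still needed.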

Original abstract

Most research in planning focuses on generating a plan to achieve a desired set of goals. However, a goal specification can also be used to encode a property that should never hold, allowing a planner to identify a trace that would reach a flawed state. In such cases, the objective may shift to modifying the planning task to ensure that the flawed state is never reached; in other words, to make the planning task unsolvable. In this paper we introduce planning task shielding: the problem of detecting and repairing flaws in planning tasks. We propose $allmin$, an optimal algorithm that solves these tasks by minimally modifying the original actions to render the planning task unsolvable. We empirically evaluate the performance of $allmin$ in shielding planning tasks of increasing size, showing how it can effectively shield the system by turning the planning task unsolvable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames planning task shielding as a new problem and gives allmin as an optimal way to minimally edit actions and block bad states, but the edits risk breaking valid plans without clear checks.

The main thing to know is that this work turns flaw detection into a planning problem itself: run a planner to find paths to undesired states, then tweak the actions just enough to make those paths impossible. Their algorithm for doing the tweaks optimally is allmin. They test it on tasks that grow in size and report that it works in practice for shielding the system. That framing is new, and the empirical scaling is a reasonable start for applied planning work. The detection side is straightforward since it reuses existing planners. The repair side tries to keep changes small so the original task stays mostly intact. That part is where the soft spots show up.

The abstract claims the modifications are minimal, but it gives no formal definition of minimality, no measure of edit distance or action impact, and no argument that good plans survive the changes. If the smallest fix that blocks a flaw also disables a legitimate goal, the repair just trades one problem for another. There is no proof of optimality, and the experiments on larger tasks do not check whether the shielded versions still allow the original valid plans. The stress-test concern about side effects holds based on what is shown.

This is for people in AI planning who build or verify models and want a targeted safety step. A reader working on robust planning or model repair could pick up the problem statement and try extending the algorithm.

I would send it to peer review. The new formulation and the basic empirical results are worth referee time even if the preservation guarantees need more work in revision.

Referee Report

3 major / 2 minor

Summary. The paper introduces planning task shielding as the problem of detecting flaws in planning tasks (via plans reaching bad states) and repairing them by making the task unsolvable. It proposes the allmin algorithm, claimed to be optimal, which minimally modifies original actions to achieve this, and reports an empirical evaluation on planning tasks of increasing size.

Significance. If allmin is shown to be optimal with respect to a well-defined minimality metric and the modifications provably preserve intended plans while blocking flaws, the approach could offer a principled method for safety verification and repair in automated planning, particularly for domains where goal specifications encode forbidden properties. The empirical component on larger instances suggests potential practicality, though formal guarantees would strengthen its contribution.

major comments (3)

[Abstract] Abstract: the claim that allmin is an 'optimal algorithm' for minimally modifying actions is load-bearing but unsupported by any proof, complexity argument, or formal optimality criterion (e.g., with respect to number of edits, precondition changes, or action count) in the manuscript description.
[Abstract / Problem Definition] The central repair claim requires that modifications render bad states unreachable while leaving original intended plans intact, yet no formal definition of 'intended behavior,' no measure of minimality, and no argument that such modifications are always feasible without disabling valid plans appear in the provided text.
[Empirical Evaluation] Evaluation: the empirical results are summarized only as 'effectively shield the system' on 'increasing size' tasks, with no reported metrics, instance sizes, runtimes, success rates, or baselines, preventing assessment of whether the optimality claim holds in practice.

minor comments (2)

[Abstract] The notation $allmin$ should be typeset consistently (e.g., as \texttt{allmin}) throughout.
[Introduction] Clarify early how 'flawed states' are specified via goals and distinguish them from the original planning goals.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in turn below, clarifying our claims where possible and indicating revisions that will be incorporated to strengthen the presentation.

Referee: [Abstract] Abstract: the claim that allmin is an 'optimal algorithm' for minimally modifying actions is load-bearing but unsupported by any proof, complexity argument, or formal optimality criterion (e.g., with respect to number of edits, precondition changes, or action count) in the manuscript description.

Authors: We agree that the abstract's optimality claim requires explicit support. In the manuscript, optimality is defined with respect to the smallest number of action modifications (changes to preconditions or effects) that render all paths to bad states unreachable. The allmin algorithm enumerates candidate minimal modification sets in order of increasing size and returns the first that succeeds, which by construction yields an optimal solution under this metric. However, we concede that a dedicated formal proof and complexity discussion are not present in the current text. We will add a theorem establishing optimality together with a brief complexity argument in the revised version. revision: yes
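The enumeration strategy described in this response (try candidate modification sets in order of increasing size and return the first that makes the bad goal unreachable) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the paper's implementation: a modification is restricted to "add one extra precondition to one action", and the names `reachable`, `smallest_shield`, and the toy task are hypothetical.

```python
from itertools import combinations


def reachable(init, goal, actions):
    """Exhaustive forward search. actions: name -> (pre, add, delete) frozensets."""
    start, goal = frozenset(init), frozenset(goal)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if goal <= state:
            return True
        for pre, add, delete in actions.values():
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


def smallest_shield(init, bad_goal, actions, candidates):
    """candidates: (action_name, fact) pairs, each meaning 'add fact as precondition'.

    Subsets are tried by increasing size, so the first success is minimum-cardinality.
    """
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            modified = dict(actions)
            for name, fact in subset:
                pre, add, delete = modified[name]
                modified[name] = (pre | {fact}, add, delete)
            if not reachable(init, bad_goal, modified):
                return list(subset)
    return None  # no subset of the candidates blocks the bad goal


E = frozenset()
# Hypothetical task: two routes to "bad" (a-b-d and a-c-e) that share action "a".
toy_actions = {
    "a": (E, frozenset({"p"}), E),
    "b": (frozenset({"p"}), frozenset({"q"}), E),
    "c": (frozenset({"p"}), frozenset({"r"}), E),
    "d": (frozenset({"q"}), frozenset({"bad"}), E),
    "e": (frozenset({"r"}), frozenset({"bad"}), E),
}
# "guard" is a fact no action ever adds, so requiring it disables the action.
candidates = [("d", "guard"), ("e", "guard"), ("a", "guard")]
result = smallest_shield(set(), {"bad"}, toy_actions, candidates)
```

For this toy task `result` is `[("a", "guard")]`: guarding `d` or `e` alone leaves the other route open, while guarding the shared action `a` blocks both. The search is exponential in the number of candidates, which is consistent with the kind of complexity discussion the referee asks for.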
Referee: [Abstract / Problem Definition] The central repair claim requires that modifications render bad states unreachable while leaving original intended plans intact, yet no formal definition of 'intended behavior,' no measure of minimality, and no argument that such modifications are always feasible without disabling valid plans appear in the provided text.

Authors: The manuscript defines intended behavior as the set of plans that reach the goal without visiting any bad state. Minimality is measured by the cardinality of the modification set applied to the original action set. We maintain that targeted modifications (those that only affect transitions leading to bad states) preserve all originally valid plans, but we acknowledge that these notions are introduced informally and lack a formal statement or proof of plan preservation. We will insert precise definitions and a short proof that the returned modification set leaves all good plans intact in the revision. revision: yes
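Whether a repair leaves intended plans intact can be checked mechanically by replaying each plan against the modified actions. The sketch below is a hypothetical illustration of such a check, not something the source describes: `plan_is_valid`, `with_extra_pre`, and the toy task are assumptions. It also shows why cardinality alone may not settle the referee's concern, since two repairs of the same size can differ in what they preserve.

```python
def plan_is_valid(init, goal, actions, plan):
    """Replay `plan`; True if every step is applicable and the goal holds at the end.

    actions: name -> (pre, add, delete), each a frozenset of facts.
    """
    state = frozenset(init)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:
            return False
        state = (state - delete) | add
    return frozenset(goal) <= state


def with_extra_pre(actions, name, fact):
    """Copy of `actions` with `fact` added to the preconditions of `name`."""
    modified = dict(actions)
    pre, add, delete = modified[name]
    modified[name] = (pre | {fact}, add, delete)
    return modified


E = frozenset()
# Hypothetical task: reading a report is intended, deleting data is the flaw.
base = {
    "login": (E, frozenset({"logged_in"}), E),
    "read": (frozenset({"logged_in"}), frozenset({"report_read"}), E),
    "delete": (frozenset({"logged_in"}), frozenset({"data_deleted"}), E),
}
intended_plan, intended_goal = ["login", "read"], {"report_read"}
bad_plan, bad_goal = ["login", "delete"], {"data_deleted"}

# Two single-modification repairs; no action grants "admin".
repair_a = with_extra_pre(base, "delete", "admin")  # targets the flawed action
repair_b = with_extra_pre(base, "login", "admin")   # targets a shared action

a_blocks_bad = not plan_is_valid(set(), bad_goal, repair_a, bad_plan)
b_blocks_bad = not plan_is_valid(set(), bad_goal, repair_b, bad_plan)
a_keeps_good = plan_is_valid(set(), intended_goal, repair_a, intended_plan)
b_keeps_good = plan_is_valid(set(), intended_goal, repair_b, intended_plan)
```

Both repairs invalidate the bad plan, but only `repair_a` keeps the intended plan valid (`a_keeps_good` is `True`, `b_keeps_good` is `False`), although both consist of exactly one modification.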
Referee: [Empirical Evaluation] Evaluation: the empirical results are summarized only as 'effectively shield the system' on 'increasing size' tasks, with no reported metrics, instance sizes, runtimes, success rates, or baselines, preventing assessment of whether the optimality claim holds in practice.

Authors: We accept that the current empirical section is summarized at too high a level. The experiments were performed on a suite of planning tasks whose size (number of actions and reachable states) was systematically increased, and allmin succeeded in producing unsolvable tasks in every case. To allow proper evaluation of the optimality claim, we will expand the section with concrete instance sizes, runtimes, the number of modifications returned, success rates, and a simple baseline (random minimal modification) for comparison. revision: yes

Circularity Check

0 steps flagged

No circularity: new problem definition and algorithmic proposal

The paper introduces planning task shielding as a novel problem (detecting flaws via plans to bad states, then repairing by making the task unsolvable) and defines allmin as an optimal algorithm that performs minimal action modifications. This is a constructive algorithmic contribution with an empirical evaluation on tasks of increasing size; no derivation reduces a claimed result to its own inputs by construction, no parameters are fitted then relabeled as predictions, and no self-citations or uniqueness theorems bear the central load. The approach is self-contained as a definition plus solver plus experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard definitions of planning tasks but introduces no new free parameters, axioms beyond domain norms, or invented entities.

axioms (1)

Domain assumption: planning tasks are defined by actions with preconditions and effects, an initial state, and goals.
This is implicit in the description of modifying actions and rendering tasks unsolvable.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Counterfactual Reasoning in Automated Planning
cs.AI 2026-05 unverdicted novelty 4.0

A survey categorizes existing work on counterfactual reasoning in automated planning by changed elements, timing of reasoning, reasons for changes, and methods used.

