pith. sign in

arxiv: 2607.01518 · v1 · pith:RDSSPCQYnew · submitted 2026-07-01 · 💻 cs.CR · cs.RO

Overthink-Triggered Slowdown Attacks on LVLM-Based Robotic Systems

Pith reviewed 2026-07-03 19:50 UTC · model grok-4.3

classification 💻 cs.CR cs.RO
keywords overthinking attacksLVLMrobotic systemsscene text triggersslowdown attacksblack-box transferlatency amplificationvision-language models
0
0 comments X

The pith

Adversaries can embed crafted scene text to trigger overthinking in LVLMs, causing up to 6.96x slowdowns in robotic decision-making even without model access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large vision-language models integrated into robots can be induced to produce excessively long reasoning traces when specific human-readable text appears in their visual input. The paper develops a three-stage process to discover such triggers: building a corpus of reasoning-heavy scene text, pulling lexical features from short response prefixes, and running an efficient black-box search that confirms candidates by actual latency. Across three LVLMs the triggers consistently produce slowdown ratios above 1.0x, with the best single trigger reaching 6.96x digitally and 4.74x when physically printed. This matters because delayed inference in robots can create safety problems when an attacker controls only the environment the robot sees.

Core claim

The central claim is that scene text can function as transferable triggers for overthinking in LVLMs used by robotic systems. The three-stage framework extracts overthinking-correlated lexical features from short response prefixes, then uses a prefix-based proxy score for black-box search and validates top candidates with full latency measurements. Evaluation on unseen images and multiple LVLMs shows all triggers produce slowdown ratios greater than 1.0x, with the strongest reaching 6.96x, and physical prints still reaching 4.74x.

What carries the argument

The three-stage framework that extracts lexical features from short response prefixes to guide black-box search for transferable overthinking triggers.

If this is right

  • All discovered triggers produce slowdown ratios above 1.0x on three representative LVLMs.
  • The strongest single trigger achieves 6.96x latency amplification under black-box conditions.
  • The same triggers remain effective when physically printed, reaching 4.74x amplification.
  • Triggers transfer to unseen images while meeting standard attack-success thresholds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar text-based triggers might affect other multimodal models that rely on long reasoning traces outside robotics.
  • Robotic deployments could reduce risk by adding explicit limits on reasoning length or by screening incoming scene text.
  • Testing the triggers on moving robots in changing physical environments would reveal whether motion or lighting alters the observed slowdowns.

Load-bearing premise

Lexical features taken from short response prefixes will reliably cause overthinking when applied to new images and different LVLMs under black-box transfer.

What would settle it

Measuring latency on a fourth unseen LVLM with the published trigger set and finding that every trigger produces a slowdown ratio of 1.0x or lower across the test images.

Figures

Figures reproduced from arXiv: 2607.01518 by Bo Chen, Jie Wu, Qiang Han.

Figure 1
Figure 1. Figure 1: An example of the overthinking-triggered slowdown [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An overview of our three-stage framework for identifying the scene-text triggers. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Stability of discovered keyword sets measured by [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: High group latency distribution under full inference [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Low group latency distribution under full inference and [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Real-world evaluation setup. Left: clean physical scene [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The robotic vehicle used in our evaluation. [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Empirical CDF of added latency under attack across [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 9
Figure 9. Figure 9: Per-trigger slowdown-ratio spread (Attack/Clean) for three models (T1–T4). Dots mark mean ratio and vertical whiskers show the min-max range across images. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
read the original abstract

Large Vision-Language Models (LVLMs) have been increasingly integrated into robotic systems. However, these models may exhibit overthinking behaviors, where they generate excessively long reasoning traces, incurring an excessive inference time. This overthinking behavior poses a serious risk to robotic systems, as the adversary can deliberately trigger overthinking to slow down the decision making of a victim robotic system, causing a variety of safety issues (i.e., an overthinking-induced slowdown attack). To initiate this attack, an adversary can embed carefully crafted, human-readable scene text into the visual scene observed by a victim robotic agent, causing significant inference delays even under a strict black-box setting. Therefore, the embedded scene text serves as a significant "trigger" for the attack. This work systematically identifies and validates transferable triggers of overthinking in robotic systems by introducing a three-stage framework. First, we construct a diverse corpus of reasoning-intensive scene text and extract overthinking-correlated lexical features from short response prefixes. Second, we perform an efficient black-box search guided by a prefix-based proxy score while selectively confirming a small set of top candidates with full latency measurements. Third, we evaluate black-box transfer using a fixed pool of triggers on unseen images and multiple LVLMs, reporting latency amplification and attack success rates under standard thresholds. Across three representative LVLMs, all triggers yield slowdown ratios greater than 1.0x, with the strongest single-trigger case reaching 6.96x. The physical printing of the text trigger still causes up to 4.74x latency amplification. These results demonstrate that our discovered triggers are transferred between multiple LVLM models and consistently cause significant slowdowns in robotic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a three-stage framework to discover human-readable scene-text triggers that induce overthinking in LVLMs deployed in robotic systems, thereby causing inference slowdown attacks. Stage 1 builds a corpus of reasoning-intensive text and extracts lexical features from short response prefixes; Stage 2 performs black-box search guided by a prefix-based proxy score with selective full-latency confirmation; Stage 3 evaluates transfer to unseen images and multiple LVLMs. The central empirical claim is that all discovered triggers produce slowdown ratios >1.0x across three representative LVLMs, reaching a maximum of 6.96x digitally and 4.74x when the trigger is physically printed.

Significance. If the reported latency amplifications and transfer results hold under rigorous controls, the work identifies a practical, low-effort attack vector on LVLM-based robotics that does not require model access or adversarial perturbations. The inclusion of physical-print experiments is a concrete strength that moves the demonstration beyond purely digital settings. The absence of any machine-checked proofs or parameter-free derivations is expected for an empirical attack paper; credit is due for the selective full-latency confirmation step, but this does not yet compensate for missing validation of the proxy mechanism.

major comments (2)
  1. [Abstract (three-stage framework)] Abstract (three-stage framework description): the central claim that prefix-extracted lexical features enable reliable black-box search and transfer to unseen images/LVLMs rests on an unverified assumption that the short-prefix proxy score correlates with full-query inference latency. No correlation coefficient, ablation on proxy vs. full latency, or ranking-consistency metric is referenced, so it is unclear whether the reported 6.96x and 4.74x maxima are caused by the identified features or are incidental to the search procedure.
  2. [Abstract (experimental results)] Abstract (experimental results): the quantitative claims of slowdown ratios >1.0x for all triggers and transfer success rates are presented without reference to dataset size, number of images, error bars, statistical tests, or controls for measurement artifacts (e.g., caching, batching, or hardware variability). These omissions are load-bearing because the transfer and physical-print results cannot be assessed for robustness without them.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and commit to revisions that strengthen the empirical validation of the proxy mechanism and the reporting of experimental details.

read point-by-point responses
  1. Referee: [Abstract (three-stage framework)] Abstract (three-stage framework description): the central claim that prefix-extracted lexical features enable reliable black-box search and transfer to unseen images/LVLMs rests on an unverified assumption that the short-prefix proxy score correlates with full-query inference latency. No correlation coefficient, ablation on proxy vs. full latency, or ranking-consistency metric is referenced, so it is unclear whether the reported 6.96x and 4.74x maxima are caused by the identified features or are incidental to the search procedure.

    Authors: We agree that the abstract does not explicitly quantify the proxy's reliability. The manuscript uses the prefix proxy for search efficiency followed by selective full-latency confirmation on top candidates, but we acknowledge the need for direct validation. In the revision we will add a new analysis subsection reporting (1) the Pearson correlation coefficient between proxy scores and full inference latencies across the candidate pool, (2) an ablation comparing proxy-guided search against random selection in terms of discovered slowdown ratios, and (3) ranking-consistency metrics (e.g., Kendall tau) between proxy ordering and full-latency ordering. These additions will clarify that the reported maxima are attributable to the identified lexical features rather than search artifacts. revision: yes

  2. Referee: [Abstract (experimental results)] Abstract (experimental results): the quantitative claims of slowdown ratios >1.0x for all triggers and transfer success rates are presented without reference to dataset size, number of images, error bars, statistical tests, or controls for measurement artifacts (e.g., caching, batching, or hardware variability). These omissions are load-bearing because the transfer and physical-print results cannot be assessed for robustness without them.

    Authors: We concur that the abstract omits key experimental metadata. The full manuscript evaluates transfer across a fixed pool of triggers on unseen images and multiple LVLMs, yet the abstract does not specify scale or controls. In the revision we will (1) expand the abstract to reference the number of evaluation images and models, (2) add error bars derived from repeated measurements, (3) include statistical significance tests (e.g., paired t-tests) for slowdown ratios, and (4) describe controls for caching, batching, and hardware variability by reporting mean latencies over multiple independent runs on fixed hardware configurations. These changes will allow readers to assess robustness of the transfer and physical-print results. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical attack demonstration with direct latency measurements

full rationale

The paper presents an empirical three-stage framework for discovering and validating overthinking triggers via corpus construction, lexical feature extraction from prefixes, proxy-guided black-box search, and full-latency confirmation on unseen images across LVLMs. All reported results (slowdown ratios >1.0x, max 6.96x, physical 4.74x) are direct measurements of inference time, not predictions or derivations that reduce to fitted inputs or self-citations. No equations, parameter fitting, uniqueness theorems, or ansatzes appear; the proxy is used only for search efficiency with selective full confirmation, and transfer is validated by explicit testing rather than by construction. This is a standard empirical security evaluation with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical security evaluation paper with no mathematical model, fitted constants, or new postulated entities; relies on the domain premise that LVLMs can be induced into long reasoning traces by visual text.

axioms (1)
  • domain assumption LVLMs exhibit overthinking behaviors that produce excessively long reasoning traces and high inference latency
    Invoked in the opening paragraph as the mechanism enabling the slowdown attack.

pith-pipeline@v0.9.1-grok · 5833 in / 1262 out tokens · 31399 ms · 2026-07-03T19:50:18.010237+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 21 canonical work pages · 1 internal anchor

  1. [1]

    On-board vision-language models for personalized autonomous vehicle motion control: System design and real-world validation,

    C. Cui, Z. Yang, Y . Zhou, J. Peng, S.-Y . Park, C. Zhang, Y . Ma, X. Cao, W. Ye, Y . Fenget al., “On-board vision-language models for personalized autonomous vehicle motion control: System design and real-world validation,”arXiv preprint arXiv:2411.11913, 2024

  2. [2]

    Drivevlm: The convergence of autonomous driving and large vision-language models,

    X. Tian, J. Gu, B. Li, Y . Liu, Y . Wang, Z. Zhao, K. Zhan, P. Jia, X. Lang, and H. Zhao, “Drivevlm: The convergence of autonomous driving and large vision-language models,” inConference on Robot Learning. PMLR, 2025, pp. 4698–4726

  3. [3]

    Multi-frame, lightweight & efficient vision-language models for question answering in autonomous driving,

    A. Gopalkrishnan, R. Greer, and M. Trivedi, “Multi-frame, lightweight & efficient vision-language models for question answering in autonomous driving,”arXiv preprint arXiv:2403.19838, 2024

  4. [4]

    Large (vision) language models for autonomous vehicles: Current trends and future directions,

    H. Tian, K. Reddy, Y . Feng, M. Quddus, Y . Demiris, and P. An- geloudis, “Large (vision) language models for autonomous vehicles: Current trends and future directions,”IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 187–210, 2025

  5. [5]

    Overthink: Slowdown attacks on reasoning llms,

    A. Kumar, J. Roh, A. Naseh, M. Karpinska, M. Iyyer, A. Houmansadr, and E. Bagdasarian, “Overthink: Slowdown attacks on reasoning llms,” arXiv preprint arXiv:2502.02542, 2025

  6. [6]

    Dnr bench: Benchmarking over-reasoning in reasoning llms,

    M. Hashemi, O. Bamgbose, S. T. Madhusudhan, J. S. Nair, A. Tiwari, and V . Yadav, “Dnr bench: Benchmarking over-reasoning in reasoning llms,”arXiv preprint arXiv:2503.15793, 2025

  7. [7]

    Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

    X. Chen, J. Xu, T. Liang, Z. He, J. Pang, D. Yu, L. Song, Q. Liu, M. Zhou, Z. Zhanget al., “Do not think that much for 2+ 3=? on the overthinking of o1-like llms,”arXiv preprint arXiv:2412.21187, 2024

  8. [8]

    Induc- ing high energy-latency of large vision-language models with verbose images,

    K. Gao, Y . Bai, J. Gu, S.-T. Xia, P. Torr, Z. Li, and W. Liu, “Induc- ing high energy-latency of large vision-language models with verbose images,”arXiv preprint arXiv:2401.11170, 2024

  9. [9]

    Drivegpt4: Interpretable end-to-end autonomous driving via large language model,

    Z. Xu, Y . Zhang, E. Xie, Z. Zhao, Y . Guo, K.-Y . K. Wong, Z. Li, and H. Zhao, “Drivegpt4: Interpretable end-to-end autonomous driving via large language model,”IEEE Robotics and Automation Letters, 2024

  10. [10]

    Simlingo: Vision-only closed-loop autonomous driving with language-action alignment,

    K. Renz, L. Chen, E. Arani, and O. Sinavski, “Simlingo: Vision-only closed-loop autonomous driving with language-action alignment,” in Proceedings of the Computer Vision and Pattern Recognition Confer- ence, 2025, pp. 11 993–12 003

  11. [11]

    Chai: Command hijacking against embodied ai,

    L. Burbano, D. Ortiz, Q. Sun, S. Yang, H. Tu, C. Xie, Y . Cao, and A. A. Cardenas, “Chai: Command hijacking against embodied ai,”arXiv preprint arXiv:2510.00181, 2025

  12. [12]

    AutoDAN: Interpretable gradient-based adversarial attacks on large language models,

    S. Zhu, R. Zhang, B. An, G. Wu, J. Barrow, Z. Wang, F. Huang, A. Nenkova, and T. Sun, “AutoDAN: Interpretable gradient-based adversarial attacks on large language models,” in First Conference on Language Modeling, 2024. [Online]. Available: https://openreview.net/forum?id=INivcBeIDK

  13. [13]

    NVIDIA Jetson Nano,

    NVIDIA, “NVIDIA Jetson Nano,” Retrieved on January 28, 2026, from https://www.nvidia.com/en-us/autonomous-machines/embedded- systems/jetson-nano/product-development/, 2026

  14. [14]

    A survey of security and privacy issues in v2x communication systems,

    T. Yoshizawa, D. Singel ´ee, J. T. Muehlberg, S. Delbruel, A. Taherkordi, D. Hughes, and B. Preneel, “A survey of security and privacy issues in v2x communication systems,”ACM Computing Surveys, vol. 55, no. 9, pp. 1–36, 2023

  15. [15]

    Bdd100k: A diverse driving dataset for heterogeneous multitask learning,

    F. Yu, H. Chen, X. Wang, W. Xian, Y . Chen, F. Liu, V . Madhavan, and T. Darrell, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning,” inThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  16. [16]

    Dodge,The concise encyclopedia of statistics

    Y . Dodge,The concise encyclopedia of statistics. Springer Science & Business Media, 2008

  17. [17]

    Autospearman: Automatically mitigating correlated software metrics for interpreting defect models,

    J. Jiarpakdee, C. Tantithamthavorn, and C. Treude, “Autospearman: Automatically mitigating correlated software metrics for interpreting defect models,” in2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2018, pp. 92–103

  18. [18]

    Thinktrap: Denial-of-service attacks against black-box llm services via infinite thinking,

    Y . Li, J. Wang, H. Zhu, J. Lin, S. Chang, and M. Guo, “Thinktrap: Denial-of-service attacks against black-box llm services via infinite thinking,”arXiv preprint arXiv:2512.07086, 2025

  19. [19]

    Crabs: Consuming resource via auto-generation for llm-dos attack un- der black-box settings,

    Y . Zhang, Z. Zhou, W. Zhang, X. Wang, X. Jia, Y . Liu, and S. Su, “Crabs: Consuming resource via auto-generation for llm-dos attack un- der black-box settings,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 11 128–11 150

  20. [20]

    Prompt-induced over-generation as denial-of-service: A black-box attack-side benchmark,

    Y . Guo, J. Plested, T. Lynar, K. Thilakarathna, N. Sivaroopan, J. Yang, W. Yanget al., “Prompt-induced over-generation as denial-of-service: A black-box attack-side benchmark,”arXiv preprint arXiv:2512.23779, 2025

  21. [21]

    Excessive reasoning attack on reasoning llms,

    W. M. Si, M. Li, M. Backes, and Y . Zhang, “Excessive reasoning attack on reasoning llms,”arXiv preprint arXiv:2506.14374, 2025

  22. [22]

    Extendattack: Attacking servers of lrms via extending reasoning,

    Z. Zhu, Y . Liu, Z. Xu, Y . Ma, H. Gao, N. Chen, Y . Guo, W. Qu, H. Xu, Z. Kanget al., “Extendattack: Attacking servers of lrms via extending reasoning,”arXiv preprint arXiv:2506.13737, 2025

  23. [23]

    Distractor injection attacks on large reasoning models: Characterization and defense,

    Z. Zhang, W. Xu, S. Cui, and C. K. Reddy, “Distractor injection attacks on large reasoning models: Characterization and defense,”arXiv preprint arXiv:2510.16259, 2025

  24. [24]

    Denial-of- service poisoning attacks against large language models,

    K. Gao, T. Pang, C. Du, Y . Yang, S.-T. Xia, and M. Lin, “Denial-of- service poisoning attacks against large language models,”arXiv preprint arXiv:2410.10760, 2024

  25. [25]

    Badthink: Triggered overthinking attacks on chain-of-thought reasoning in large language models,

    S. Liu, R. Li, L. Yu, L. Zhang, Z. Liu, and G. Jin, “Badthink: Triggered overthinking attacks on chain-of-thought reasoning in large language models,”arXiv preprint arXiv:2511.10714, 2025

  26. [26]

    Badreasoner: Planting tunable overthinking backdoors into large reasoning models for fun or profit,

    B. Yi, Z. Fei, J. Geng, T. Li, L. Nie, Z. Liu, and Y . Li, “Badreasoner: Planting tunable overthinking backdoors into large reasoning models for fun or profit,”arXiv preprint arXiv:2507.18305, 2025

  27. [27]

    Beyond max tokens: Stealthy resource amplification via tool calling chains in llm agents,

    K. Zhou, Y . Zheng, Y . He, M. Xue, X. Gong, Y . Wang, and K.-Y . Lam, “Beyond max tokens: Stealthy resource amplification via tool calling chains in llm agents,”arXiv preprint arXiv:2601.10955, 2026

  28. [28]

    Hidden tail: Adversarial image causing stealthy resource consumption in vision-language models,

    R. Zhang, Z. Wang, T. Yang, H. Li, W. Jiang, Q. Zhao, Y . Liu, and G. Xu, “Hidden tail: Adversarial image causing stealthy resource consumption in vision-language models,”arXiv preprint arXiv:2508.18805, 2025

  29. [29]

    Unveiling typographic deceptions: Insights of the typographic vulnerability in large vision-language models,

    H. Cheng, E. Xiao, J. Gu, L. Yang, J. Duan, J. Zhang, J. Cao, K. Xu, and R. Xu, “Unveiling typographic deceptions: Insights of the typographic vulnerability in large vision-language models,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 179–196

  30. [30]

    Vision-llms can fool themselves with self-generated typographic at- tacks,

    M. Qraitem, N. Tasnim, P. Teterwak, K. Saenko, and B. A. Plummer, “Vision-llms can fool themselves with self-generated typographic at- tacks,”arXiv preprint arXiv:2402.00626, 2024

  31. [31]

    Scenetap: Scene-coherent typographic adversarial planner against vision-language models in real-world environments,

    Y . Cao, Y . Xing, J. Zhang, D. Lin, T. Zhang, I. Tsang, Y . Liu, and Q. Guo, “Scenetap: Scene-coherent typographic adversarial planner against vision-language models in real-world environments,” inProceed- ings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 25 050–25 059

  32. [32]

    Transferable multimodal attack on vision-language pre-training 14 models,

    H. Wang, K. Dong, Z. Zhu, H. Qin, A. Liu, X. Fang, J. Wang, and X. Liu, “Transferable multimodal attack on vision-language pre-training 14 models,” in2024 IEEE Symposium on Security and Privacy (SP). IEEE, 2024, pp. 1722–1740

  33. [33]

    Thinkswitcher: When to think hard, when to think fast,

    G. Liang, L. Zhong, Z. Yang, and X. Quan, “Thinkswitcher: When to think hard, when to think fast,”arXiv preprint arXiv:2505.14183, 2025

  34. [34]

    Lapo: Internalizing reasoning efficiency via length- adaptive policy optimization,

    X. Wu, Y . Yan, S. Lyu, L. Wu, Y . Qiu, Y . Shen, W. Lu, J. Shao, J. Xiao, and Y . Zhuang, “Lapo: Internalizing reasoning efficiency via length- adaptive policy optimization,”arXiv preprint arXiv:2507.15758, 2025

  35. [35]

    Frost: Filtering reasoning outliers with attention for efficient reasoning,

    H. Luo, Z. Jiang, M. Z. Hasan, Y . Chen, and S. Sarkar, “Frost: Filtering reasoning outliers with attention for efficient reasoning,”arXiv preprint arXiv:2601.19001, 2026. APPENDIXA PROMPTTEMPLATES To ensure the reproducibility of our trigger discovery frame- work, we provide the exact templates used for seed generation (Stage 1) and structural decomposit...

  36. [36]

    scenario description: The concrete driving environment or scene being described (e.g., road condition, traffic situation, objects)

  37. [37]

    ambiguity or conflict: Any contradiction, ethical dilemma, uncertainty, unexpected event, or conflict that makes the situation non-trivial

  38. [38]

    reasoning requirement: Explicit or implicit instructions that encourage careful thinking, step-by-step reasoning, ethical analysis, or deliberation

  39. [39]

    constraints or formatting: Any constraints on how the answer should be structured or expressed (e.g., ”explain first, then answer”, ”step by step”, repetition rules, formatting requirements, or meta-instructions)

  40. [40]

    final action request: The final question or decision request that asks the model to output an action, judgment, or recommendation. Input Prompt: ”{full text}” Return ONLY a valid JSON object with EXACTLY these five keys: - scenario description - ambiguity or conflict - reasoning requirement - constraints or formatting - final action request Do NOT include...