pith. machine review for the scientific record. sign in

arxiv: 2505.13360 · v3 · submitted 2025-05-19 · 💻 cs.CL · cs.SE

Recognition: unknown

What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts

Authors on Pith no claims yet
classification 💻 cs.CL cs.SE
keywords promptrequirementspromptsunderspecificationllmsabilityaccuracyacross
0
0 comments X
read the original abstract

Prompt underspecification is a common challenge when interacting with LLMs. In this paper, we present an in-depth analysis of this problem, showing that while LLMs can often infer unspecified requirements by default (41.1%), such behavior is fragile: Under-specified prompts are 2x as likely to regress across model or prompt changes, sometimes with accuracy drops exceeding 20%. This instability makes it difficult to reliably build LLM applications. Moreover, simply specifying all requirements does not consistently help, as models have limited instruction-following ability and requirements can conflict. Standard prompt optimizers likewise provide little benefit. To address these issues, we propose requirements-aware prompt optimization mechanisms that improve performance by 4.8% on average over baselines. We further advocate for a systematic process of proactive requirements discovery, evaluation, and monitoring to better manage prompt underspecification in practice.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation

    cs.CL 2026-04 unverdicted novelty 7.0

    Complex adversarial instructions induce positional collapse in LLMs, with extreme cases showing 99.9% concentration on a single response position and zero content sensitivity.

  2. When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

    cs.SE 2026-04 unverdicted novelty 7.0

    Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.

  3. Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

    cs.HC 2026-04 unverdicted novelty 6.0

    Intent Lenses infer capture-time user intent from photos via LLMs to create dynamic, reusable interactive objects that generate and organize structured visual notes for later sensemaking.

  4. Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability

    cs.AI 2026-05 unverdicted novelty 5.0

    A framework with U-statistics and kernel-based metrics quantifies AI agent consistency and robustness, showing trajectory metrics outperform pass@1 rates in diagnosing failures.

  5. Quantifying the Utility of User Simulators for Building Collaborative LLM Assistants

    cs.CL 2026-05 unverdicted novelty 5.0

    Fine-tuned simulators grounded in real human data produce LLM assistants that win more often against real users than those trained against role-playing simulators.

  6. Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

    cs.SE 2026-04 unverdicted novelty 5.0

    Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.