Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
Pith reviewed 2026-05-07 07:24 UTC · model grok-4.3
The pith
Large language models accurately recall constraints yet frequently violate them in multi-turn scientific ideation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In multi-turn LLM-assisted scientific ideation, models exhibit a dissociation between declarative recall and behavioral adherence to constraints: they accurately restate the original rules in a restatement probe while simultaneously violating those rules in their generated ideas. Iterative pressure reliably increases structural complexity and often reduces adherence, with knows-but-violates (KBV) rates ranging from 8% to 99% across models. Structured checkpointing partially reduces KBV rates without closing the dissociation, and complexity inflation persists even when recall remains intact.
What carries the argument
The DriftBench benchmark, together with a restatement probe that directly measures the knows-but-violates (KBV) rate: the share of cases in which a model correctly recalls a constraint yet produces an output that violates it.
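As a rough illustration of how a knows-but-violates rate could be computed from scored cases, here is a minimal Python sketch. The record type `ProbeResult`, its field names, and the toy data are hypothetical and not taken from the DriftBench release. The default denominator (all scored cases) follows the wording "share of cases" and is an assumption; the `condition_on_recall` flag shows the alternative reading, in which only correctly recalled cases are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One scored (run, constraint) case. Field names are hypothetical."""
    recalled: bool  # restatement probe: the constraint was restated correctly
    violated: bool  # scoring: the generated idea breaks the constraint


def kbv_rate(results, condition_on_recall=False):
    """Fraction of cases that are both recalled and violated.

    Assumption: by default the denominator is all scored cases. With
    condition_on_recall=True the denominator is only the recalled cases.
    """
    hits = sum(1 for r in results if r.recalled and r.violated)
    if condition_on_recall:
        denom = sum(1 for r in results if r.recalled)
    else:
        denom = len(results)
    if denom == 0:
        raise ValueError("no cases to score")
    return hits / denom


# Made-up toy data, for illustration only.
toy = [
    ProbeResult(recalled=True, violated=True),
    ProbeResult(recalled=True, violated=False),
    ProbeResult(recalled=False, violated=True),
    ProbeResult(recalled=True, violated=True),
]

print(kbv_rate(toy))                            # 2 of 4 cases
print(kbv_rate(toy, condition_on_recall=True))  # 2 of 3 recalled cases
```

The two readings diverge whenever recall is imperfect, so any comparison of KBV rates across models should state which denominator is in use.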
If this is right
- Iterative pressure reliably increases structural complexity and often reduces constraint adherence.
- Structured checkpointing partially reduces knows-but-violates rates but does not close the recall-adherence dissociation.
- LLM judges under-detect violations, rendering reported adherence scores conservative.
- The dissociation and complexity inflation hold across temperature settings and both novelty-driven and rigor-driven pressure.
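The point about judges under-detecting violations can be made concrete with a small sketch that compares judge labels against human labels on the same cases. This is an illustrative assumption about how such a check might look, not the paper's validation procedure: the function name, the pair layout, and the toy labels are all hypothetical, and human labels are treated as the reference.

```python
def judge_detection_summary(pairs):
    """Summarize judge performance against human reference labels.

    pairs: iterable of (human_says_violation, judge_says_violation) booleans.
    Assumption: the human label is treated as ground truth.
    """
    pairs = list(pairs)
    human_pos = sum(1 for h, _ in pairs if h)
    both = sum(1 for h, j in pairs if h and j)
    missed = sum(1 for h, j in pairs if h and not j)  # judge misses a violation
    extra = sum(1 for h, j in pairs if j and not h)   # judge flags a non-violation
    recall = both / human_pos if human_pos else None
    return {"recall": recall, "missed": missed, "extra": extra}


# Made-up toy labels, for illustration only.
toy_pairs = [
    (True, True),
    (True, False),
    (True, False),
    (False, False),
    (False, True),
]

summary = judge_detection_summary(toy_pairs)
print(summary)
```

Under this framing, under-detection shows up as `missed` exceeding `extra`: the judge overlooks more human-flagged violations than it adds, so adherence scores based on the judge alone would err on the high side.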
Where Pith is reading between the lines
- Teams relying on LLMs for extended research ideation may need external monitoring tools to catch violations that the models themselves cannot self-correct.
- The pattern suggests that prompt-based consistency mechanisms alone are insufficient for tasks where an initial specification must survive many refinement cycles.
- Testing the same probes on non-scientific creative or engineering workflows could show whether the knows-but-violates behavior is domain-specific or general to multi-turn LLM use.
Load-bearing premise
The 38 research briefs and four interaction conditions sufficiently represent real scientific ideation workflows, and the LLM-as-judge scoring captures the full scope of constraint violations without systematic bias.
What would settle it
A follow-up experiment with practicing researchers conducting real multi-turn ideation sessions on new scientific briefs that finds knows-but-violates rates consistently near zero across models would falsify the reported dissociation between recall and adherence.
Original abstract
When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially reduces KBV rates but does not close the dissociation, and complexity inflation persists. Human validation against blind raters confirms that the LLM judge under-detects constraint violations, making reported constraint adherence scores conservative. Sensitivity analyses confirm the findings are robust to temperature (0.7 vs. 1.0) and pressure type (novelty vs. rigor). We release all briefs, prompts, rubrics, transcripts, and scores as an open benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DriftBench, a benchmark for constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored runs with seven models, 38 research briefs from 24 domains, and four interaction conditions, it reports that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe demonstrates a dissociation: models accurately recall constraints they violate, yielding knows-but-violates (KBV) rates from 8% to 99%. Structured checkpointing partially reduces KBV rates but eliminates neither the dissociation nor complexity inflation. Human validation confirms the LLM judge under-detects violations, making adherence scores conservative. All materials are released openly.
Significance. If the results hold, this provides important empirical evidence of a recall-adherence dissociation in LLMs during iterative scientific tasks, with implications for tool design in research assistance. Strengths include the large evaluation scale, sensitivity checks on temperature and pressure type, human validation of the LLM judge, and full open release of briefs, prompts, rubrics, transcripts, and scores, which support reproducibility and community use.
minor comments (3)
- The abstract would benefit from briefly naming and describing the four interaction conditions to provide immediate context.
- Include a summary table or figure of KBV rates and adherence scores broken down by model and condition to facilitate direct comparison of the reported 8-99% range.
- Clarify the exact operationalization and measurement of 'structural complexity' (e.g., via clause count, dependency depth, or LLM rubric) in the methods, as it underpins one of the main findings.
Simulated Author's Rebuttal
We thank the referee for their accurate and positive summary of our work on DriftBench, including the scale of the evaluation (2,146 runs), the dissociation between recall and adherence, the open release of all materials, and the human validation confirming conservative scoring. We appreciate the recognition of robustness to temperature and pressure type, as well as the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity: purely empirical benchmark with direct measurements
full rationale
The paper introduces DriftBench as an empirical benchmark and reports direct measurements of constraint adherence and KBV rates across 2,146 runs on seven models. It contains no derivation chain, equations, fitted parameters, or predictions that would reduce the central findings to their inputs by construction. The KBV rate is computed from scored model outputs and restatement probes; human validation and sensitivity checks are external to any internal reduction. The core claims rest on no load-bearing self-citations or ansatzes. The study is self-contained and falsifiable via the released transcripts and rubrics.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: an LLM judge can reliably score constraint adherence in generated ideas (with acknowledged under-detection).
invented entities (2)
- DriftBench: no independent evidence
- knows-but-violates (KBV) rate: no independent evidence