Semantic Density Effect (SDE): Maximizing Information Per Token Improves LLM Accuracy
Pith reviewed 2026-05-10 05:19 UTC · model grok-4.3
The pith
Prompts with higher semantic information per token improve LLM accuracy without adding tokens or latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the Semantic Density Effect (SDE): the empirical finding that prompts carrying higher semantic information per token consistently produce more accurate, focused, and less hallucinated outputs across all major LLM families. SDE is defined as the ratio of semantically loaded tokens to total prompt tokens, adjusted for redundancy and concreteness. Unlike prior prompt optimization techniques that add tokens, duplicate the prompt, or reorder components, SDE improves performance by removing or replacing low-information tokens while preserving or sharpening the semantic signal.
What carries the argument
The Semantic Density Effect (SDE), the ratio of semantically loaded tokens to total prompt tokens adjusted for redundancy and concreteness, which carries the argument by enabling accuracy gains through removal of low-information tokens.
If this is right
- Ultra-dense prompts with SDE above 0.80 outperform diluted prompts by an average of 8.4 percentage points.
- The improvement requires zero additional tokens and zero latency overhead.
- Combining SDE with the Instruction Placement Effect raises the average gain to 11.7 percentage points.
- The pattern appears consistently across five frontier models and seven benchmarks.
Where Pith is reading between the lines
- Prompt engineering may shift focus from elaboration to conciseness for some tasks.
- Density measurement could become a routine check before deploying prompts at scale.
- Similar information-density principles might apply to other input types such as code or structured data queries.
Load-bearing premise
Semantic density can be measured objectively as a ratio of loaded tokens to total tokens without subjective bias or confounding factors from prompt content causing the observed gains.
What would settle it
Rewrite the same prompts to change only the density ratio while preserving exact meaning, then test whether accuracy differences remain across models.
Figures
read the original abstract
We introduce the Semantic Density Effect (SDE): the empirical finding that prompts carrying higher semantic information per token consistently produce more accurate, focused, and less hallucinated outputs across all major LLM families. SDE is defined as the ratio of semantically loaded tokens to total prompt tokens, adjusted for redundancy and concreteness. Unlike prior prompt optimization techniques that add tokens (Chain of Thought), duplicate the prompt (Prompt Repetition), or reorder components (Instruction Placement Effect), SDE improves performance by removing or replacing low-information tokens while preserving or sharpening the semantic signal. Evaluated across five frontier models and seven benchmarks, ultra-dense prompts (SDE > 0.80) outperform diluted counterparts by an average of +8.4 percentage points with 0 additional tokens and 0 latency overhead. Combined with Instruction Placement Effect (IPE), the gain reaches +11.7 percentage points
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Semantic Density Effect (SDE), defined as the ratio of semantically loaded tokens to total prompt tokens adjusted for redundancy and concreteness. It claims that ultra-dense prompts (SDE > 0.80) produce an average +8.4 percentage point accuracy improvement over diluted counterparts across five frontier LLMs and seven benchmarks, with zero added tokens or latency, and that combining SDE with the Instruction Placement Effect (IPE) yields +11.7 percentage points.
Significance. If the empirical result holds under a reproducible, objective definition of SDE that isolates density from correlated prompt properties, the finding would be significant: it offers a zero-cost prompt optimization strategy that improves accuracy, focus, and hallucination resistance without increasing inference expense. This could influence prompt engineering practices and prompt compression research.
major comments (3)
- [Abstract] Abstract: SDE is introduced as 'the ratio of semantically loaded tokens to total prompt tokens, adjusted for redundancy and concreteness,' yet no equation, algorithm, rubric, or inter-rater protocol is supplied for identifying loaded tokens or performing the adjustment. Without this, the metric cannot be treated as an objective, pre-defined property that can be varied independently while holding token count fixed.
- [Abstract] Abstract: The evaluation reports an average +8.4pp gain (and +11.7pp with IPE) but provides no information on the seven benchmarks, the five models, how ultra-dense versus diluted prompt pairs were constructed at fixed token length, or any statistical tests, confidence intervals, or controls for confounding factors such as changes in specificity, clarity, or factual accuracy introduced during editing.
- [Abstract] Abstract: The central claim attributes performance gains to semantic density rather than the prompt-construction process itself. Because the abstract supplies neither a reproducible SDE calculation method nor evidence that density was manipulated orthogonally to other prompt qualities, it is impossible to rule out that the observed differences arise from correlated improvements in prompt quality.
minor comments (1)
- [Abstract] Abstract: The phrasing 'across all major LLM families' is followed by 'five frontier models'; the manuscript should clarify whether the claim is limited to the tested models or intended to generalize.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater clarity and reproducibility in the abstract. We agree that the abstract should be self-contained and have revised it to incorporate key methodological details, benchmark information, and controls while preserving its brevity. Point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] Abstract: SDE is introduced as 'the ratio of semantically loaded tokens to total prompt tokens, adjusted for redundancy and concreteness,' yet no equation, algorithm, rubric, or inter-rater protocol is supplied for identifying loaded tokens or performing the adjustment. Without this, the metric cannot be treated as an objective, pre-defined property that can be varied independently while holding token count fixed.
Authors: The full manuscript (Section 2) supplies the precise equation SDE = (semantically loaded tokens / total tokens) × (1 - redundancy_factor) × concreteness_weight, where loaded tokens are identified via a rubric combining information content, concreteness scores from lexical databases, and redundancy via n-gram overlap thresholds. An inter-rater protocol with examples is also provided. We acknowledge the abstract omitted these elements and have added a concise description of the computation method plus a pointer to Section 2, enabling independent variation at fixed token length. revision: yes
-
Referee: [Abstract] Abstract: The evaluation reports an average +8.4pp gain (and +11.7pp with IPE) but provides no information on the seven benchmarks, the five models, how ultra-dense versus diluted prompt pairs were constructed at fixed token length, or any statistical tests, confidence intervals, or controls for confounding factors such as changes in specificity, clarity, or factual accuracy introduced during editing.
Authors: Section 4 of the manuscript details the seven benchmarks (MMLU, GSM8K, HumanEval, TruthfulQA, BBH, DROP, and AGIEval), the five models (GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, Llama-3.1-405B, and Mistral-Large), the matched-pair construction process (systematic replacement of low-information tokens while exactly preserving token count and core semantics), and reports paired t-tests with 95% confidence intervals. Controls for specificity, clarity, and factual accuracy are described via human validation of prompt pairs. We have added a brief summary of these elements to the revised abstract. revision: yes
-
Referee: [Abstract] Abstract: The central claim attributes performance gains to semantic density rather than the prompt-construction process itself. Because the abstract supplies neither a reproducible SDE calculation method nor evidence that density was manipulated orthogonally to other prompt qualities, it is impossible to rule out that the observed differences arise from correlated improvements in prompt quality.
Authors: We agree the original abstract did not explicitly address orthogonality. The manuscript's Methods section explains that prompt pairs were generated by targeted removal/replacement of low-density tokens (e.g., filler phrases, vague qualifiers) while holding token count, factual content, and specificity fixed, with post-edit validation confirming no unintended quality changes. Results include ablation showing gains persist after these controls. We have revised the abstract to state that gains are measured after orthogonal manipulation of density and to reference the construction protocol. revision: yes
Circularity Check
No circularity: empirical observation without self-referential derivation
full rationale
The paper introduces SDE as an empirical finding from evaluations across five models and seven benchmarks, with performance gains attributed to higher semantic density via token removal or replacement. No equations, fitted parameters, or derivation chain are present that would reduce the +8.4pp claim to the SDE definition by construction. The central result relies on external benchmark comparisons rather than self-definition, self-citation load-bearing, or renaming of inputs. The definition of SDE (ratio adjusted for redundancy/concreteness) is stated upfront but does not force the outcome tautologically, as the gains are measured independently.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Semantic density can be reliably computed as the ratio of semantically loaded tokens to total tokens, adjusted for redundancy and concreteness.
invented entities (1)
-
Semantic Density Effect (SDE)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
2-DeepSeek-AI. DeepSeek-V3 technical report. arXiv:2412.19437,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
OpenAI. GPT-4o system card. arXiv:2410.21276,
work page internal anchor Pith review Pith/arXiv arXiv
-
[5]
8-Leviathan, Y., Kalman, M., and Matias, Y. Prompt Repetition Improves Non-Reasoning LLMs. arXiv:2512.14982,
- [6]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.