
arxiv: 2605.21948 · v1 · pith:FY7PWNWC · submitted 2026-05-21 · 💻 cs.LG

SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization

Pith reviewed 2026-05-22 07:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords Generative Engine Optimization · LLM ranking defense · Semantic manipulation · Product description attacks · Perplexity detection · Integrity scoring

The pith

SCI-Defense detects semantic manipulation attacks on LLM rankings by scoring four specific signals in product descriptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SCI-Defense as a framework to counter Generative Engine Optimization attacks that add misleading semantic cues to product descriptions in order to inflate their rankings in LLM-based systems. It combines perplexity checks with Semantic Integrity Scoring across authority attribution, narrative purposiveness, comparative claims, and temporal claims, plus inter-candidate comparison. A reader would care because these attacks could distort recommendations and reduce trust in AI search tools. The approach reaches perfect precision and zero false positives on Amazon data while exposing that standard defenses miss purely semantic tricks.

Core claim

SCI-Defense combines Perplexity detection, Semantic Integrity Scoring on four manipulation dimensions, and Inter-Candidate Detection to identify GEO attacks. On 600 Amazon product descriptions it reaches Precision of 1.000 and FPR of 0.000, with Recall of 1.000, 0.952, and 0.830 against String, Reasoning, and Review attacks respectively. The paper also reports that existing PPL-only, classifier, and paraphrasing defenses record zero recall against semantic manipulation attacks.
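For readers less familiar with the three headline metrics, the following is a minimal sketch of how Precision, Recall, and FPR follow from confusion counts. The `Confusion` class, the function names, and the example counts are hypothetical illustrations; the page does not state the per-attack sample sizes behind the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts for a detector; 'positive' means flagged as an attack."""
    tp: int  # attacked descriptions that were flagged
    fp: int  # clean descriptions that were flagged
    tn: int  # clean descriptions that were passed
    fn: int  # attacked descriptions that were passed


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 for an empty denominator instead of raising.
    return numerator / denominator if denominator else 0.0


def precision(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    return _ratio(c.fp, c.fp + c.tn)


# Hypothetical counts chosen only to illustrate the arithmetic.
example = Confusion(tp=83, fp=0, tn=100, fn=17)
```

With zero false positives, Precision is 1.0 and FPR is 0.0 regardless of how many attacks are missed, which is why Recall is the figure that varies across attack types.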

What carries the argument

Semantic Integrity Scoring that checks content along Authority Attribution, Narrative Purposiveness, Comparative Claims, and Temporal Claims to flag manipulation.
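One way to picture a score built from the four dimensions is a weighted average, sketched below. This is an assumption for illustration only: the page does not reproduce the paper's formula, so the `sis_score` function, the equal default weights, and the [0, 1] score range are all hypothetical. The dimension abbreviations follow the abstract.

```python
from typing import Mapping, Optional

# Dimension abbreviations as used in the abstract.
DIMENSIONS = ("AA", "NP", "CA", "TC")


def sis_score(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of per-dimension scores (hypothetical aggregation rule)."""
    if weights is None:
        # Assumption: equal weights; the paper's actual weights are not shown here.
        weights = {d: 1.0 for d in DIMENSIONS}
    for d in DIMENSIONS:
        if d not in scores:
            raise ValueError(f"missing score for dimension {d}")
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"score for {d} must lie in [0, 1]")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight
```

Under this sketch, a manipulation that leaves all four per-dimension scores low also leaves the aggregate low, which is the failure mode described under "What would settle it".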

Load-bearing premise

The four manipulation dimensions capture the main detectable semantic signals used by GEO attacks.

What would settle it

A manipulation method that raises product rankings in an LLM system without increasing scores on any of the four dimensions would show the defense misses the attack.

Figures

Figures reproduced from arXiv: 2605.21948 by Haibo Jin, Haohan Wang, Huimin Zeng, Xucheng Yu.

Figure 1: SCI-Defense: a three-component framework defending LLM-based ranking against GEO attacks. Each attack type is matched to the component designed to detect it: PPL (GPT-2 perplexity) intercepts statistically anomalous String attacks; SIS (GPT-4o) scores four semantic dimensions to expose the persuasion structure of Reasoning and Review attacks; ICD (cross-candidate embedding similarity) provides complementar…
Figure 2: Estimated S_final score distributions for clean descriptions and three attack types. Clean descriptions cluster well below τ_s = 0.55 (zero false positives). String attacks are predominantly intercepted by the PPL early-exit stage before reaching SIS scoring. Reasoning attacks concentrate above τ_m = 0.65, yielding Recall = 0.952. Review attacks straddle τ_s, with 74.7% of scores falling in [0.45, 0.55), explaining…
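The staged flow suggested by the two captions (a PPL early exit followed by thresholding of the final score) can be sketched as below. Only the two threshold values come from the Figure 2 caption. The perplexity cutoff, the label strings, and the reading of τ_s as a lower flagging threshold and τ_m as a higher one are assumptions, since the truncated captions do not define these roles.

```python
TAU_S = 0.55  # from the Figure 2 caption
TAU_M = 0.65  # from the Figure 2 caption
PPL_THRESHOLD = 1000.0  # hypothetical cutoff; the page does not give one


def classify(ppl: float, s_final: float) -> str:
    """Assumed decision rule: PPL early exit, then two score bands."""
    if ppl > PPL_THRESHOLD:
        # Statistically anomalous text is stopped before semantic scoring.
        return "blocked_by_ppl"
    if s_final >= TAU_M:
        return "flagged_high"
    if s_final >= TAU_S:
        return "flagged"
    return "pass"
```

Under this assumed rule, an attacked description scoring 0.50 lands in the [0.45, 0.55) band mentioned in the caption and would pass, which is one way to read why Review attacks that straddle τ_s have lower recall.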
Abstract

LLM-based ranking systems are vulnerable to Generative Engine Optimization (GEO) attacks, where adversaries inject semantic signals into product descriptions to artificially boost rankings. We propose SCI-Defense, a three-component defense framework combining Perplexity detection (PPL), Semantic Integrity Scoring (SIS), and Inter-Candidate Detection (ICD). SIS evaluates four manipulation dimensions: Authority Attribution (AA), Narrative Purposiveness (NP), Comparative Claims (CA), and Temporal Claims (TC). Evaluated on 600 Amazon product descriptions across 6 categories, SCI-Defense achieves Precision=1.000 and FPR=0.000, with Recall of 1.000, 0.952, and 0.830 against String, Reasoning, and Review attacks respectively. On 600 MS MARCO web passages, String attacks are blocked with perfect recall while Review attacks yield near-zero recall, as web passages lack the persuasion-oriented signals that SIS targets in product descriptions. We demonstrate that existing defenses -- PPL-only filters, SafetyClf content classifiers, and paraphrasing -- achieve zero recall against semantic manipulation attacks. We further demonstrate new attacks such as Specification Amplification and Use-Case Saturation can expose semantic relevance manipulation as a structural defense blind spot that suggests directions for future research.
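As background for the PPL component, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below takes per-token natural-log probabilities as input so that it does not depend on any particular model API. Figure 1 names GPT-2 as the scoring model, but how the paper obtains and thresholds the values is not shown on this page, so the `perplexity` function here is a generic illustration rather than the authors' implementation.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean(log p))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

Fluent injected text tends to receive ordinary token probabilities and therefore ordinary perplexity, which is consistent with the abstract's report that PPL-only filters achieve zero recall against semantic manipulation attacks.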

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes SCI-Defense, a three-component defense against Generative Engine Optimization attacks that inject semantic signals into content to manipulate LLM-based rankings. The components are Perplexity detection (PPL), Semantic Integrity Scoring (SIS) over four manipulation dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, Temporal Claims), and Inter-Candidate Detection (ICD). Evaluated on 600 Amazon product descriptions, it reports Precision=1.000, FPR=0.000, and recalls of 1.000/0.952/0.830 against String/Reasoning/Review attacks; on 600 MS MARCO passages, String attacks are fully blocked but Review attacks yield near-zero recall. The work shows that existing defenses (PPL-only, SafetyClf, paraphrasing) achieve zero recall against semantic manipulation attacks and identifies new attacks (Specification Amplification, Use-Case Saturation) that expose structural blind spots.

Significance. If the results hold, the paper contributes by demonstrating concrete vulnerabilities in generative ranking systems and by showing that existing content filters fail against semantic manipulation. Explicitly surfacing new attack vectors and domain-specific limitations (persuasion signals in product text vs. web passages) provides a useful map for future defenses rather than claiming a complete solution.

major comments (3)
  1. [Abstract] The headline metrics (Precision=1.000, FPR=0.000, Recall 1.000/0.952/0.830 on Amazon data) are reported without the scoring formulas for SIS across the four dimensions or implementation details for the PPL+SIS+ICD pipeline, which is load-bearing for assessing whether the central performance claims can be reproduced or generalized.
  2. [Abstract] Near-zero recall on Review attacks for MS MARCO web passages (versus strong results on Amazon product descriptions) shows that SIS effectiveness depends on persuasion-oriented signals absent from general web text; this domain specificity directly limits the scope of the claim that SCI-Defense defends against GEO attacks.
  3. [Abstract] The explicit statement that Specification Amplification and Use-Case Saturation expose a structural blind spot for semantic relevance manipulation indicates that the four SIS dimensions may not capture primary signals used by all GEO tactics, undermining robustness claims even if the three-component pipeline is implemented as described.
minor comments (2)
  1. The evaluation would be strengthened by reporting statistical significance tests or confidence intervals alongside the precision/recall figures.
  2. Adding pseudocode or a detailed algorithmic description of how SIS aggregates the four dimensions would improve clarity and reproducibility.
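To make the first minor comment concrete, a Wilson score interval is one standard way to attach a confidence interval to a recall or precision figure. The sketch below is an illustration, not something the paper reports: the `wilson_interval` helper and the counts in the usage note are hypothetical, because the page does not state how many samples underlie each recall value.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `wilson_interval(100, 100)` gives a lower bound a little above 0.96 under the hypothetical assumption of 100 trials, so a reported value of 1.000 is still compatible with a true rate a few points lower at that sample size.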

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where we agree with the need for clarification or revision and where we provide additional context from the manuscript.

  1. Referee: [Abstract] The headline metrics (Precision=1.000, FPR=0.000, Recall 1.000/0.952/0.830 on Amazon data) are reported without the scoring formulas for SIS across the four dimensions or implementation details for the PPL+SIS+ICD pipeline, which is load-bearing for assessing whether the central performance claims can be reproduced or generalized.

    Authors: We agree that the abstract's brevity omits the explicit scoring formulas for Semantic Integrity Scoring (SIS) on the four dimensions and the precise implementation details of the combined PPL+SIS+ICD pipeline. These formulas (e.g., weighted aggregation across Authority Attribution, Narrative Purposiveness, Comparative Claims, and Temporal Claims) and pipeline steps are fully specified in Sections 3.2 and 4.1 of the manuscript to support reproducibility. To improve standalone readability of the abstract, we will make a partial revision by adding one sentence briefly describing the SIS dimensions and noting that full formulas and pipeline details appear in the main text.

    Revision: partial

  2. Referee: [Abstract] Near-zero recall on Review attacks for MS MARCO web passages (versus strong results on Amazon product descriptions) shows that SIS effectiveness depends on persuasion-oriented signals absent from general web text; this domain specificity directly limits the scope of the claim that SCI-Defense defends against GEO attacks.

    Authors: The referee correctly identifies this as a core finding rather than an oversight. The manuscript explicitly attributes the near-zero recall on Review attacks in MS MARCO to the lack of persuasion-oriented signals in general web passages, contrasting with their presence in Amazon product descriptions. We already frame this as evidence of domain specificity in the results and discussion sections. We will revise the abstract and conclusion to more prominently state that SCI-Defense's effectiveness is strongest in persuasion-rich domains such as product text and to qualify the scope of claims about defending against GEO attacks more broadly.

    Revision: yes

  3. Referee: [Abstract] The explicit statement that Specification Amplification and Use-Case Saturation expose a structural blind spot for semantic relevance manipulation indicates that the four SIS dimensions may not capture primary signals used by all GEO tactics, undermining robustness claims even if the three-component pipeline is implemented as described.

    Authors: We acknowledge the referee's concern and note that the manuscript already presents Specification Amplification and Use-Case Saturation as exposing a structural blind spot in the current four SIS dimensions for certain semantic relevance manipulations. This is positioned as identifying an avenue for future research rather than a claim of comprehensive robustness against every possible GEO tactic. To prevent any overinterpretation, we will revise the discussion to more explicitly separate the demonstrated effectiveness against the three evaluated attack types from the acknowledged limitations against other semantic strategies, while reiterating the need for expanded dimensions.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on independent attack datasets


The paper proposes SCI-Defense as a three-component pipeline (PPL + SIS + ICD) where SIS explicitly scores four hand-specified manipulation dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, Temporal Claims). Performance metrics are obtained by direct evaluation on separately generated attack datasets (600 Amazon descriptions and 600 MS MARCO passages) rather than by any fitted parameter, self-referential equation, or self-citation that reduces the claimed result to the input by construction. The authors themselves note structural blind spots for unmodeled tactics such as Specification Amplification, confirming that the assessment is externally falsifiable and not tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical applied ML paper with no formal derivations or mathematical axioms; claims rest on experimental results from constructed attack datasets.



