SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization
Pith reviewed 2026-05-22 07:39 UTC · model grok-4.3
The pith
SCI-Defense detects semantic manipulation attacks on LLM rankings by scoring four specific signals in product descriptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SCI-Defense combines Perplexity detection, Semantic Integrity Scoring on four manipulation dimensions, and Inter-Candidate Detection to identify GEO attacks. On 600 Amazon product descriptions it reaches Precision of 1.000 and FPR of 0.000, with recall of 1.000, 0.952, and 0.830 against String, Reasoning, and Review attacks respectively, while existing PPL-only, classifier, and paraphrasing defenses record zero recall against semantic manipulation attacks.
What carries the argument
Semantic Integrity Scoring that checks content along Authority Attribution, Narrative Purposiveness, Comparative Claims, and Temporal Claims to flag manipulation.
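As a rough illustration of what scoring along four dimensions could look like, the sketch below uses the weighted-sum form quoted elsewhere on this page, S_base = λ_AA·S_AA + λ_NP·S_NP + λ_CA·S_CA + λ_TC·S_TC. The equal weights, the 0.5 threshold, and the function names are assumptions made for illustration; the paper's actual values and per-dimension scorers are not given here.

```python
# Dimension keys follow the abbreviations used in the abstract.
DIMENSIONS = ("AA", "NP", "CA", "TC")

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"AA": 0.25, "NP": 0.25, "CA": 0.25, "TC": 0.25}
THRESHOLD = 0.5


def sis_base_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum over the four dimensions: sum of lambda_d * S_d."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return sum(weights[d] * scores[d] for d in DIMENSIONS)


def is_flagged(scores: dict, threshold: float = THRESHOLD) -> bool:
    """Flag a description when its aggregated score reaches the threshold."""
    return sis_base_score(scores) >= threshold


# Example: strong authority and comparative signals, nothing else.
example = {"AA": 1.0, "NP": 0.0, "CA": 1.0, "TC": 0.0}
print(sis_base_score(example), is_flagged(example))
```

A weighted sum keeps each dimension's contribution inspectable, which matters for the "what would settle it" test: an attack that leaves all four inputs near zero cannot raise the aggregate under any choice of weights.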
Load-bearing premise
The four manipulation dimensions capture the main detectable semantic signals used by GEO attacks.
What would settle it
A manipulation method that raises product rankings in an LLM system without increasing scores on any of the four dimensions would show the defense misses the attack.
Original abstract
LLM-based ranking systems are vulnerable to Generative Engine Optimization (GEO) attacks, where adversaries inject semantic signals into product descriptions to artificially boost rankings. We propose SCI-Defense, a three-component defense framework combining Perplexity detection (PPL), Semantic Integrity Scoring (SIS), and Inter-Candidate Detection (ICD). SIS evaluates four manipulation dimensions: Authority Attribution (AA), Narrative Purposiveness (NP), Comparative Claims (CA), and Temporal Claims (TC). Evaluated on 600 Amazon product descriptions across 6 categories, SCI-Defense achieves Precision=1.000 and FPR=0.000, with Recall of 1.000, 0.952, and 0.830 against String, Reasoning, and Review attacks respectively. On 600 MS MARCO web passages, String attacks are blocked with perfect recall while Review attacks yield near-zero recall, as web passages lack the persuasion-oriented signals that SIS targets in product descriptions. We demonstrate that existing defenses -- PPL-only filters, SafetyClf content classifiers, and paraphrasing -- achieve zero recall against semantic manipulation attacks. We further demonstrate new attacks such as Specification Amplification and Use-Case Saturation can expose semantic relevance manipulation as a structural defense blind spot that suggests directions for future research.
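For readers checking how the reported metrics relate to one another, the sketch below computes precision, recall, and false-positive rate from confusion-matrix counts. The counts in the example are invented to illustrate the arithmetic and are not taken from the paper.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall, and false-positive rate from confusion counts.

    Undefined ratios (zero denominator) are returned as NaN.
    """
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan
    fpr = fp / (fp + tn) if (fp + tn) else nan
    return {"precision": precision, "recall": recall, "fpr": fpr}


# Hypothetical counts: 100 attacked items (83 caught), 500 clean items
# (none flagged). These numbers are made up for illustration.
metrics = detection_metrics(tp=83, fp=0, tn=500, fn=17)
print(metrics)
```

With zero false positives, precision is 1 and FPR is 0 regardless of how many attacks are missed, so those two figures alone say nothing about recall.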
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SCI-Defense, a three-component defense against Generative Engine Optimization attacks, which inject semantic signals into content to manipulate LLM-based rankings. The components are Perplexity detection (PPL), Semantic Integrity Scoring (SIS) over four manipulation dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, Temporal Claims), and Inter-Candidate Detection (ICD). Evaluated on 600 Amazon product descriptions, it reports Precision=1.000, FPR=0.000, and recalls of 1.000/0.952/0.830 against String/Reasoning/Review attacks; on 600 MS MARCO passages, String attacks are fully blocked but Review attacks yield near-zero recall. The work shows that existing defenses (PPL-only, SafetyClf, paraphrasing) achieve zero recall and identifies new attacks (Specification Amplification, Use-Case Saturation) that expose structural blind spots.
Significance. If the results hold, the paper contributes by demonstrating concrete vulnerabilities in generative ranking systems and by showing that existing content filters fail against semantic manipulation. Explicitly surfacing new attack vectors and domain-specific limitations (persuasion signals in product text vs. web passages) provides a useful map for future defenses rather than claiming a complete solution.
major comments (3)
- [Abstract] The headline metrics (Precision=1.000, FPR=0.000, Recall 1.000/0.952/0.830 on Amazon data) are reported without the scoring formulas for SIS across the four dimensions or implementation details for the PPL+SIS+ICD pipeline, information that is load-bearing for assessing whether the central performance claims can be reproduced or generalized.
- [Abstract] Near-zero recall on Review attacks for MS MARCO web passages (versus strong results on Amazon product descriptions) shows that SIS effectiveness depends on persuasion-oriented signals absent from general web text; this domain specificity directly limits the scope of the claim that SCI-Defense defends against GEO attacks.
- [Abstract] The explicit statement that Specification Amplification and Use-Case Saturation expose a structural blind spot for semantic relevance manipulation indicates that the four SIS dimensions may not capture primary signals used by all GEO tactics, undermining robustness claims even if the three-component pipeline is implemented as described.
minor comments (2)
- The evaluation would be strengthened by reporting statistical significance tests or confidence intervals alongside the precision/recall figures.
- Adding pseudocode or a detailed algorithmic description of how SIS aggregates the four dimensions would improve clarity and reproducibility.
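To make the first minor comment concrete, here is a minimal sketch of a Wilson score interval for a proportion such as precision or recall. It is a standard method, not something the paper is stated to use, and the counts in the example are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 100 flagged items, all of them true attacks.
low, high = wilson_interval(successes=100, trials=100)
print(f"precision 1.000, 95% interval approx [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, which collapses to zero width when the observed proportion is exactly 1, the Wilson interval still reports a lower bound below 1, which is the relevant case for a reported precision of 1.000.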
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating where we agree with the need for clarification or revision and where we provide additional context from the manuscript.
- Referee: [Abstract] The headline metrics (Precision=1.000, FPR=0.000, Recall 1.000/0.952/0.830 on Amazon data) are reported without the scoring formulas for SIS across the four dimensions or implementation details for the PPL+SIS+ICD pipeline, information that is load-bearing for assessing whether the central performance claims can be reproduced or generalized.
  Authors: We agree that the abstract's brevity omits the explicit scoring formulas for Semantic Integrity Scoring (SIS) on the four dimensions and the precise implementation details of the combined PPL+SIS+ICD pipeline. These formulas (e.g., weighted aggregation across Authority Attribution, Narrative Purposiveness, Comparative Claims, and Temporal Claims) and pipeline steps are fully specified in Sections 3.2 and 4.1 of the manuscript to support reproducibility. To improve standalone readability of the abstract, we will make a partial revision by adding one sentence briefly describing the SIS dimensions and noting that full formulas and pipeline details appear in the main text.
  Revision: partial
- Referee: [Abstract] Near-zero recall on Review attacks for MS MARCO web passages (versus strong results on Amazon product descriptions) shows that SIS effectiveness depends on persuasion-oriented signals absent from general web text; this domain specificity directly limits the scope of the claim that SCI-Defense defends against GEO attacks.
  Authors: The referee correctly identifies this as a core finding rather than an oversight. The manuscript explicitly attributes the near-zero recall on Review attacks in MS MARCO to the lack of persuasion-oriented signals in general web passages, contrasting with their presence in Amazon product descriptions. We already frame this as evidence of domain specificity in the results and discussion sections. We will revise the abstract and conclusion to state more prominently that SCI-Defense's effectiveness is strongest in persuasion-rich domains such as product text and to qualify the scope of claims about defending against GEO attacks more broadly.
  Revision: yes
- Referee: [Abstract] The explicit statement that Specification Amplification and Use-Case Saturation expose a structural blind spot for semantic relevance manipulation indicates that the four SIS dimensions may not capture primary signals used by all GEO tactics, undermining robustness claims even if the three-component pipeline is implemented as described.
  Authors: We acknowledge the referee's concern and note that the manuscript already presents Specification Amplification and Use-Case Saturation as exposing a structural blind spot in the current four SIS dimensions for certain semantic relevance manipulations. This is positioned as identifying an avenue for future research rather than a claim of comprehensive robustness against every possible GEO tactic. To prevent any overinterpretation, we will revise the discussion to more explicitly separate the demonstrated effectiveness against the three evaluated attack types from the acknowledged limitations against other semantic strategies, while reiterating the need for expanded dimensions.
  Revision: yes
Circularity Check
No circularity: empirical evaluation on independent attack datasets
The paper proposes SCI-Defense as a three-component pipeline (PPL + SIS + ICD) where SIS explicitly scores four hand-specified manipulation dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, Temporal Claims). Performance metrics are obtained by direct evaluation on separately generated attack datasets (600 Amazon descriptions and 600 MS MARCO passages) rather than by any fitted parameter, self-referential equation, or self-citation that reduces the claimed result to the input by construction. The authors themselves note structural blind spots for unmodeled tactics such as Specification Amplification, confirming that the assessment is externally falsifiable and not tautological.
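The PPL component of such a pipeline can be pictured with the standard definition of perplexity as the exponential of the negative mean token log-probability. The sketch below assumes natural-log token probabilities supplied by some language model; the threshold value and the function names are hypothetical and not drawn from the paper.

```python
import math


def perplexity(token_logprobs: list) -> float:
    """Perplexity = exp(-mean of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ppl_flag(token_logprobs: list, threshold: float) -> bool:
    """Flag text whose perplexity exceeds a (hypothetical) threshold."""
    return perplexity(token_logprobs) > threshold


# Illustrative inputs: fluent text has fairly probable tokens, while an
# adversarial string suffix has very improbable ones.
fluent = [math.log(0.5)] * 8
gibberish = [math.log(0.001)] * 8
print(ppl_flag(fluent, threshold=100.0), ppl_flag(gibberish, threshold=100.0))
```

Because fluent persuasive text scores like the first input, a filter of this kind has nothing to trigger on for semantic manipulation, which is consistent with the zero recall the abstract reports for PPL-only defenses.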
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: SIS evaluates four manipulation dimensions: Authority Attribution (AA), Narrative Purposiveness (NP), Comparative Claims (CA), and Temporal Claims (TC). ... $S_{\text{base}} = \lambda_{AA} S_{AA} + \lambda_{NP} S_{NP} + \lambda_{CA} S_{CA} + \lambda_{TC} S_{TC}$
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We propose SCI-Defense, a three-component defense framework combining Perplexity detection (PPL), Semantic Integrity Scoring (SIS), and Inter-Candidate Detection (ICD).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.