SST-Guard: Detecting and Characterizing Server-Side Google Analytics in the Wild
Pith reviewed 2026-05-07 08:57 UTC · model grok-4.3
The pith
Server-side Google Analytics evades endpoint blockers but leaves detectable semantic traces in data flows
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SST-Guard detects server-side Google Analytics by using a value-template approach with regular expressions to match semantic value patterns such as identifiers and event metadata across network requests, cookies, and the window object, even when endpoints and payloads are customized. This yields over 93 percent accuracy overall, with network-request analysis reaching 99.8 percent, and identifies sGA on 4.02 percent of the Tranco top-10k sites and 4.21 percent of the top-150k sites.
What carries the argument
The value-template approach that employs regular expressions to match semantic value patterns across network requests, cookies, and the window object.
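The value-template idea can be illustrated with a minimal Python sketch. The patterns below are assumptions based on publicly known Google Analytics value formats (a `G-` measurement ID, a `number.number` client ID, a `GA1.` cookie value); they are not the paper's actual templates, and `PageObservation` and `match_templates` are hypothetical names introduced only for this illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical value templates (assumed formats, NOT the paper's patterns).
VALUE_TEMPLATES = {
    "measurement_id": re.compile(r"\bG-[A-Z0-9]{6,12}\b"),
    "client_id": re.compile(r"\b\d{6,10}\.\d{10}\b"),
    "ga_cookie": re.compile(r"\bGA1\.\d\.\d{6,10}\.\d{10}\b"),
}


@dataclass
class PageObservation:
    """Values collected from one page visit, grouped by modality."""
    request_urls: list[str] = field(default_factory=list)
    cookies: dict[str, str] = field(default_factory=dict)
    window_values: list[str] = field(default_factory=list)


def match_templates(obs: PageObservation) -> dict[str, set[str]]:
    """Return, per modality, the names of the templates that matched."""
    modalities = {
        "network": obs.request_urls,
        "cookies": list(obs.cookies.values()),
        "window": obs.window_values,
    }
    hits: dict[str, set[str]] = {}
    for modality, values in modalities.items():
        found = {
            name
            for name, pattern in VALUE_TEMPLATES.items()
            for value in values
            if pattern.search(value)
        }
        if found:
            hits[modality] = found
    return hits


# The endpoint and cookie name are customized, but the values keep their shape.
obs = PageObservation(
    request_urls=[
        "https://metrics.example.com/x/collect"
        "?v=2&tid=G-ABC123DEF4&cid=123456789.1700000000&en=page_view"
    ],
    cookies={"_custom": "GA1.1.123456789.1700000000"},
    window_values=["consent=granted"],
)
hits = match_templates(obs)
print(hits)
```

The sketch never looks at the hostname or the cookie name, which is the point of matching on value shape rather than on known endpoints.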
Load-bearing premise
Real-world server-side Google Analytics deployments necessarily expose the same semantic information (such as identifiers and event metadata) in detectable patterns within network requests, cookies, or the window object, despite customization.
What would settle it
A server-side Google Analytics implementation that collects and forwards tracking data but uses formats that produce no matches against the semantic-value regex templates in any of the three modalities.
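To make this failure mode concrete, the sketch below shows one way a value could escape a plaintext template and how a decoding step might recover it. Everything here is an illustrative assumption: the client ID pattern, the base64 wrapping, and the function names `matches_plain` and `matches_after_decoding` are hypothetical and are not taken from the paper.

```python
import base64
import re
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical client ID template (assumed format, not the paper's pattern).
CLIENT_ID = re.compile(r"\b\d{6,10}\.\d{10}\b")


def matches_plain(url: str) -> bool:
    """Match the template against the raw URL only."""
    return CLIENT_ID.search(url) is not None


def matches_after_decoding(url: str) -> bool:
    """Also try base64-decoding each query value before matching."""
    if matches_plain(url):
        return True
    for _, value in parse_qsl(urlsplit(url).query):
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except ValueError:  # covers invalid base64 and invalid UTF-8
            continue
        if CLIENT_ID.search(decoded):
            return True
    return False


client_id = "123456789.1700000000"
wrapped = base64.b64encode(client_id.encode()).decode()
url = "https://metrics.example.com/c?" + urlencode({"d": wrapped})

print(matches_plain(url))           # plaintext template misses the wrapped value
print(matches_after_decoding(url))  # decoding step recovers it
```

A deployment that used encryption or a server-side lookup key instead of a reversible encoding would defeat this decoding step as well, which is the scenario the statement above describes.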
Original abstract
As web browsers increasingly restrict client-side tracking, the web tracking ecosystem is shifting from client-side to server-side tracking (SST). In SST, the browser sends tracking requests to an intermediate endpoint, which then forwards them to the tracker's endpoint, eliminating direct client-to-tracker requests. As a result, existing tracking protections that block requests to known tracker endpoints are rendered ineffective. In this paper, we investigate server-side implementation of Google Analytics, the most widely deployed third-party tracking service on the web today. We also present SST-Guard, a multi-modal, browser-based system for detecting and blocking server-side Google Analytics (sGA). Our key insight is that even when the tracker's endpoints change, sGA must necessarily still collect and share the same semantic information as client-side Google Analytics (e.g., identifiers, event metadata). Therefore, rather than detecting requests to known Google Analytics endpoints, SST-Guard aims to detect underlying artifacts of collection and sharing of these semantic values to any arbitrary endpoint. Operationalizing this insight is challenging because real-world sGA deployments commonly customize endpoints and obfuscate URLs/payloads. SST-Guard addresses this challenge using a value-template approach that employs regular expressions to match semantic value patterns across multiple modalities: network requests, cookies, and the window object. We validate SST-Guard on Tranco top-10k websites, detecting 4.02% (403) sGA domains with over 93% accuracy across three modalities, with network request classifier demonstrating the highest accuracy (99.8%). By deploying SST-Guard in the wild, we find 4.21% (6,314) of Tranco top-150k websites using sGA.
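The prevalence figures in the abstract can be recomputed from the reported counts. The sketch below assumes the denominators are the nominal list sizes (10,000 and 150,000); the abstract does not state the exact number of sites that were successfully crawled, so these denominators are an assumption.

```python
def prevalence_percent(detected: int, total: int) -> float:
    """Share of sites with a detection, as a percentage rounded to 2 places."""
    return round(100.0 * detected / total, 2)


# Counts are from the abstract; denominators are ASSUMED nominal list sizes.
top10k = prevalence_percent(403, 10_000)
top150k = prevalence_percent(6_314, 150_000)

print(f"top-10k:  {top10k}%")
print(f"top-150k: {top150k}%")
```

Under these assumed denominators the top-150k figure reproduces the reported 4.21%, while the top-10k figure comes out to 4.03% rather than the reported 4.02%. The small gap may reflect a different denominator or count in the paper; the abstract alone does not settle which.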
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SST-Guard, a multi-modal, browser-based detection system for server-side Google Analytics (sGA) that identifies semantic artifacts such as identifiers and event metadata using regex value templates across network requests, cookies, and the window object, rather than relying on known tracker endpoints. It validates the approach on Tranco top-10k sites by detecting sGA on 403 domains (4.02%) with over 93% accuracy (99.8% for network requests) and reports 4.21% (6,314 sites) prevalence when deployed on the top-150k.
Significance. If the detection holds, the work is significant for providing the first large-scale empirical measurement of server-side Google Analytics adoption and a practical tool to characterize tracking that evades client-side blockers. The multi-modal design and deployment at the scale of 150k sites represent a concrete contribution to web privacy research, with potential to inform browser protections.
major comments (2)
- [Abstract] Abstract: The reported accuracy (>93% overall, 99.8% network) and prevalence figures (4.02% on top-10k, 4.21% on top-150k) are presented without any description of ground-truth labeling, test-set construction, or error analysis, which is load-bearing for assessing whether the central detection claims are rigorously supported.
- [Abstract] Abstract (value-template approach): The key assumption that real-world sGA deployments will necessarily expose regex-matchable semantic values in at least one of the three modalities despite endpoint customization, payload obfuscation, or non-standard storage is not tested or bounded; without such analysis the prevalence numbers risk being optimistic upper bounds rather than reliable measurements.
minor comments (2)
- [Abstract] Abstract: The acronym sGA is introduced without an explicit expansion on first use; spelling out 'server-side Google Analytics' at the outset would improve readability.
- The manuscript would benefit from including the complete set of regex patterns and any exclusion rules applied during validation as supplementary material to support reproducibility of the empirical results.
Simulated Author's Rebuttal
Thank you for the detailed and constructive review of our manuscript. We appreciate the focus on ensuring the abstract provides sufficient context for the central claims. We address each major comment point by point below, with proposed revisions to the manuscript.
- Referee: [Abstract] Abstract: The reported accuracy (>93% overall, 99.8% network) and prevalence figures (4.02% on top-10k, 4.21% on top-150k) are presented without any description of ground-truth labeling, test-set construction, or error analysis, which is load-bearing for assessing whether the central detection claims are rigorously supported.
Authors: We agree that the abstract would be strengthened by briefly indicating the validation approach. The full manuscript (Section 4) describes the ground-truth process: we sampled sites from the Tranco top-10k, performed manual multi-modal labeling (inspecting network traces for semantic values, cookie contents, and window object properties), and computed accuracy by comparing SST-Guard outputs against this labeled set, with network modality achieving 99.8% due to clearer request patterns. Error analysis includes false positive/negative cases discussed in the evaluation. We will revise the abstract to include a short clause such as 'validated via manual multi-modal labeling on a Tranco top-10k sample with >93% accuracy' to make the claims self-contained without altering the reported numbers.
Revision: yes
- Referee: [Abstract] Abstract (value-template approach): The key assumption that real-world sGA deployments will necessarily expose regex-matchable semantic values in at least one of the three modalities despite endpoint customization, payload obfuscation, or non-standard storage is not tested or bounded; without such analysis the prevalence numbers risk being optimistic upper bounds rather than reliable measurements.
Authors: The value-template design targets invariant semantic artifacts (e.g., client IDs formatted as UUIDs, event metadata strings) that sGA must transmit to maintain Google Analytics functionality, even through proxies or customized endpoints. We empirically tested this by identifying and manually verifying detections on sites with non-standard endpoints and storage (detailed in Sections 3 and 5 with examples of obfuscated payloads still matching templates across modalities). The multi-modal design further reduces reliance on any single channel. That said, we acknowledge a formal quantitative bound on all possible obfuscation strategies is not provided, as exhaustive enumeration of evasion techniques lies outside the scope of this measurement-focused study. We will add a dedicated limitations paragraph in the revised manuscript discussing potential false negatives from advanced obfuscation and outlining directions for more robust semantic matching.
Revision: partial
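The first response describes computing accuracy by comparing detector outputs against a manually labeled set. A minimal sketch of that comparison follows; the site names, labels, and predictions are made-up toy data, and `score` is a hypothetical helper, not the authors' evaluation code.

```python
def score(labels: dict[str, bool], predictions: dict[str, bool]) -> dict[str, float]:
    """Compare predictions against ground-truth labels, site by site."""
    tp = fp = tn = fn = 0
    for site, truth in labels.items():
        predicted = predictions.get(site, False)  # no output counts as "no sGA"
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and predicted:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data for illustration only (True means "uses sGA").
labels = {"a.example": True, "b.example": False, "c.example": True,
          "d.example": False, "e.example": False}
predictions = {"a.example": True, "b.example": False, "c.example": False,
               "d.example": False, "e.example": True}

metrics = score(labels, predictions)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the positive class is rare: if a labeled sample had roughly the 4% prevalence reported for the crawl, a detector that never fired would still score about 96% accuracy.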
Circularity Check
No significant circularity: empirical detection tool with external validation
The paper presents SST-Guard as a practical, multi-modal detection system that matches semantic patterns (identifiers, event metadata) via regex templates in network requests, cookies, and the window object. No mathematical derivations, equations, fitted parameters, or predictions appear in the abstract or described methodology. Accuracy (93%+ overall, 99.8% for network) and prevalence (4.02% on top-10k, 4.21% on top-150k) are reported as direct empirical measurements against the Tranco list, not quantities forced by construction or self-referential definitions. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The central insight—that sGA must expose the same semantic artifacts—is an assumption about real-world behavior, not a definitional loop. The work is self-contained against external benchmarks (Tranco crawl + accuracy validation) and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- regex patterns for semantic values
axioms (1)
- domain assumption: sGA must necessarily still collect and share the same semantic information as client-side Google Analytics