LLM-Assisted Web Measurements

Lorenzo Cazzaro; Simone Bozzolan; Stefano Calzavara

arxiv: 2510.08101 · v3 · submitted 2025-10-09 · 💻 cs.CR

Pith reviewed 2026-05-18 09:23 UTC · model grok-4.3

keywords: large language models, web measurements, website classification, security measurements, privacy analysis, Tranco list, targeted studies

The pith

Large language models can classify websites from lists like Tranco to support targeted security and privacy measurements at scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the use of LLMs to add semantic labels to popular but unlabeled website lists, which currently force researchers into ad-hoc choices that limit targeted studies. It evaluates several models on classification tasks drawn from prior web measurement work and finds strong results overall, with clear differences based on which model and settings are chosen. A two-step process is introduced to manage the accuracy-efficiency trade-off when processing large lists. The authors then run example security and privacy measurements with this method and check that the derived conclusions match those from earlier research. This removes a practical barrier to studying specific categories of sites without manual bias.

Core claim

LLMs achieve strong performance across multiple website classification scenarios relevant to security and privacy research, though model choice and configuration significantly affect both accuracy and computational cost. A practical two-step methodology enables scalable targeted web measurements starting from the Tranco list. When this methodology is applied to studies inspired by prior work, the resulting research inferences remain consistent with earlier findings.

What carries the argument

Two-step LLM classification pipeline that narrows candidates from the Tranco list then assigns category labels for security or privacy analysis.
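The shape of such a pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two classifier functions are hypothetical keyword-based stand-ins for LLM calls, the `e-commerce` target label is only an example, and the assumption that step 1 is a cheap screen while step 2 is a more careful labeler is ours.

```python
from typing import Callable, Iterable, List, Tuple

Site = Tuple[str, str]  # (domain, extracted page text)

def cheap_filter(domain: str, page_text: str) -> bool:
    """Step 1 (hypothetical): fast, recall-oriented screen.
    A real pipeline would query a small or cheap model here."""
    keywords = ("cart", "checkout", "shop")
    return any(k in page_text.lower() for k in keywords)

def careful_label(domain: str, page_text: str) -> str:
    """Step 2 (hypothetical): slower, precision-oriented labeling.
    A real pipeline would query a larger model here."""
    text = page_text.lower()
    if "checkout" in text or "add to cart" in text:
        return "e-commerce"
    return "other"

def two_step_classify(
    sites: Iterable[Site],
    first_pass: Callable[[str, str], bool],
    second_pass: Callable[[str, str], str],
    target: str,
) -> List[str]:
    """Keep only the domains that survive step 1 and get `target` in step 2."""
    selected = []
    for domain, text in sites:
        if not first_pass(domain, text):
            continue  # discarded cheaply, never reaches the expensive step
        if second_pass(domain, text) == target:
            selected.append(domain)
    return selected

# Toy input; the domains and texts are made up for illustration.
sites = [
    ("shop.example", "Add to cart and proceed to checkout"),
    ("news.example", "Breaking news: new shop opens downtown"),
    ("blog.example", "Thoughts on gardening"),
]
result = two_step_classify(sites, cheap_filter, careful_label, "e-commerce")
print(result)
```

In this toy run the news site passes the first screen because it mentions a shop, and only the second step rejects it, which is the division of labor the two-step design relies on.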

If this is right

Targeted measurements of specific website categories become feasible without ad-hoc selection rules.
Researchers can trade classification accuracy against compute cost by choosing different models or step thresholds.
Prior targeted studies can be replicated or extended more systematically from the same starting list.
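The accuracy-cost trade-off mentioned above comes down to simple arithmetic, sketched here. All numbers are illustrative assumptions rather than figures from the paper: the per-site costs are in arbitrary units and the 10% pass rate is invented.

```python
def two_step_cost(n_sites: int, pass_rate: float,
                  cost_step1: float, cost_step2: float) -> float:
    """Total cost when every site goes through step 1 and only the
    fraction `pass_rate` is forwarded to step 2."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    return n_sites * cost_step1 + n_sites * pass_rate * cost_step2

# Illustrative numbers only: none of these come from the paper.
N_SITES = 1_000_000
COST_LARGE = 1.0    # cost units per site for the expensive model
COST_SMALL = 0.05   # cost units per site for the cheap first pass

one_step = N_SITES * COST_LARGE
two_step = two_step_cost(N_SITES, pass_rate=0.10,
                         cost_step1=COST_SMALL, cost_step2=COST_LARGE)
print(f"one-step: {one_step:.0f}, two-step: {two_step:.0f}")
```

Under these assumptions the saving depends almost entirely on how many sites the first step lets through, so a looser first-step threshold buys recall at a roughly proportional compute cost.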

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline could be applied to other unlabeled web lists or snapshots to enable category-specific analysis in new domains.
Periodic re-classification of evolving sites might keep measurements current without rebuilding the entire dataset each time.
Integration with lighter-weight filters before the LLM step could further reduce cost for very large lists.

Load-bearing premise

The hand-curated test datasets reflect the actual distribution of sites that researchers encounter in security and privacy studies, and LLM outputs stay reliable when the method is applied to millions of sites.

What would settle it

A large-scale manual audit of LLM labels on a fresh random sample drawn from the current Tranco list would show whether classification accuracy holds outside the original evaluation sets.
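One way such an audit could be scored is sketched below: draw a reproducible random sample, have humans check the LLM labels, and report accuracy with a Wilson score interval. The function names, the seed, and the 90-of-100 example are assumptions made for illustration; the paper does not prescribe this procedure.

```python
import math
import random
from typing import List, Sequence, Tuple

def draw_audit_sample(domains: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k distinct domains uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(domains), k)

def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Hypothetical audit: 100 sampled sites, 90 LLM labels confirmed by reviewers.
domains = [f"site{i}.example" for i in range(10_000)]
sample = draw_audit_sample(domains, k=100)
low, high = wilson_interval(correct=90, total=100)
print(f"audited accuracy 0.90, 95% interval [{low:.3f}, {high:.3f}]")
```

The interval makes the sample-size question concrete: with only 100 audited sites the uncertainty is still several percentage points wide, so a claim about accuracy at Tranco scale would need a larger audit.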

Figures

Figures reproduced from arXiv: 2510.08101 by Lorenzo Cazzaro, Simone Bozzolan, Stefano Calzavara.

**Figure 2.** Percentage of websites with third-party trackers by...

**Figure 3.** Percentage of websites with minimal scope by cate...

Original abstract

Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites are unlabeled and lack semantic information about the nature of the included websites, making targeted web measurements challenging, as researchers often rely on ad-hoc techniques to bias datasets toward specific website classes of interest. In this paper, we investigate the use of Large Language Models (LLMs) to enable targeted web measurement studies. Building on prior literature, we identify key website classification tasks relevant to web measurements and highlight limitations in state-of-the-art classification approaches. We construct carefully curated datasets to evaluate different LLMs on these tasks. Our results show that LLMs can achieve strong performance across multiple classification scenarios, but the choice of model and configuration plays a significant role. Motivated by the observed trade-off between classification accuracy and computational efficiency, we propose a practical two-step methodology for scalable targeted web measurements starting from the Tranco list. Finally, we conduct LLM-assisted web measurement studies inspired by prior work using our methodology and assess the validity of the resulting research inferences, showing that LLMs can effectively enable targeted measurements of security and privacy trends on the Web.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

LLMs look workable for classifying Tranco sites into security-relevant categories with a two-step filter, but the evaluation leaves the representativeness and error-bias questions open.

The paper's main point is straightforward: LLMs can label sites from the Tranco list for common web measurement categories like e-commerce or adult content, and a two-step process lets researchers trade some accuracy for lower cost when scaling up. They build curated datasets for several classification tasks, test a few models, and then run example measurement studies to check whether the resulting inferences look reasonable compared to prior work. That two-step method and the downstream validity checks are the concrete additions here. The authors also document the accuracy-efficiency trade-off explicitly, which is useful for anyone who has tried to label thousands of sites by hand or with brittle heuristics. The work sits in a clear line from earlier ad-hoc filtering papers and treats the LLM step as an engineering improvement rather than a theoretical leap.

On the soft spots, the curated evaluation sets are small and hand-selected, so it is still unclear how well they match the label distribution or difficulty profile of the full Tranco list or the subsets researchers actually care about. The paper reports strong performance numbers and some validity checks on the final inferences, but it does not quantify how label noise or distribution shift would move the measured trends once the classifier runs at scale. That gap is real but not fatal; it is the sort of thing a revision can tighten with larger or more diverse test sets and a simple sensitivity analysis.

This is aimed at the web security and privacy measurement crowd who already work with top lists and need reproducible ways to focus on particular site types. A reader who runs their own crawls would get immediate practical value from the method and the model-configuration notes. The paper shows honest engagement with the literature and the practical constraints, so it is worth sending to referees who can press on the dataset and error-propagation details.

Referee Report

1 major / 2 minor

Summary. The paper explores using LLMs to classify websites for targeted security and privacy web measurements. It identifies relevant classification tasks, builds curated evaluation datasets, reports strong LLM performance that varies by model and prompt configuration, proposes a two-step methodology to scale from the Tranco list while balancing accuracy and efficiency, and validates the approach through example measurement studies with an assessment of inference validity.

Significance. If the empirical results and validity claims hold under distribution shift, the work would provide a practical, scalable alternative to ad-hoc website selection in web measurement studies, potentially improving reproducibility and focus in security and privacy research. The two-step methodology and explicit validity checks are constructive contributions.

major comments (1)

Validity assessment section: the paper reports strong performance on curated datasets and performs some validity checks on the resulting inferences, but does not quantify how label noise or distribution shift from the evaluation sets to the full Tranco distribution would affect downstream trend measurements (e.g., no sensitivity analysis or error-propagation bounds on reported security/privacy statistics). This is load-bearing for the central claim that LLM-assisted measurements support valid research inferences at scale.

minor comments (2)

Abstract and results sections: quantitative performance numbers, error bars, dataset sizes, and exact evaluation metrics are referenced but not summarized with sufficient detail for readers to assess the 'strong performance' claim without reading the full evaluation tables.
Methodology section: the two-step procedure is described at a high level; clarifying the exact filtering thresholds and fallback rules would improve reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment below and describe the revisions we will make to strengthen the validity assessment.

Referee: Validity assessment section: the paper reports strong performance on curated datasets and performs some validity checks on the resulting inferences, but does not quantify how label noise or distribution shift from the evaluation sets to the full Tranco distribution would affect downstream trend measurements (e.g., no sensitivity analysis or error-propagation bounds on reported security/privacy statistics). This is load-bearing for the central claim that LLM-assisted measurements support valid research inferences at scale.

Authors: We agree that a quantitative treatment of label noise and distribution shift is important for supporting claims about valid inferences at scale. The manuscript currently includes validity checks via example measurement studies that compare LLM-derived trends against prior literature and external benchmarks. To directly address the referee's concern, we will add a sensitivity analysis subsection. This will simulate classification error rates drawn from our evaluation results, model distribution shifts from the curated sets to Tranco, and report bounds on the impact to downstream security and privacy statistics. We believe this revision will make the central claim more robust.

Revision: yes
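The kind of bound at stake can be illustrated with a worst-case calculation. This sketch is not the analysis the simulated rebuttal promises. It assumes the only error source is false positives in the selected category, with classifier precision known from an evaluation set, and it ignores the bias from in-category sites the classifier misses. The function name and the example numbers are hypothetical.

```python
from typing import Tuple

def true_rate_bounds(observed_rate: float, precision: float) -> Tuple[float, float]:
    """Worst-case bounds on a statistic among truly in-category sites.

    Model: observed = precision * true_rate + (1 - precision) * fp_rate,
    where fp_rate (the rate among misclassified sites) is unknown in [0, 1].
    Solving for true_rate at fp_rate = 1 and fp_rate = 0 gives the bounds.
    """
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    if not 0.0 <= observed_rate <= 1.0:
        raise ValueError("observed_rate must be in [0, 1]")
    lower = (observed_rate - (1.0 - precision)) / precision
    upper = observed_rate / precision
    return max(0.0, lower), min(1.0, upper)

# Hypothetical example: 60% of LLM-selected sites show some property,
# and the classifier's precision for that category is 0.90.
low, high = true_rate_bounds(observed_rate=0.60, precision=0.90)
print(f"true rate lies in [{low:.3f}, {high:.3f}]")
```

Even at 90% precision the measured 60% is compatible with a true rate anywhere between roughly 56% and 67%, which shows why the referee treats error propagation as load-bearing for the downstream trends.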

Circularity Check

0 steps flagged

No circularity: empirical LLM evaluation on curated tasks

The paper is a standard empirical evaluation of off-the-shelf LLMs on website classification tasks relevant to security and privacy measurements. It constructs new curated datasets, measures accuracy and efficiency trade-offs across models and prompts, proposes a practical two-step filtering methodology motivated by those measurements, and then applies the approach to produce example studies while checking validity. No equations, fitted parameters, or derived predictions appear anywhere in the work. There are no self-definitional loops, no renaming of known results as novel derivations, and no load-bearing self-citations that substitute for independent justification. The central claims rest on direct experimental results and external validity checks rather than reducing to the inputs by construction. This is the expected non-circular outcome for an applied measurement paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLMs can reliably perform the identified classification tasks when given appropriate prompts and that the evaluation datasets capture the relevant distribution of websites.

free parameters (1)

LLM model and prompt configuration
Performance varies significantly with choice of model and configuration, which must be selected for each task.

axioms (1)

Domain assumption: LLM outputs on website text can be treated as accurate labels for security and privacy research purposes.
Invoked when claiming that the resulting measurements support valid research inferences.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations
cs.NI 2026-03 unverdicted novelty 6.0

AI-Sinkhole uses AI classification with quantized LLMs and Pi-Hole DNS blocking to dynamically prevent access to LLM services during student evaluations, reporting F1 scores above 0.83.

[1] [1]

https://tranco-list.eu/list/4QY5X [Accessed 23-September-2025]

2025.Tranco list 06 April 2025. https://tranco-list.eu/list/4QY5X [Accessed 23-September-2025]

work page 2025

[2]

Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, and Claudia Díaz. 2014. The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, November 3-7, 2014, Gail-Joon Ahn, Moti Yung, and Ninghui Li (Eds.). A...

[3]

Syed Suleman Ahmad, Muhammad Daniyal Dar, Muhammad Fareed Zaffar, Narseo Vallina-Rodriguez, and Rishab Nithyanand. 2020. Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web. In WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, Yennun Huang, Irwin King, Tie-Yan Liu, and Maarten van Steen (Eds.). ACM / IW3C2, 271...

[4]

AI@Meta. 2025. Llama4. https://github.com/meta-llama/llama-models/blob/main/models/llama4/MODEL_CARD.md [Accessed 23-September-2025]

[5]

Suood Abdulaziz Al-Roomi and Frank Li. 2023. A Large-Scale Measurement of Website Login Policies. In 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023, Joseph A. Calandrino and Carmela Troncoso (Eds.). USENIX Association, 2061–2078. https://www.usenix.org/conference/usenixsecurity23/presentation/al-roomi

[6]

Suood Alroomi and Frank Li. 2023. Measuring Website Password Creation Policies At Scale. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS 2023, Copenhagen, Denmark, November 26-30, 2023, Weizhi Meng, Christian Damsgaard Jensen, Cas Cremers, and Engin Kirda (Eds.). ACM, 3108–3122. doi:10.1145/3576915.3623156

[7]

Calvin Ardi and Matt Calder. 2023. The Prevalence of Single Sign-On on the Web: Towards the Next Generation of Web Content Measurement. In Proceedings of the 2023 ACM on Internet Measurement Conference, IMC 2023, Montreal, QC, Canada, October 24-26, 2023, Marie-José Montpetit, Aris Leivadeas, Steve Uhlig, and Mobin Javed (Eds.). ACM, 124–130. doi:10.1145/3...

[8]

Artifacts 2025. Artifacts. https://anonymous.4open.science/r/LLM-Assisted-Web-Measurements-Artifacts-2B0F/. Repository containing all of the artifacts related to this paper

[9]

Marzieh Bitaab, Haehyun Cho, Adam Oest, Zhuoer Lyu, Wei Wang, Jorij Abraham, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, and Adam Doupé. 2023. Beyond Phish: Toward Detecting Fraudulent e-Commerce Websites at Scale. In 44th IEEE Symposium on Security and Privacy, SP 2023, San Francisco, CA, USA, May 21-25, 2023. IEEE, 2566–2583. doi:10.1109/SP46215.2023.10179461

[11]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

[12]

Stefano Calzavara, Alvise Rabitti, and Michele Bugliesi. 2018. Semantics-Based Analysis of Content Security Policy Deployment. ACM Trans. Web 12, 2 (2018), 10:1–10:36. doi:10.1145/3149408

[13]

Stefano Calzavara, Sebastian Roth, Alvise Rabitti, Michael Backes, and Ben Stock. 2020. A Tale of Two Headers: A Formal Analysis of Inconsistent Click-Jacking Protection on the Web. In 29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020, Srdjan Capkun and Franziska Roesner (Eds.). USENIX Association, 683–697. https://www.usenix.org/conference/usenixsecurity20/presentation/calzavara

[15]

Jianjun Chen, Jian Jiang, Hai-Xin Duan, Tao Wan, Shuo Chen, Vern Paxson, and Min Yang. 2018. We Still Don’t Have Secure Cross-Domain Requests: an Empirical Study of CORS. In 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018, William Enck and Adrienne Porter Felt (Eds.). USENIX Association, 1079–1093. https://www.u...

[16]

Cloudflare, Inc. 2020–. Cloudflare Radar Domain Categorization. https://radar.cloudflare.com/domains

[17]

Google Deepmind. 2025. Gemini 2.5 Flash. https://deepmind.google/models/gemini/flash/ [Accessed 23-September-2025]

[18]

Google Deepmind. 2025. Gemma3. https://ai.google.dev/gemma/docs/core?hl=it [Accessed 23-September-2025]

[19]

Martin Degeling, Christine Utz, Christopher Lentzsch, Henry Hosseini, Florian Schaub, and Thorsten Holz. 2019. We Value Your Privacy ... Now Take Some Cookies: Measuring the GDPR’s Impact on Web Privacy. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society. ...

[20]

Yana Dimova, Tom van Goethem, and Wouter Joosen. 2023. Everybody’s Looking for SSOmething: A large-scale evaluation on the privacy of OAuth authentication on the web. Proc. Priv. Enhancing Technol. 2023, 4 (2023), 452–467. doi:10.56553/POPETS-2023-0119

[21]

Disconnect, Inc. 2025. Disconnect – Tracker Protection Services List. https://disconnect.me/trackerprotection Open-source list of domains used to identify trackers; used in many privacy and web measurement research projects

[22]

DMOZ Contributors. 1998–2017. DMOZ – The Open Directory Project. https://dmoz-odp.org/

[23]

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. A Survey on In-context Learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, 1107–1128. doi:10.18653/V1/2024.EMNLP-MAIN.64

[25]

Kostas Drakonakis, Sotiris Ioannidis, and Jason Polakis. 2020. The Cookie Hunter: Automated Black-box Auditing for Web Authentication and Authorization Flaws. In CCS ’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020, Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna (Eds.). ACM, 1953...

[26]

Steven Englehardt and Arvind Narayanan. 2016. Online Tracking: A 1-million-site Measurement and Analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi (Eds.). ACM, 1388–1401. doi:10....

[27]

Google. 2025. Google Developer API. https://ai.google.dev/ [Accessed 23-September-2025]

[28]

Matthias Gotze, Srdjan Matic, Costas Iordanou, Georgios Smaragdakis, and Nikolaos Laoutaris. 2022. Measuring Web Cookies in Governmental Websites. In WebSci ’22: 14th ACM Web Science Conference 2022, Barcelona, Spain, June 26 - 29, 2022. ACM, 44–54. doi:10.1145/3501247.3531545

[30]

Ollama Inc. 2025. Ollama: Run, create, and share large language models locally. https://ollama.com Accessed: 2025-08-26

[31]

Louis Jannett, Christian Mainka, Maximilian Westers, Andreas Mayer, Tobias Wich, and Vladislav Mladenov. 2024. SoK: SSO-MONITOR - The Current State and Future Research Directions in Single Sign-on Security Measurements. In 9th IEEE European Symposium on Security and Privacy, EuroS&P 2024, Vienna, Austria, July 8-12, 2024. IEEE, 173–192. doi:10.1109/EUROSP6...

[32]

Michael J. Kranch and Joseph Bonneau. 2015. Upgrading HTTPS in mid-air: An empirical study of strict transport security and key pinning. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8-11, 2015. The Internet Society. https://www.ndss-symposium.org/ndss2015/upgrading-https-mid-air-empi...

[33]

Pierre Laperdrix, Nataliia Bielova, Benoit Baudry, and Gildas Avoine. 2020. Browser Fingerprinting: A Survey. ACM Trans. Web 14, 2 (2020), 8:1–8:33. doi:10.1145/3386040

[34]

Sebastian Lekies, Ben Stock, and Martin Johns. 2013. 25 million flows later: large-scale detection of DOM-based XSS. In 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13, Berlin, Germany, November 4-8, 2013, Ahmad-Reza Sadeghi, Virgil D. Gligor, and Moti Yung (Eds.). ACM, 1193–. doi:10.1145/2508859.2516703

[36]

Yuejia Liang, Jianjun Chen, Run Guo, Kaiwen Shen, Hui Jiang, Man Hou, Yue Yu, and Haixin Duan. 2024. Internet’s Invisible Enemy: Detecting and Measuring Web Cache Poisoning in the Wild. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, CCS 2024, Salt Lake City, UT, USA, October 14-18, 2024, Bo Luo, Xiaojing Liao, ...

[37]

Yun Lin, Ruofan Liu, Dinil Mon Divakaran, Jun Yang Ng, Qing Zhou Chan, Yiwen Lu, Yuxuan Si, Fan Zhang, and Jin Song Dong. 2021. Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, Michael D. Bailey and Rachel Greenstadt (Eds.). USENIX Associat...

[38]

Sylvain Lugeon, Tiziano Piccardi, and Robert West. 2022. Homepage2Vec: Language-Agnostic Website Embedding and Classification. In Proceedings of the Sixteenth International AAAI Conference on Web and Social Media, ICWSM 2022, Atlanta, Georgia, USA, June 6-9, 2022, Ceren Budak, Meeyoung Cha, and ...

[39]

McAfee Corp. 2006–. McAfee SiteAdvisor. https://sitelookup.mcafee.com/

[40]

Abner Mendoza, Phakpoom Chinprutthiwong, and Guofei Gu. 2018. Uncovering HTTP Header Inconsistencies and the Impact on Desktop/Mobile Websites. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 247...

[41]

Microsoft. 2025. Playwright. https://playwright.dev [Accessed 23-September-2025]

[42]

Shaoor Munir, Sandra Siby, Umar Iqbal, Steven Englehardt, Zubair Shafiq, and Carmela Troncoso. 2023. CookieGraph: Understanding and Detecting First-Party Tracking Cookies. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS 2023, Copenhagen, Denmark, November 26-30, 2023, Weizhi Meng, Christian Damsgaard Jensen, Ca...

[43]

Aysun Ogut, Berke Turanlioglu, Doruk Can Metiner, Albert Levi, Cemal Yilmaz, Orçun Çetin, and A. Selcuk Uluagac. 2024. Dissecting Privacy Perspectives of Websites Around the World: "Aceptar Todo, Alle Akzeptieren, Accept All...". In 33rd USENIX Security Symposium, USENIX Security 2024, Philadelphia, PA, USA, August 14-16, 2024, Davide Balzarotti and Wenyu...

[44]

Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST 2023, San Francisco, CA, USA, 29 October 2023 - 1 November 2023, Sean Follmer, Jeff...

[45]

Victor Le Pochat, Tom van Goethem, Samaneh Tajalizadehkhoob, Maciej Korczynski, and Wouter Joosen. 2019. Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society. https://www.ndss-symposiu...

[46]

Xiaoguang Qi and Brian D. Davison. 2009. Web page classification: Features and algorithms. ACM Comput. Surv. 41, 2 (2009), 12:1–12:31. doi:10.1145/1459352.1459357

[47]

Nayanamana Samarasinghe, Aashish Adhikari, Mohammad Mannan, and Amr M. Youssef. 2022. Et tu, Brute? Privacy Analysis of Government Websites and Mobile Apps. In WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, Frédérique Laforest, Raphaël Troncy, Elena Simperl, Deepak Agarwal, Aristides Gionis, Ivan Herman, and Lionel ...

[48]

Sudheesh Singanamalla, Esther Han Beol Jang, Richard Anderson, Tadayoshi Kohno, and Kurtis Heimerl. 2020. Accept the Risk and Continue: Measuring the Long Tail of Government https Adoption. In IMC ’20: ACM Internet Measurement Conference, Virtual Event, USA, October 27-29, 2020. ACM, 577–597. doi:10.1145/3419394.3423645

[49]

Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, and Eric Horvitz. 2020. Characterizing Search-Engine Traffic to Internet Research Agency Web Properties. In WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, Yennun Huang, Irwin King, Tie-Yan Liu, and Maarten van Steen (Eds.). ACM / IW3C2, 2253–2263. doi:10.1145/3366423.3380290

[50]

Marco Squarcina, Mauro Tempesta, Lorenzo Veronese, Stefano Calzavara, and Matteo Maffei. 2021. Can I Take Your Subdomain? Exploring Same-Site Attacks in the Modern Web. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, Michael D. Bailey and Rachel Greenstadt (Eds.). USENIX Association, 2917–2934. https://www.usenix.org/conference...

[51]

Aleksei Stafeev, Tim Recktenwald, Gianluca De Stefano, Soheil Khodayari, and Giancarlo Pellegrino. 2025. YuraScanner: Leveraging LLMs for Task-driven Web App Scanning. In 32nd Annual Network and Distributed System Security Symposium, NDSS 2025, San Diego, California, USA, February 24-28, 2025. The Internet Society. https://www.ndss-symposium.org/ndss-paper...

[52]

Avinash Sudhodanan, Roberto Carbone, Luca Compagna, Nicolas Dolgin, Alessandro Armando, and Umberto Morelli. 2017. Large-Scale Analysis & Detection of Authentication Cross-Site Request Forgeries. In 2017 IEEE European Symposium on Security and Privacy, EuroS&P 2017, Paris, France, April 26-28, 2017. IEEE, 350–365. doi:10.1109/EUROSP.2017.45

[53]

The Daily Star. 2021. Amazon closing down Alexa, the popular web traffic ranking site. https://www.thedailystar.net/tech-startup/news/amazon-closing-down-alexa-the-popular-web-traffic-ranking-site-2913401 Accessed on October 10, 2025

[54]

Pelayo Vallina, Álvaro Feal, Julien Gamba, Narseo Vallina-Rodriguez, and Antonio Fernández Anta. 2019. Tales from the Porn: A Comprehensive Privacy Analysis of the Web Porn Ecosystem. In Proceedings of the Internet Measurement Conference, IMC 2019, Amsterdam, The Netherlands, October 21-23, 2019. ACM, 245–258. doi:10.1145/3355369.3355583

[55]

VirusTotal Team. 2025. VirusTotal. https://www.virustotal.com/ Accessed: 2025-09-25

[56]

Luping Wang, Sheng Chen, Linnan Jiang, Shu Pan, Runze Cai, Sen Yang, and Fei Yang. 2025. Parameter-efficient fine-tuning in large language models: a survey of methodologies. Artif. Intell. Rev. 58, 8 (2025), 227. doi:10.1007/S10462-025-11236-4

[57]

Lukas Weichselbaum, Michele Spagnuolo, Sebastian Lekies, and Artur Janc. 2016. CSP Is Dead, Long Live CSP! On the Insecurity of Whitelists and the Future of Content Security Policy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, Edgar R. Weippl, Stefan Katzenbeisser, Christoph...

[58]

Yahoo! Inc. 1994–2014. Yahoo Directory. https://dir.yahoo.com/

[59]

Eric Ye, Xiao Bai, Neil O’Hare, Eliyar Asgarieh, Kapil Thadani, Francisco Perez-Sorrosal, and Sujyothi Adiga. 2024. Multilingual Taxonomic Web Page Categorization Through Ensemble Knowledge Distillation. IEEE Trans. Knowl. Data Eng. 36, 11 (2024), 6614–6627. doi:10.1109/TKDE.2024.3406368

[60]

Eric Zeng, Miranda Wei, Theo Gregersen, Tadayoshi Kohno, and Franziska Roesner. 2021. Polls, clickbait, and commemorative $2 bills: problematic political advertising on news and media websites around the 2020 U.S. elections. In IMC ’21: ACM Internet Measurement Conference, Virtual Event, USA, November 2-4, 2021, Dave Levin, Alan Mislove, Johanna Amann, a...