LLM-Assisted Web Measurements
Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites are unlabeled and lack semantic information about the nature of the included websites, making targeted web measurements challenging, as researchers often rely on ad-hoc techniques to bias datasets toward specific website classes of interest. In this paper, we investigate the use of Large Language Models (LLMs) to enable targeted web measurement studies. Building on prior literature, we identify key website classification tasks relevant to web measurements and highlight limitations in state-of-the-art classification approaches. We construct carefully curated datasets to evaluate different LLMs on these tasks. Our results show that LLMs can achieve strong performance across multiple classification scenarios, but the choice of model and configuration plays a significant role. Motivated by the observed trade-off between classification accuracy and computational efficiency, we propose a practical two-step methodology for scalable targeted web measurements starting from the Tranco list. Finally, we conduct LLM-assisted web measurement studies inspired by prior work using our methodology and assess the validity of the resulting research inferences, showing that LLMs can effectively enable targeted measurements of security and privacy trends on the Web.
Forward citations
Cited by 1 Pith paper
- Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations
AI-Sinkhole uses AI classification with quantized LLMs and Pi-Hole DNS blocking to dynamically prevent access to LLM services during student evaluations, reporting F1 scores above 0.83.