Tracking sex: The implications of widespread sexual data leakage and tracking on porn websites
Pith reviewed 2026-05-24 21:10 UTC · model grok-4.3
The pith
A study of 22,484 pornography websites finds that 93 percent leak user data to third parties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample's domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user. We identify three core implications of the quantitative results: the unique/elevated risks of porn data leakage versus other types of data, the particular risks/impact for vulnerable populations, and the complications of providing consent for porn site users and the need for affirmative consent in these online sexual interactions.
What carries the argument
Large-scale crawl and measurement of third-party data leakage combined with domain-name content analysis on pornography websites.
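To make the measurement concrete, here is a minimal sketch of how a site visit could be flagged as contacting third parties. It is an illustration under stated assumptions, not the paper's tooling: the function names are hypothetical, the list of request URLs is assumed to come from some crawler, and the registrable domain is approximated by the last two hostname labels, where a real crawl would more likely consult the Public Suffix List.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def registrable_domain(url: str) -> str:
    """Naive stand-in for eTLD+1: keep the last two labels of the hostname.

    Assumption for illustration only; this misjudges suffixes such as co.uk.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def third_party_domains(page_url: str, request_urls: Iterable[str]) -> Set[str]:
    """Return the domains contacted during a visit that differ from the page's own."""
    first_party = registrable_domain(page_url)
    domains = (registrable_domain(u) for u in request_urls)
    return {d for d in domains if d and d != first_party}


def contacts_third_party(page_url: str, request_urls: Iterable[str]) -> bool:
    """True if at least one request went to a domain other than the page's own."""
    return bool(third_party_domains(page_url, request_urls))


# Hypothetical visit log: one first-party asset and two requests to one tracker.
page = "https://www.example.com/video"
requests_seen = [
    "https://cdn.example.com/player.js",
    "https://stats.tracker-example.net/pixel.gif?id=1",
    "https://ads.tracker-example.net/banner",
]
print(third_party_domains(page, requests_seen))
```

A check of this kind records only that a third-party request occurred. It does not inspect what the request carried.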
If this is right
- Sexual data leakage carries elevated privacy risks relative to other categories of personal information.
- Vulnerable populations experience distinct harms from exposure of sexual interests or identities.
- Consent on pornography sites is complicated by opaque policies and tracking, requiring affirmative consent mechanisms.
Where Pith is reading between the lines
- The concentration of trackers suggests that blocking a small number of domains could reduce most observed leakage.
- Similar measurement methods could be applied to other categories of sensitive websites to compare leakage rates.
- The readability barrier in privacy policies points to a general difficulty users face when trying to understand data practices across the web.
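The first inference above, that blocking a few domains could remove most observed leakage, can be checked from crawl output. The sketch below assumes a hypothetical mapping from each site to the set of third-party domains seen on it; the function names and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tracker_prevalence(site_trackers: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Share of all sites on which each third-party domain appears, most common first."""
    counts = Counter(t for trackers in site_trackers.values() for t in trackers)
    n_sites = len(site_trackers)
    return [(tracker, count / n_sites) for tracker, count in counts.most_common()]


def share_fully_cleaned(site_trackers: Dict[str, Set[str]], k: int) -> float:
    """Fraction of leaking sites left with no third-party contact once the
    k most prevalent domains are blocked."""
    top_k = {tracker for tracker, _ in tracker_prevalence(site_trackers)[:k]}
    leaking = [trackers for trackers in site_trackers.values() if trackers]
    if not leaking:
        return 0.0
    cleaned = sum(1 for trackers in leaking if trackers <= top_k)
    return cleaned / len(leaking)


# Toy data, hypothetical: four sites, three third-party domains.
toy = {
    "a.example": {"t1.example", "t2.example"},
    "b.example": {"t1.example"},
    "c.example": {"t1.example", "t3.example"},
    "d.example": set(),
}
print(tracker_prevalence(toy)[0])
print(share_fully_cleaned(toy, k=1))
```

High prevalence for one domain does not by itself mean that blocking it cleans most sites, because many sites contact several third parties at once. The second function measures that directly.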
Load-bearing premise
The sample of 22,484 sites and the method used to detect third-party data leakage accurately represent widespread practices and correctly identify actual user data exposure without significant false positives or selection bias in site discovery.
What would settle it
Re-running the crawl on a new, independently assembled list of pornography sites and finding a leakage rate below 70 percent or no measurable concentration among a few trackers would undermine the central quantitative claim.
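A replication would observe the leakage rate on a finite sample, so the comparison against the 70 percent mark should allow for sampling error. The sketch below uses a Wilson score interval. The choice of interval, the 95 percent level, and the sample counts are assumptions made here for illustration; the review only names the 70 percent threshold.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def undermines_claim(leaking_sites: int, n_sites: int, threshold: float = 0.70) -> bool:
    """True if even the upper bound of the interval falls below the threshold."""
    _, upper = wilson_interval(leaking_sites, n_sites)
    return upper < threshold


# Hypothetical re-crawl of 1,000 independently sampled sites.
print(wilson_interval(930, 1000))
print(undermines_claim(930, 1000))
print(undermines_claim(600, 1000))
```

Requiring the upper bound to fall below the threshold is a conservative reading of "below 70 percent". A replication could reasonably state a different decision rule in advance.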
Original abstract
This paper explores tracking and privacy risks on pornography websites. Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample's domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user. We identify three core implications of the quantitative results: 1) the unique/elevated risks of porn data leakage versus other types of data, 2) the particular risks/impact for vulnerable populations, and 3) the complications of providing consent for porn site users and the need for affirmative consent in these online sexual interactions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports an empirical measurement study of 22,484 pornography websites, claiming that 93% leak user data to third parties with tracking highly concentrated among a few major companies. It extracts privacy policies from 3,856 sites (17% of the sample) and finds they require roughly two years of college education to comprehend. Domain-name content analysis indicates 44.97% of sites expose or suggest a specific gender/sexual identity or interest. The authors discuss three implications: elevated risks of porn data leakage, particular impacts on vulnerable populations, and challenges around consent.
Significance. If the measurements are robust, the large-scale quantification of third-party leakage and tracker concentration on adult sites provides concrete evidence of privacy risks that are qualitatively distinct from general web tracking because of the sensitive nature of the data. The policy readability finding and the domain-content statistic add supporting context. The work supplies falsifiable, large-N observations that could be replicated or extended in future studies of sensitive-category tracking.
Major comments (3)
- [Methods] Methods section (data collection and leakage detection): the 93% leakage rate is presented as a direct observation, but the manuscript does not describe validation of the detector (e.g., manual inspection of a subsample to confirm that third-party requests actually transmit user-identifiable or sexual-interest data rather than merely recording the presence of domains such as doubleclick.net). Without such validation or payload analysis, the central percentage cannot be confirmed to measure actual data exposure.
- [Results] Results (sample construction): the claim that the 22,484-site corpus supports statements about 'widespread' practices requires evidence that the sampling frame (search-engine results or popularity lists) does not systematically over-represent commercial, English-language, or already-tracked domains. No robustness checks or alternative sampling comparisons are reported.
- [Results] Results (44.97% content-analysis figure): while less central than the leakage statistic, the domain-name classification lacks reported inter-rater reliability or a clear decision rule for what constitutes 'suggest[ing] a specific gender/sexual identity,' making the percentage difficult to interpret or replicate.
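The inter-rater reliability requested in the third major comment is often reported as Cohen's kappa when two coders label the same items. The sketch below is a plain implementation for that two-coder case. The labels and the function name are hypothetical, and a study with more than two coders would need a different statistic such as Fleiss' kappa.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially complete.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four domain names ("yes" = suggests an identity or interest).
coder_1 = ["yes", "yes", "no", "no"]
coder_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(coder_1, coder_2))
```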
Minor comments (2)
- [Abstract] Abstract and Results: the readability claim ('two-year college education') should cite the specific metric (e.g., Flesch-Kincaid) and report the exact grade-level value rather than a paraphrase.
- [Discussion] Discussion: the three implications are stated at a high level; tighter mapping from the quantitative results (e.g., the concentration statistic) to each implication would strengthen the argument.
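For the readability comment, the Flesch-Kincaid grade level is one metric the authors could report. The sketch below applies the standard formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, with a rough syllable heuristic. The heuristic, the tokenisation, and the sample sentences are assumptions made here, and the source does not confirm that the paper used this metric.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a text, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical policy snippets, one plain and one dense.
plain = "We see you. We log it."
dense = ("Advertisers systematically aggregate behavioral information "
         "across numerous unaffiliated properties.")
print(flesch_kincaid_grade(plain), flesch_kincaid_grade(dense))
```

Reporting the exact value alongside the metric name would let readers judge how closely the "two-year college education" paraphrase matches the measured grade level.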
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our measurement study of privacy practices on pornography websites. We address each major comment below and indicate planned revisions to strengthen the manuscript.
- Referee: [Methods] Methods section (data collection and leakage detection): the 93% leakage rate is presented as a direct observation, but the manuscript does not describe validation of the detector (e.g., manual inspection of a subsample to confirm that third-party requests actually transmit user-identifiable or sexual-interest data rather than merely recording the presence of domains such as doubleclick.net). Without such validation or payload analysis, the central percentage cannot be confirmed to measure actual data exposure.
  Authors: We agree that the manuscript does not include payload inspection or manual validation of transmitted data. Our detector identifies HTTP requests to domains of known tracking services, following standard methods in web privacy literature. The 93% figure therefore measures sites initiating such requests rather than confirmed transmission of identifiable sexual data. We will revise the methods and results sections to explicitly state this scope, add a limitations paragraph noting the absence of payload analysis, and adjust phrasing from 'leak user data' to 'initiate requests to third-party trackers' where appropriate.
  Revision: yes
- Referee: [Results] Results (sample construction): the claim that the 22,484-site corpus supports statements about 'widespread' practices requires evidence that the sampling frame (search-engine results or popularity lists) does not systematically over-represent commercial, English-language, or already-tracked domains. No robustness checks or alternative sampling comparisons are reported.
  Authors: The corpus was assembled via search-engine queries using pornography-related terms, a common approach for large-scale web studies. While we did not report explicit robustness checks against alternative frames, the scale and diversity of results provide support for the 'widespread' characterization. We will add a methods subsection describing the sampling procedure, potential biases (e.g., toward popular or English-language sites), and a brief discussion of why the frame is appropriate for the research questions.
  Revision: yes
- Referee: [Results] Results (44.97% content-analysis figure): while less central than the leakage statistic, the domain-name classification lacks reported inter-rater reliability or a clear decision rule for what constitutes 'suggest[ing] a specific gender/sexual identity,' making the percentage difficult to interpret or replicate.
  Authors: The classification relied on keyword and pattern matching in domain names for terms indicating gender, orientation, or specific interests. We will expand the methods section with the explicit decision rules, provide the full keyword list in an appendix for replicability, and note that classifications were performed by the author team with internal discussion to resolve edge cases. Inter-rater reliability statistics were not computed because this was a single-team effort; the added documentation should mitigate the replicability concern.
  Revision: yes
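The keyword matching described in the last response could look roughly like the sketch below. The keyword list, the category names, and the token-based matching rule are hypothetical stand-ins chosen for illustration; they are not the authors' code book or decision rules.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical keyword list for illustration only; not the authors' code book.
KEYWORDS: Dict[str, List[str]] = {
    "orientation": ["gay", "lesbian"],
    "gender": ["trans", "women", "men"],
}


def classify_domain(domain: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Set[str]:
    """Return the categories whose keywords appear as whole tokens in the domain name.

    Tokens are runs of letters, so 'men' does not match inside 'women'.
    Concatenated names without separators would need a word segmenter instead.
    """
    tokens = set(re.split(r"[^a-z]+", domain.lower()))
    return {category for category, terms in keywords.items() if tokens & set(terms)}


def share_flagged(domains: Iterable[str]) -> float:
    """Fraction of domains that match at least one category."""
    domains = list(domains)
    if not domains:
        return 0.0
    return sum(1 for d in domains if classify_domain(d)) / len(domains)


# Hypothetical domain names.
sample = ["gay-videos.example", "big-movies.example",
          "trans-cams.example", "videos.example"]
print(share_flagged(sample))
```

Publishing the rule in executable form would let a reader see which domains are counted and why, which is the replicability concern the referee raises.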
Circularity Check
No circularity: purely empirical measurement with direct observations
The paper reports observational statistics from crawling 22,484 sites (93% leakage rate, domain-name content analysis yielding 44.97%, policy readability on a 3,856-site subset). No equations, fitted parameters, predictions, or derivations are present. Results are stated as direct counts and percentages from the sample; no step reduces a claimed output to an input by construction, self-definition, or self-citation chain. The central claims rest on the corpus construction and detection method, which are methodological choices open to external validation rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Third-party network requests observed during site visits indicate leakage of user data to external entities.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party... webXray... policyXray... content analysis of the sample's domains indicated 44.97%...
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: 93% leak user data to a third party... tracking... concentrated by a handful of major companies
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.