
arxiv: 1907.06520 · v1 · pith:KF6ODO3Ynew · submitted 2019-07-15 · 💻 cs.CY

Tracking sex: The implications of widespread sexual data leakage and tracking on porn websites

Pith reviewed 2026-05-24 21:10 UTC · model grok-4.3

classification 💻 cs.CY
keywords pornography websites · data leakage · web tracking · privacy risks · sexual identity · consent · third-party trackers

The pith

A study of 22,484 pornography websites finds that 93 percent leak user data to third parties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures third-party tracking across a large sample of pornography sites and reports that the vast majority transmit user information outward. It shows that this tracking is controlled by a small number of companies and that nearly half the sites reveal or imply specific sexual identities or interests tied to visitors. The authors link these patterns to three distinct problems: the special sensitivity of sexual data compared with other categories, heightened exposure for certain user groups, and the practical barriers to genuine consent on these platforms.

Core claim

Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample's domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user. We identify three core implications of the quantitative results: the unique/elevated risks of porn data leakage versus other types of data, the particular risks/impact for vulnerable populations, and the complications of providing consent for porn site users and the need for affirmative consent in these online sexual interactions.

What carries the argument

Large-scale crawl and measurement of third-party data leakage combined with domain-name content analysis on pornography websites.

If this is right

  • Sexual data leakage carries elevated privacy risks relative to other categories of personal information.
  • Vulnerable populations experience distinct harms from exposure of sexual interests or identities.
  • Consent on pornography sites is complicated by opaque policies and tracking, requiring affirmative consent mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The concentration of trackers suggests that blocking a small number of domains could reduce most observed leakage.
  • Similar measurement methods could be applied to other categories of sensitive websites to compare leakage rates.
  • The readability barrier in privacy policies points to a general difficulty users face when trying to understand data practices across the web.
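The first extension above, that blocking a few domains might remove most observed leakage, can be checked with a simple coverage calculation. The sketch below is illustrative only: the function name `cumulative_coverage`, the data layout, and the toy domains are assumptions, not anything taken from the paper. It computes the share of tracked sites that contact at least one of the `top_k` most common tracker domains.

```python
from collections import Counter


def cumulative_coverage(site_trackers, top_k):
    """Share of tracked sites that contact at least one of the top_k
    most frequently observed tracker domains.

    site_trackers maps a site name to the list of tracker domains seen on it.
    """
    # Count each tracker once per site, so a site that calls the same
    # tracker many times does not inflate that tracker's reach.
    counts = Counter(d for domains in site_trackers.values() for d in set(domains))
    top = {domain for domain, _ in counts.most_common(top_k)}

    # Only sites with at least one tracker are in the denominator.
    tracked = [site for site, domains in site_trackers.items() if domains]
    if not tracked:
        return 0.0
    covered = sum(1 for site in tracked if top & set(site_trackers[site]))
    return covered / len(tracked)


# Hypothetical toy data, not results from the paper.
observations = {
    "site-a.example": ["tracker-one.example", "tracker-two.example"],
    "site-b.example": ["tracker-one.example"],
    "site-c.example": ["tracker-three.example"],
    "site-d.example": [],
}

print(cumulative_coverage(observations, top_k=1))
```

If coverage on real crawl data approached 1.0 for a small `top_k`, that would be consistent with the extension. Covering a tracker's domain is not the same as stopping every data flow to the company behind it, so the figure is best read as an upper bound on what domain blocking could achieve.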

Load-bearing premise

The sample of 22,484 sites and the method used to detect third-party data leakage accurately represent widespread practices and correctly identify actual user data exposure without significant false positives or selection bias in site discovery.

What would settle it

Re-running the crawl on a new, independently assembled list of pornography sites and finding a leakage rate below 70 percent or no measurable concentration among a few trackers would undermine the central quantitative claim.

Figures

Figures reproduced from arXiv: 1907.06520 by Elena Maris, Jennifer Henrichsen, Timothy Libert.

Figure 1: Diagram of data flows to third-parties on major porn sites. Note Alphabet is the holding company of Google.
Original abstract

This paper explores tracking and privacy risks on pornography websites. Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample's domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user. We identify three core implications of the quantitative results: 1) the unique/elevated risks of porn data leakage versus other types of data, 2) the particular risks/impact for vulnerable populations, and 3) the complications of providing consent for porn site users and the need for affirmative consent in these online sexual interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports an empirical measurement study of 22,484 pornography websites, claiming that 93% leak user data to third parties with tracking highly concentrated among a few major companies. It extracts privacy policies from 3,856 sites (17% of the sample) and finds they require roughly two years of college education to comprehend. Domain-name content analysis indicates 44.97% of sites expose or suggest a specific gender/sexual identity or interest. The authors discuss three implications: elevated risks of porn data leakage, particular impacts on vulnerable populations, and challenges around consent.

Significance. If the measurements are robust, the large-scale quantification of third-party leakage and tracker concentration on adult sites provides concrete evidence of privacy risks that are qualitatively distinct from general web tracking because of the sensitive nature of the data. The policy readability finding and the domain-content statistic add supporting context. The work supplies falsifiable, large-N observations that could be replicated or extended in future studies of sensitive-category tracking.

major comments (3)
  1. [Methods] Methods section (data collection and leakage detection): the 93% leakage rate is presented as a direct observation, but the manuscript does not describe validation of the detector (e.g., manual inspection of a subsample to confirm that third-party requests actually transmit user-identifiable or sexual-interest data rather than merely recording the presence of domains such as doubleclick.net). Without such validation or payload analysis, the central percentage cannot be confirmed to measure actual data exposure.
  2. [Results] Results (sample construction): the claim that the 22,484-site corpus supports statements about 'widespread' practices requires evidence that the sampling frame (search-engine results or popularity lists) does not systematically over-represent commercial, English-language, or already-tracked domains. No robustness checks or alternative sampling comparisons are reported.
  3. [Results] Results (44.97% content-analysis figure): while less central than the leakage statistic, the domain-name classification lacks reported inter-rater reliability or a clear decision rule for what constitutes 'suggest[ing] a specific gender/sexual identity,' making the percentage difficult to interpret or replicate.
minor comments (2)
  1. [Abstract] Abstract and Results: the readability claim ('two-year college education') should cite the specific metric (e.g., Flesch-Kincaid) and report the exact grade-level value rather than a paraphrase.
  2. [Discussion] Discussion: the three implications are stated at a high level; tighter mapping from the quantitative results (e.g., the concentration statistic) to each implication would strengthen the argument.
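To make the first minor comment concrete, here is a minimal sketch of the Flesch-Kincaid grade-level formula. The referee names this metric only as an example, so it is an assumption that the paper used it. The function takes pre-computed counts, because syllable counting is itself a heuristic that varies between tools.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one policy excerpt, not data from the paper.
grade = flesch_kincaid_grade(words=120, sentences=4, syllables=204)
print(round(grade, 2))  # 16.17
```

On the usual reading of US grade levels, a score near 14 would correspond to roughly two years of college. Reporting the exact score alongside the metric name, as the referee asks, would let readers check the paraphrase for themselves.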

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our measurement study of privacy practices on pornography websites. We address each major comment below and indicate planned revisions to strengthen the manuscript.

  1. Referee: [Methods] Methods section (data collection and leakage detection): the 93% leakage rate is presented as a direct observation, but the manuscript does not describe validation of the detector (e.g., manual inspection of a subsample to confirm that third-party requests actually transmit user-identifiable or sexual-interest data rather than merely recording the presence of domains such as doubleclick.net). Without such validation or payload analysis, the central percentage cannot be confirmed to measure actual data exposure.

    Authors: We agree that the manuscript does not include payload inspection or manual validation of transmitted data. Our detector identifies HTTP requests to domains of known tracking services, following standard methods in web privacy literature. The 93% figure therefore measures sites initiating such requests rather than confirmed transmission of identifiable sexual data. We will revise the methods and results sections to explicitly state this scope, add a limitations paragraph noting the absence of payload analysis, and adjust phrasing from 'leak user data' to 'initiate requests to third-party trackers' where appropriate. revision: yes
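    The scope the authors describe, counting sites that initiate requests to known tracker domains, can be sketched as below. This is a hypothetical illustration, not the authors' detector: the tracker list, the function names, and the toy crawl records are all assumptions. Only `doubleclick.net` appears in the referee report. The code never inspects payloads, which is exactly the limitation under discussion.

```python
from urllib.parse import urlsplit

# Hypothetical tracker list; a real study would use a curated one.
KNOWN_TRACKERS = {"doubleclick.net", "tracker.example"}


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def contacts_tracker(request_urls, trackers=KNOWN_TRACKERS):
    """True if any request URL points at a listed tracker domain."""
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if any(host_matches(host, domain) for domain in trackers):
            return True
    return False


def request_rate(crawl):
    """Share of crawled sites with at least one request to a listed tracker."""
    if not crawl:
        return 0.0
    flagged = sum(1 for urls in crawl.values() if contacts_tracker(urls))
    return flagged / len(crawl)


# Hypothetical crawl records: site -> URLs requested while loading it.
crawl = {
    "site-a.example": ["https://site-a.example/app.js",
                       "https://ad.doubleclick.net/pixel?x=1"],
    "site-b.example": ["https://cdn.site-b.example/style.css"],
    "site-c.example": ["https://stats.tracker.example/collect"],
    "site-d.example": ["https://notdoubleclick.net/x"],
}

print(request_rate(crawl))  # 0.5
```

    A rate computed this way shows that a request was made, not what it carried, which is why the proposed rewording to 'initiate requests to third-party trackers' describes the measurement more precisely.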

  2. Referee: [Results] Results (sample construction): the claim that the 22,484-site corpus supports statements about 'widespread' practices requires evidence that the sampling frame (search-engine results or popularity lists) does not systematically over-represent commercial, English-language, or already-tracked domains. No robustness checks or alternative sampling comparisons are reported.

    Authors: The corpus was assembled via search-engine queries using pornography-related terms, a common approach for large-scale web studies. While we did not report explicit robustness checks against alternative frames, the scale and diversity of results provide support for the 'widespread' characterization. We will add a methods subsection describing the sampling procedure, potential biases (e.g., toward popular or English-language sites), and a brief discussion of why the frame is appropriate for the research questions. revision: yes

  3. Referee: [Results] Results (44.97% content-analysis figure): while less central than the leakage statistic, the domain-name classification lacks reported inter-rater reliability or a clear decision rule for what constitutes 'suggest[ing] a specific gender/sexual identity,' making the percentage difficult to interpret or replicate.

    Authors: The classification relied on keyword and pattern matching in domain names for terms indicating gender, orientation, or specific interests. We will expand the methods section with the explicit decision rules, place the full keyword list in an appendix for replicability, and note that classifications were performed by the author team with internal discussion to resolve edge cases. Inter-rater reliability statistics were not computed because this was a single-team effort; the added documentation should mitigate the replicability concern. revision: yes
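    The keyword matching this response describes could look roughly like the sketch below. The keyword list, the tokenization rule, and the function names are hypothetical and are not the authors' actual decision rules. The example is included to show why explicit rules matter: substring matching produces false positives.

```python
import re

# Hypothetical keyword list for illustration only.
KEYWORDS = {"gay", "lesbian", "trans"}


def domain_tokens(domain):
    """Split a domain into lowercase alphabetic tokens, ignoring the last label.

    Dropping only the final label is a simplification; multi-part suffixes
    such as 'co.uk' would need a public-suffix list.
    """
    labels = domain.lower().split(".")[:-1]
    tokens = []
    for label in labels:
        tokens.extend(t for t in re.split(r"[^a-z]+", label) if t)
    return tokens


def matched_keywords(domain, keywords=KEYWORDS):
    """Keywords that occur as substrings of any token in the domain name."""
    tokens = domain_tokens(domain)
    return sorted(k for k in keywords if any(k in tok for tok in tokens))


def share_flagged(domains):
    """Share of domains with at least one keyword match."""
    if not domains:
        return 0.0
    return sum(1 for d in domains if matched_keywords(d)) / len(domains)


print(matched_keywords("gay-videos.example"))      # ['gay']
print(matched_keywords("www.videos.example"))      # []
print(matched_keywords("transport-news.example"))  # ['trans'], a false positive
```

    The last call shows how a naive substring rule misfires on an unrelated word. A documented rule for cases like this is what would let another team reproduce a figure such as 44.97%.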

Circularity Check

0 steps flagged

No circularity: purely empirical measurement with direct observations


The paper reports observational statistics from crawling 22,484 sites (93% leakage rate, domain-name content analysis yielding 44.97%, policy readability on a 3,856-site subset). No equations, fitted parameters, predictions, or derivations are present. Results are stated as direct counts and percentages from the sample; no step reduces a claimed output to an input by construction, self-definition, or self-citation chain. The central claims rest on the corpus construction and detection method, which are methodological choices open to external validation rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an observational web measurement study and rests on standard domain assumptions about what constitutes third-party tracking rather than new axioms or invented entities.

axioms (1)
  • domain assumption: Third-party network requests observed during site visits indicate leakage of user data to external entities.
    This assumption underpins the 93% leakage figure reported in the abstract.

pith-pipeline@v0.9.0 · 5695 in / 1293 out tokens · 33571 ms · 2026-05-24T21:10:48.316501+00:00 · methodology


