
arxiv: 1907.03834 · v1 · pith:XXBBH6VGnew · submitted 2019-06-25 · 💻 cs.CY · cs.NI

Differences in Online Course Usage and IP Geolocation Bias by Local Economic Profile

Pith reviewed 2026-05-25 15:43 UTC · model grok-4.3

classification 💻 cs.CY cs.NI
keywords MOOC registration · IP geolocation · ZIP code analysis · economic inequality · MaxMind · online education · location bias

The pith

IP geolocation databases like MaxMind place users from poorer areas into richer ZIP codes, understating how MOOC sign-ups favor prosperous neighborhoods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps HarvardX and MITx registrations to ZIP codes using both IP addresses and user mailing addresses. Per-person registration rates rise with local income and population density. MaxMind's IP-based ZIP codes deviate more from mailing addresses in lower-income areas, assign more users to higher-income ZIP codes than mailing addresses indicate, and flatten the apparent relationship between prosperity and registration. The result is that IP-only analyses make MOOC participation look less unequal by economic profile than the mailing-address data show. The author notes the same bias pattern could distort location-based work in other fields.

Core claim

Per-capita registration rates for the courses increase with ZIP-code economic prosperity and population density; when MaxMind IP geolocation is compared to user mailing addresses, the database produces larger geographic and economic mismatches for users in distressed areas, disproportionately assigns locations to prosperous ZIP codes, and therefore underestimates the degree to which registrations are concentrated in higher-income places.
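
As a rough illustration of the per-capita rate in this claim, the sketch below pools registrations and population within prosperity tiers and divides. The ZIP codes, counts, populations, and three-tier scale are all invented for the example; they are not the paper's data, and the paper's own economic measure may be defined differently.

```python
from collections import defaultdict

# Hypothetical toy records: (zip_code, registrations, population, prosperity_tier).
# Tier 1 = most distressed, tier 3 = most prosperous. All values are invented.
ZIP_RECORDS = [
    ("00001", 12, 20000, 1),
    ("00002", 30, 25000, 2),
    ("00003", 90, 30000, 3),
    ("00004", 8, 16000, 1),
    ("00005", 75, 25000, 3),
]


def rate_per_1000_by_tier(records):
    """Pool registrations and population within each tier, then divide."""
    regs = defaultdict(int)
    pop = defaultdict(int)
    for _zip_code, n_reg, n_pop, tier in records:
        regs[tier] += n_reg
        pop[tier] += n_pop
    # Registrations per 1,000 residents for each tier, in tier order.
    return {tier: 1000 * regs[tier] / pop[tier] for tier in sorted(regs)}


rates = rate_per_1000_by_tier(ZIP_RECORDS)
for tier, rate in rates.items():
    print(f"tier {tier}: {rate:.2f} registrations per 1,000 residents")
```

Pooling within a tier before dividing weights each ZIP code by its population, so a handful of small ZIP codes cannot dominate a tier's rate.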

What carries the argument

Comparison of MaxMind-derived ZIP codes against self-reported mailing addresses to quantify geolocation error by local economic profile.
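
A minimal sketch of this comparison, under stated assumptions: each ZIP code has a centroid and a numeric distress score, the geographic error is the great-circle distance between the two centroids, and the economic error is the difference in scores. The `ZIP_INFO` table, its labels, and the distress scale are hypothetical, and the paper may define both errors differently.

```python
import math

# Hypothetical lookup: ZIP label -> (latitude, longitude, distress_score).
# Higher distress_score = more economically distressed. Values are invented.
ZIP_INFO = {
    "A": (42.0, -71.0, 80.0),
    "B": (42.0, -71.5, 20.0),
    "C": (42.5, -71.0, 50.0),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def geolocation_errors(mailing_zip, ip_zip):
    """Return (geographic error in km, economic error in distress points).

    A negative economic error means the IP-derived ZIP is less distressed
    (more prosperous) than the ZIP of the mailing address.
    """
    lat_m, lon_m, distress_m = ZIP_INFO[mailing_zip]
    lat_i, lon_i, distress_i = ZIP_INFO[ip_zip]
    return haversine_km(lat_m, lon_m, lat_i, lon_i), distress_i - distress_m


dist_km, econ_err = geolocation_errors("A", "B")
print(f"geographic error: {dist_km:.1f} km, economic error: {econ_err:+.0f}")
```

Keeping the economic error signed, rather than absolute, is what lets a later step ask whether misplacements lean toward prosperous areas.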

If this is right

  • Analyses of MOOC participation that rely solely on IP-derived locations will report weaker correlations between registration and local prosperity than actually exist.
  • Any downstream statistic that uses IP geolocation to infer demographic or economic traits of online users inherits the same directional bias toward prosperous areas.
  • Policy or platform decisions that allocate resources on the basis of IP-mapped participation rates will understate the need for outreach in economically distressed ZIP codes.
  • Commercial or legal uses of IP geolocation to estimate user income or neighborhood characteristics will overstate the presence of users from higher-income areas.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers who need location-linked socioeconomic data may need to combine IP methods with address verification or alternative signals rather than treating IP output as neutral.
  • The bias documented here could be tested in other large-scale IP datasets, such as web traffic logs or app usage, to see whether the same prosperity skew appears.
  • Database maintainers could examine whether their own training data or resolution methods systematically under-sample lower-income network infrastructure.

Load-bearing premise

User-provided mailing addresses are an accurate and unbiased record of actual location that can serve as ground truth for measuring IP geolocation error.

What would settle it

Repeating the error analysis with a second independent geolocation service or with verified address data shows no systematic increase in error size or direction for lower-income ZIP codes.
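
The check described here amounts to grouping per-user errors by the economic tier of the reference address and looking for a trend. A minimal sketch, assuming per-user geographic errors in kilometres have already been computed and using invented observations on a hypothetical three-tier scale:

```python
from collections import defaultdict
from statistics import median

# Invented observations: (tier of the reference ZIP, geolocation error in km).
# Tier 1 = most distressed, tier 3 = most prosperous.
OBSERVATIONS = [
    (1, 40.0), (1, 60.0), (1, 50.0),
    (2, 20.0), (2, 30.0),
    (3, 10.0), (3, 12.0), (3, 8.0),
]


def median_error_by_tier(observations):
    """Median error within each tier, returned in tier order."""
    groups = defaultdict(list)
    for tier, error_km in observations:
        groups[tier].append(error_km)
    return {tier: median(errors) for tier, errors in sorted(groups.items())}


medians = median_error_by_tier(OBSERVATIONS)
for tier, value in medians.items():
    print(f"tier {tier}: median error {value:.1f} km")
```

Medians are used instead of means because a few very distant misplacements could otherwise dominate a tier. If the medians were flat across tiers in a replication, that would count against the differential-error claim.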

Original abstract

Although Massive Open Online Courses (MOOCs) have the promise to make rigorous higher education accessible to everyone, prior research has shown that registrants tend to come from backgrounds of higher socioeconomic status. In this work, I study geographically granular economic patterns in registration for HarvardX and MITx courses, and in the accuracy of identifying users' locations from their IP addresses. Using ZIP Codes identified by the MaxMind IP geolocation database, I find that per-capita registration rates correlate with economic prosperity and population density. Comparing these ZIP Codes with user-provided mailing addresses, I find evidence of bias in MaxMind geolocation: it makes greater errors, both geographically and economically, for users from more economically distressed areas; it disproportionately geolocates users to prosperous areas; and it underestimates the regressive pattern in MOOC registration. Similar economic biases may affect IP geolocation in other academic, commercial, and legal contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies geographic economic patterns in registrations for HarvardX and MITx MOOCs. Using MaxMind IP geolocation to assign ZIP codes, it reports that per-capita registration rates correlate positively with local economic prosperity and population density. Comparing these ZIP codes against user-provided mailing addresses, the work claims to identify systematic biases in MaxMind: larger geographic and economic errors for users from distressed areas, disproportionate assignment to prosperous ZIP codes, and consequent underestimation of the regressive character of MOOC participation.

Significance. If the differential-bias results are robust, the manuscript supplies concrete evidence that IP geolocation databases can embed socioeconomic skews, with direct consequences for any research, platform analytics, or policy work that relies on such data to study educational access or digital behavior.

major comments (2)
  1. [section comparing MaxMind ZIP codes to mailing addresses (results)] The central claims of greater MaxMind errors (both geographic and economic) for distressed-area users, disproportionate geolocation to prosperous areas, and underestimation of regressive registration all rest on treating user-provided mailing addresses as an unbiased ground truth. The manuscript supplies no analysis or data showing that address provision rates, completeness, or self-reporting accuracy are independent of local economic profile; any systematic difference by economic status would directly confound the reported differential-bias findings.
  2. [methods / data description] No information is given on the number of users who supplied mailing addresses versus those who did not, nor on whether address provision itself correlates with the economic variables under study; without this, the size and representativeness of the comparison sample cannot be assessed.
minor comments (1)
  1. [abstract] The abstract would be strengthened by reporting sample sizes, correlation coefficients, and the statistical tests used for the per-capita prosperity and density relationships.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We respond point-by-point to the major comments below, indicating planned revisions where feasible.

  1. Referee: [section comparing MaxMind ZIP codes to mailing addresses (results)] The central claims of greater MaxMind errors (both geographic and economic) for distressed-area users, disproportionate geolocation to prosperous areas, and underestimation of regressive registration all rest on treating user-provided mailing addresses as an unbiased ground truth. The manuscript supplies no analysis or data showing that address provision rates, completeness, or self-reporting accuracy are independent of local economic profile; any systematic difference by economic status would directly confound the reported differential-bias findings.

    Authors: We agree that the assumption of unbiased ground truth is central to interpreting the differential-bias results. The available data do not allow direct verification that address provision rates are independent of economic status, since users without addresses lack the location information needed for such an analysis. We will revise the manuscript to explicitly acknowledge this assumption and discuss its potential implications for the findings on geolocation bias. revision: yes

  2. Referee: [methods / data description] No information is given on the number of users who supplied mailing addresses versus those who did not, nor on whether address provision itself correlates with the economic variables under study; without this, the size and representativeness of the comparison sample cannot be assessed.

    Authors: We will add to the methods section the total number of users and the number who provided mailing addresses used in the comparison sample. However, we cannot assess correlation between address provision and economic variables, as non-providers have no associated location data from which to derive economic profiles. This limitation will be noted in the revision. revision: partial

standing simulated objections not resolved
  • Whether address provision rates or accuracy correlate with local economic profile (cannot be determined from available data on non-providers).

Circularity Check

0 steps flagged

No circularity: purely observational data analysis with no derivations or fitted predictions

The paper performs empirical analysis of MOOC registration rates by ZIP code economic profiles and compares MaxMind IP geolocation to user-provided mailing addresses. No equations, parameters, or derivations are present that could reduce any result to a fitted input or self-referential definition by construction. All claims rest on direct data comparisons rather than any predictive or uniqueness step that loops back to the inputs. This is the expected non-finding for a purely observational study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on external data sources (MaxMind database, ZIP-code economic profiles) and the assumption that user self-reported addresses are reliable ground truth; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: User-provided mailing addresses are reliable indicators of actual user locations.
    The comparison used to measure geolocation accuracy treats these addresses as ground truth.

pith-pipeline@v0.9.0 · 5681 in / 1213 out tokens · 32930 ms · 2026-05-25T15:43:58.979663+00:00 · methodology
