Overconfident Coordinates: Quantifying Confidence in Traceroute Geolocation

Caleb J. Wang; Fabián E. Bustamante; Santiago Klein

arxiv: 2606.24027 · v2 · pith:7XTKNT4Ynew · submitted 2026-06-23 · 💻 cs.NI

Santiago Klein, Caleb J. Wang, Fabián E. Bustamante

Pith reviewed 2026-06-25 22:37 UTC · model grok-4.3

classification 💻 cs.NI

keywords traceroute, geolocation, hidden Markov model, Internet measurement, path consistency, latency, network topology, router location

The pith

Path Consistency Scoring assigns confidence to traceroute geolocations by checking consistency with latency and speed-of-light constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Path Consistency Scoring (PCS) to evaluate how well geolocation metadata supports a coherent geographic interpretation of a traceroute path. It models each path as a sequence of candidate locations using a Hidden Markov Model that combines local evidence with speed-of-light limits and empirical latency priors. The resulting score summarizes how well the metadata and the observed RTT increments jointly support that interpretation, which gives a per-path measure of reliability. On 6,555 paths verified by active probing, 94.2 percent of decoded sequences show mean error below 200 km. PCS scores remain similar across four commercial geolocation databases, while a separate alignment metric shows that how often paths need correction depends strongly on the database used. This framework lets researchers quantify confidence in geographic conclusions drawn from traceroutes.

Core claim

PCS models each traceroute as a sequence of candidate city-level locations and uses a Hidden Markov Model to fuse local evidence with speed-of-light constraints and empirical latency priors. The model produces a path consistency score that summarizes how well metadata and observed RTT increments support a coherent geographic interpretation. On 6,555 validated paths, 94.2% of decoded sequences achieve mean error below 200 km. PCS is largely GeoDB-agnostic, with median scores varying by less than 5% across four commercial databases, while the alignment metric shows that over half of DB-IP and IP2Location paths require substantial correction, compared with 15% for IPinfo.

What carries the argument

Path Consistency Scoring (PCS), a Hidden Markov Model that scores path-level consistency between geolocation metadata and observed latencies under speed-of-light constraints.
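The decoding step can be illustrated with a minimal Viterbi sketch. Everything here is an assumption made for illustration, not the paper's parameterization. Each hop has a short list of candidate (lat, lon) locations with hypothetical emission log-scores. A transition is ruled out when the great-circle distance exceeds what the hop's RTT increment allows at about 200 km/ms in fiber, and feasible transitions pay a hypothetical distance penalty. The mean per-hop log-score stands in for a consistency score. The helper names (`haversine_km`, `transition_logscore`, `viterbi`) are hypothetical.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: ~2/3 c in fiber, i.e. about 200 km per millisecond


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def transition_logscore(a, b, rtt_increment_ms, penalty_per_km=0.001):
    """Hypothetical transition score: -inf when the move is faster than light in fiber,
    otherwise a small penalty proportional to the distance travelled."""
    # The RTT increment covers the extra segment out and back, hence the division by 2.
    max_km = max(rtt_increment_ms, 0.0) / 2 * FIBER_KM_PER_MS
    d = haversine_km(a, b)
    if d > max_km:
        return -math.inf
    return -penalty_per_km * d


def viterbi(candidates, emissions, rtt_increments):
    """Most likely candidate sequence.

    candidates[t]: list of (lat, lon) candidates for hop t
    emissions[t][i]: log-score of candidate i at hop t
    rtt_increments[t]: RTT(hop t) - RTT(hop t-1) in ms (entry 0 is ignored)
    Returns (list of candidate indices, mean log-score per hop).
    """
    n = len(candidates)
    score = [list(emissions[0])]
    back = [[0] * len(candidates[0])]
    for t in range(1, n):
        row, ptr = [], []
        for j, cj in enumerate(candidates[t]):
            # Default back-pointer 0 keeps backtracking valid even if every move is infeasible.
            best, arg = -math.inf, 0
            for i, ci in enumerate(candidates[t - 1]):
                s = score[t - 1][i] + transition_logscore(ci, cj, rtt_increments[t])
                if s > best:
                    best, arg = s, i
            row.append(best + emissions[t][j])
            ptr.append(arg)
        score.append(row)
        back.append(ptr)
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(range(len(score[-1])), key=lambda j: score[-1][j])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last] / n
```

For example, suppose a Chicago hop is followed by a hop whose candidates are New York and London, the database prefers London, and the RTT rises by only 20 ms. The hard speed-of-light cutoff then forces the decoder onto New York even though its emission score is lower.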

If this is right

Downstream analyses can filter or weight traceroute paths according to their PCS scores instead of treating all geolocation data as equally reliable.
The Path-Model Alignment metric identifies which geolocation databases produce paths needing the most correction on a given dataset.
Researchers obtain a passive method to qualify geographic conclusions from traceroutes without requiring active probing for every path.
Comparisons across geolocation databases become possible through consistency scores rather than point-wise accuracy alone.
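Filtering or weighting by score is straightforward once a per-path score exists. The sketch assumes, hypothetically, that scores are normalized to [0, 1] with higher meaning more consistent. This summary does not give the paper's actual scale or any recommended threshold, and `ScoredPath`, `filter_paths`, and `pcs_weighted_mean` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ScoredPath:
    path_id: str
    pcs: float  # hypothetical: consistency score normalized to [0, 1], higher is better


def filter_paths(paths, threshold):
    """Keep only paths whose score meets the (hypothetical) threshold."""
    return [p for p in paths if p.pcs >= threshold]


def pcs_weighted_mean(values, paths):
    """Average a per-path statistic, weighting each path by its score."""
    total = sum(p.pcs for p in paths)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return sum(v * p.pcs for v, p in zip(values, paths)) / total
```

Hard filtering discards low-score paths entirely. Weighting keeps every path but lets the less consistent ones contribute less.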

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Studies of Internet routing and topology could improve by discarding or down-weighting low-PCS paths, reducing the impact of unreliable location data.
The same consistency-checking approach might extend to other forms of network metadata such as rDNS labels or IXP records.
If latency-to-distance assumptions weaken in certain network regions, PCS scores would need region-specific calibration to remain useful.

Load-bearing premise

The score is only meaningful when latency serves as a reasonable proxy for geographic distance.
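A back-of-the-envelope version of the speed-of-light constraint shows why this premise matters. Assume that light in fiber travels at roughly two-thirds of c (about 200 km per millisecond). Assume also that the RTT increment between consecutive hops covers the extra segment once in each direction. Under those assumptions, the increment bounds how far apart two hops can be:

```latex
d_{\max} = \frac{\Delta\mathrm{RTT}}{2}\, v_{\mathrm{fiber}},
\qquad v_{\mathrm{fiber}} \approx \tfrac{2}{3}c \approx 200~\mathrm{km/ms}
```

For example, an increment of 10 ms allows at most 5 ms × 200 km/ms = 1,000 km. A Chicago-to-London segment (roughly 6,350 km) would need an increment of at least about 63.5 ms. The bound only rules locations out. Queuing, circuitous fiber routes, and asymmetric return paths all inflate latency, so a large increment does not imply a large distance. This is why the score is only informative where latency tracks geography.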

What would settle it

A large independent set of ground-truth validated paths on which fewer than 80% of decoded sequences achieve mean error below 200 km would falsify the reported decoding accuracy.
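The headline statistic and this falsification test reduce to the same computation. The sketch assumes that a path's "mean error" is the average great-circle distance between decoded and ground-truth hop locations. The paper may aggregate differently, and the function names are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def mean_path_error_km(decoded, truth):
    """Assumed definition: average per-hop distance between decoded and true locations."""
    if not decoded or len(decoded) != len(truth):
        raise ValueError("decoded and truth must be non-empty and the same length")
    return sum(haversine_km(d, t) for d, t in zip(decoded, truth)) / len(decoded)


def fraction_below(paths, threshold_km=200.0):
    """Share of (decoded, truth) path pairs whose mean error is below threshold_km."""
    errors = [mean_path_error_km(d, t) for d, t in paths]
    return sum(e < threshold_km for e in errors) / len(errors)
```

Under the criterion above, a fresh ground-truth set would count against the reported accuracy if `fraction_below(paths)` came out under 0.8.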

Figures

Figures reproduced from arXiv: 2606.24027 by Caleb J. Wang, Fabián E. Bustamante, Santiago Klein.

**Figure 2.** Overview of the PCS path-alignment pipeline. Traceroute observations and normalized location evidence define a candidate state space, emission scores, and latency-aware transition scores. Viterbi decoding then produces a path-level geographic interpretation and a path consistency score.

**Figure 3.** Empirical latency distributions used to con…

**Figure 4.** Path-level alignment accuracy for the Public…

**Figure 5.** Validation-set accuracy and alignment diagnostics. Panels (a) and (b) compare raw GeoDB assignments…

**Figure 6.** Validated vendor runs: PCS versus mean decoded geolocation error, colored by Path–Model Alignment

**Figure 7.** Validated vendor runs: mean decoded geolocation error grouped by Path–Model Alignment

**Figure 8.** Decoded Evaluation Corpus: CDFs of PCS and Path–Model Alignment

Original abstract

Studies of Internet paths often attach router locations to traceroute hops using commercial geolocation databases, rDNS labels, Geofeeds, and IXP metadata. These sources provide useful hints, but they report point locations without calibrated confidence, leaving researchers unable to tell whether a geographic path is trustworthy. We introduce Path Consistency Scoring (PCS), a passive framework that evaluates router geolocation as a path-level consistency problem. PCS models each traceroute as a sequence of candidate city-level locations and uses a Hidden Markov Model to fuse local evidence with speed-of-light constraints and empirical latency priors. PCS produces a path consistency score summarizing how well metadata and observed RTT increments support a coherent geographic interpretation. Because this score is only meaningful when latency proxies for geography, we also define a Path-Model Alignment metric that compares speed-of-light residual increments of the decoded path against a reference path. We evaluate on 413,354 RIPE Atlas traceroutes and a 6,555-path subset verified by active probing. On validated paths, 94.2% of decoded sequences achieve mean error below 200 km. PCS is largely GeoDB-agnostic; median scores vary by less than 5% across four commercial databases, while the alignment metric reveals that over half of DB-IP and IP2Location paths require substantial correction, compared with 15% for IPinfo. This lets downstream analyses quantify confidence in their geographic conclusions rather than inheriting database accuracy without qualification.
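One plausible reading of "speed-of-light residual increments" is the observed RTT increment minus the minimum RTT that the great-circle distance between consecutive hop locations requires. The sketch below computes those residuals for a decoded path and for a reference path, then reports their mean absolute difference. All of the following are assumptions: the residual definition, the 200 km/ms fiber speed, the mean-absolute-difference comparison, and the function names. The abstract does not say what the reference path is or how the comparison is aggregated.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: ~2/3 c in fiber


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def residual_increments(locations, rtts_ms):
    """Assumed residual: observed RTT increment minus the minimum RTT the hop-to-hop
    distance requires. Negative values mean the locations are too far apart for the latency."""
    if len(locations) != len(rtts_ms) or len(locations) < 2:
        raise ValueError("need matching locations and RTTs for at least two hops")
    out = []
    for t in range(1, len(locations)):
        observed = rtts_ms[t] - rtts_ms[t - 1]
        minimum = 2 * haversine_km(locations[t - 1], locations[t]) / FIBER_KM_PER_MS
        out.append(observed - minimum)
    return out


def alignment_gap_ms(decoded, reference, rtts_ms):
    """Hypothetical comparison: mean absolute difference between the two residual series."""
    rd = residual_increments(decoded, rtts_ms)
    rr = residual_increments(reference, rtts_ms)
    return sum(abs(a - b) for a, b in zip(rd, rr)) / len(rd)
```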

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PCS offers a workable way to score path-level geolocation consistency via HMMs, but the value hinges on the latency-as-geo proxy and the evaluation leaves some parameterization details open.

The main takeaway is that this work supplies a concrete scoring system for how consistent a set of geolocation labels looks along a traceroute, using an HMM that folds in speed-of-light bounds and latency priors. That is the actual novelty: the Path Consistency Score plus the separate Path-Model Alignment check that tests whether the latency-geo assumption holds for a given path.

The evaluation is the strongest part. They ran it on 413k RIPE Atlas traces and a 6.5k-path actively verified subset, reporting that 94.2% of the decoded paths stay under 200 km mean error on the verified slice. The claim that scores stay stable across four commercial databases is also useful; it shows the method is not just re-ranking one particular GeoDB.

Two soft spots stand out. First, the abstract and available description give little on how the HMM transition and emission probabilities were chosen or tuned; without that, it is hard to judge how much the result depends on the empirical priors. Second, the headline 94.2% figure is only on the verified subset, so the performance on the full corpus remains less clear. The alignment metric is a reasonable external check, but it does not remove the need for those details.

This is for people who already run large traceroute studies and need a way to filter or weight geographic claims rather than take database output at face value. It is narrow, but the method is reproducible enough that a serious referee could check the parameterization and the verification procedure. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Path Consistency Scoring (PCS), a passive framework that models traceroute paths as sequences of candidate city-level locations and applies a Hidden Markov Model to combine local geolocation evidence with speed-of-light constraints and empirical latency priors, yielding a path-level consistency score. It also defines a Path-Model Alignment metric to test whether latency serves as a geographic proxy. Evaluation on 413,354 RIPE Atlas traceroutes and a 6,555-path actively verified subset reports that 94.2% of decoded sequences have mean error below 200 km; PCS scores are largely insensitive to the choice of commercial geolocation database (median variation <5%), while the alignment metric indicates that over half of DB-IP and IP2Location paths require substantial correction versus 15% for IPinfo.

Significance. If the central claims hold, the work supplies a practical, quantitative method for assessing confidence in geolocated Internet paths, directly addressing the absence of calibrated uncertainty in existing metadata sources. The scale of the evaluation, the use of an independently verified subset, the explicit precondition statement, and the external alignment check are notable strengths that could improve reliability in downstream measurement studies relying on geographic path data.

major comments (2)

[Abstract / Methods] Abstract and Methods: The HMM transition and emission probabilities are central to PCS, yet the abstract (and apparently the methods description) provides no information on how these probabilities were set or estimated from data. This detail is load-bearing for reproducibility and for interpreting the reported 94.2% low-error rate.
[Evaluation] Evaluation section: The 94.2% figure on the verified 6,555-path subset is presented without error bars, confidence intervals, or a clear description of the active-probing validation protocol. These omissions weaken the ability to assess the robustness of the central performance claim.

minor comments (2)

[Abstract] Abstract: The statement that PCS is 'largely GeoDB-agnostic' is quantified only via median score variation; reporting the full distribution or inter-quartile ranges across the four databases would improve clarity.
[Evaluation] The Path-Model Alignment metric is introduced as an external check, but the manuscript could more explicitly compare its results against the PCS scores to illustrate how the two metrics interact on the same paths.
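The distributional summary requested in the first minor comment is cheap to compute. The sketch reports the median and interquartile range per database, plus a relative spread of the medians. That spread is defined here, hypothetically, as (max - min) / min, because this summary does not state the paper's exact definition of "vary by less than 5%". The function names are illustrative.

```python
import statistics


def summarize_scores(scores_by_db):
    """Median and interquartile range of per-path scores for each database."""
    summary = {}
    for db, scores in scores_by_db.items():
        q1, _, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
        summary[db] = {"median": statistics.median(scores), "q1": q1, "q3": q3, "iqr": q3 - q1}
    return summary


def median_spread(summary):
    """Hypothetical spread metric: (max median - min median) / min median."""
    medians = [s["median"] for s in summary.values()]
    return (max(medians) - min(medians)) / min(medians)
```

Reporting the IQR next to the median shows whether databases with similar medians also have similarly shaped score distributions.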

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The comments correctly identify areas where additional detail will improve reproducibility and the interpretability of our results. We address each major comment below.

Referee: [Abstract / Methods] Abstract and Methods: The HMM transition and emission probabilities are central to PCS, yet the abstract (and apparently the methods description) provides no information on how these probabilities were set or estimated from data. This detail is load-bearing for reproducibility and for interpreting the reported 94.2% low-error rate.

Authors: We agree that the current description of how the HMM parameters were obtained is insufficient for full reproducibility. Transition probabilities are estimated via maximum likelihood from empirical latency-increment distributions observed across the RIPE Atlas corpus, while emission probabilities combine per-database confidence scores with speed-of-light feasibility checks. To resolve the concern we will (1) add a concise statement to the abstract and (2) insert an explicit subsection in Methods that reports the estimation procedure, the data subset used for fitting, and the resulting parameter values. These changes will make the 94.2% figure more readily interpretable. revision: yes
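If latency-increment priors are modeled as a histogram, the maximum-likelihood estimate of each bin's probability is that bin's share of the observations. The sketch below assumes fixed-width bins and offers an optional additive pseudocount, which departs from pure maximum likelihood but avoids zero-probability bins. The bin width, pseudocount, log floor, and function names are hypothetical. The (simulated) response does not say which distributional form the paper actually fits.

```python
import math
from collections import Counter


def fit_binned_prior(increments_ms, bin_width_ms=1.0, pseudocount=0.0):
    """Histogram estimate of P(bin) for RTT increments.
    With pseudocount=0 this is the maximum-likelihood estimate (bin count / total)."""
    if not increments_ms:
        raise ValueError("need at least one observation")
    counts = Counter(math.floor(x / bin_width_ms) for x in increments_ms)
    lo, hi = min(counts), max(counts)
    total = len(increments_ms) + pseudocount * (hi - lo + 1)
    return {b: (counts.get(b, 0) + pseudocount) / total for b in range(lo, hi + 1)}


def log_prior(prior, x_ms, bin_width_ms=1.0, floor=1e-9):
    """Log-probability of an increment, floored so empty or unseen bins stay finite."""
    p = prior.get(math.floor(x_ms / bin_width_ms), 0.0)
    return math.log(max(p, floor))
```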
Referee: [Evaluation] Evaluation section: The 94.2% figure on the verified 6,555-path subset is presented without error bars, confidence intervals, or a clear description of the active-probing validation protocol. These omissions weaken the ability to assess the robustness of the central performance claim.

Authors: The referee is correct that uncertainty quantification and protocol details are missing. The 6,555-path subset was obtained by issuing additional active probes from multiple RIPE Atlas vantage points and confirming city-level locations via latency triangulation against known landmarks. We will expand the Evaluation section with a step-by-step description of this validation protocol and will report bootstrap confidence intervals around the 94.2% statistic. These additions directly address the robustness concern. revision: yes
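A percentile bootstrap is one standard way to put an interval around a proportion such as 94.2%. The sketch resamples per-path 0/1 outcomes (1 = mean error below 200 km) with replacement. The resample count, seed, and function name are assumptions, not details from the paper.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes.
    Returns (point estimate, lower bound, upper bound)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(outcomes)
    # Each resample draws n outcomes with replacement and records its success rate.
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, lower, upper
```

Take 1,000 synthetic outcomes with 942 successes as an example. The point estimate is 0.942, and the normal approximation sqrt(p(1 - p)/n) puts the standard error near 0.007, which the bootstrap interval should roughly reflect.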

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper defines PCS via an HMM that incorporates speed-of-light constraints and empirical latency priors, then reports performance on a 6,555-path actively verified subset separate from the main 413k RIPE corpus. The Path-Model Alignment metric is introduced explicitly to test the latency-as-geography precondition rather than assuming it. No equation or step is shown reducing a claimed prediction or score to a fit performed on the identical data being scored, nor does any load-bearing claim rest solely on a self-citation chain. The derivation therefore remains self-contained against the external validation set and alignment check.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; the central claim rests on an HMM that fuses local evidence with speed-of-light and latency priors, but no explicit free parameters, axioms, or invented entities are enumerated beyond the stated precondition.

axioms (1)

Domain assumption: latency can serve as a proxy for geography.
Explicitly required for the consistency score to be meaningful.

pith-pipeline@v0.9.1-grok · 5797 in / 1303 out tokens · 23670 ms · 2026-06-25T22:37:18.222732+00:00 · methodology

[1] [1]

Bischof, Alberto Dain- otti, and Paul Barford

Scott Anderson, Loqman Salamatian, Zachary S. Bischof, Alberto Dain- otti, and Paul Barford. 2022. iGDB: connecting the physical and logical layers of the internet. InProceedings of the 22nd ACM Internet Measure- ment Conference(Nice, France)(IMC ’22). Association for Computing Machinery, New York, NY, USA, 433–448. doi:10.1145/3517745.3561443

work page doi:10.1145/3517745.3561443 2022

[2] [2]

Jeff A. Bilmes. 1998.A Gentle Tutorial of the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report. International Computer Science In- stitute. https://www.cs.cmu.edu/~aarti/Class/10701/readings/gentle_ tut_HMM.pdf

1998

[3] [3]

Wang, Mia Weaver, Fabián E

Esteban Carisimo, Caleb J. Wang, Mia Weaver, Fabián E. Bustamante, and Paul Barford. 2023. A Hop Away from Everywhere: A View of the Intercontinental Long-haul Infrastructure.Proc. ACM Meas. Anal. Comput. Syst., Article 47 (dec 2023), 26 pages

2023

[4] [4]

Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christo- pher Frost, J

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christo- pher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christo- pher Heiser, Peter Hochschild, et al. 2013. Spanner: Google’s globally distributed database.ACM Transactions on Computer Systems31, 3 (2013), 1–22

2013

[5] [5]

Alberto Dainotti, Walter de Donato, Antonio Pescape, and Pierluigi Salvo Rossi. 2008. Classification of Network Traffic via Packet-Level Hidden Markov Models. InIEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference. 1–5. doi:10.1109/GLOCOM.2008.ECP. 412

work page doi:10.1109/glocom.2008.ecp 2008

[6] [7]

Omar Darwich, Hugo Rimlinger, Milo Dreyfus, Matthieu Gouel, and Kevin Vermeulen. 2023. Replication: Towards a Publicly Available Internet Scale IP Geolocation Dataset. InProceedings of the 2023 ACM on Internet Measurement Conference(Montreal QC, Canada)(IMC ’23). Association for Computing Machinery, New York, NY, USA, 1–15. doi:10.1145/3618257.3624801

work page doi:10.1145/3618257.3624801 2023

[7] [8]

Alun Davies. 2017. Anchoring Measurements: Bringing Back the Balance. https://labs.ripe.net/author/alun_davies/anchoring- measurements-bringing-back-the-balance/. RIPE Labs article

2017

[8] [9]

DB-IP. 2025. DB-IP Geolocation Data. https://db-ip.com/db/

2025

[9] [10]

Perera, Rajarathnam Chandramouli, and K.P

Ziqian Dong, Rohan D.W. Perera, Rajarathnam Chandramouli, and K.P. Subbalakshmi. 2012. Network measurement based modeling and optimization for IP geolocation.Computer Networks56, 1 (2012), 85–98. doi:10.1016/j.comnet.2011.08.011

work page doi:10.1016/j.comnet.2011.08.011 2012

[10] [11]

Benoit Donnet, Matthew Luckie, Pascal Mérindol, and Jean-Jacques Pansiot. 2012. Revealing MPLS tunnels obscured from traceroute. ACM SIGCOMM Computer Communication Review42, 2 (March 2012), 87–93

2012

[11] [12]

Ben Du, Massimo Candela, Bradley Huffaker, Alex C Snoeren, and KC Claffy. 2020. RIPE IPmap active geolocation: Mechanism and performance evaluation.ACM SIGCOMM Computer Communication Review(2020)

2020

[12] [13]

Snoeren, and kc claffy

Ben Du, Massimo Candela, Bradley Huffaker, Alex C. Snoeren, and kc claffy. 2020. RIPE IPmap active geolocation: mechanism and perfor- mance evaluation.SIGCOMM Comput. Commun. Rev.50, 2 (May 2020), 3–10. doi:10.1145/3402413.3402415

work page doi:10.1145/3402413.3402415 2020

[13] [14]

Yariv Ephraim and Neri Merhav. 2002. Hidden Markov Processes.IEEE Transactions on Information Theory48, 6 (2002), 1518–1569. doi:10. 1109/TIT.2002.1003838

arXiv 2002

[14] [15]

Romain Fontugne, Cristel Pelsser, Emile Aben, and Randy Bush. 2017. Pinpointing delay and forwarding anomalies using large-scale tracer- oute measurements. InProc. of IMC

2017

[15] [16]

Center for Applied Internet Data Analysis (CAIDA). 2025. CYMRU Bogon Reference Dataset (historical and daily bogons and fullbogons) - Center for Applied Internet Data Analysis (CAIDA) / Team Cymru. https://publicdata.caida.org/datasets/bogon/. Accessed: 2025-11-05

2025

[17] Manaf Gharaibeh, Anant Shah, Bradley Huffaker, Han Zhang, Roya Ensafi, and Christos Papadopoulos. 2017. A look at router geolocation in public and commercial databases. In Proc. of IMC.

[18] Manaf Gharaibeh, Anant Shah, Bradley Huffaker, Han Zhang, Roya Ensafi, and Christos Papadopoulos. 2017. A look at router geolocation in public and commercial databases. In Proceedings of the 2017 Internet Measurement Conference (London, United Kingdom) (IMC ’17). Association for Computing Machinery, New York, NY, USA, 463–469. doi:10.1145/3131365.3131380

[19] Bamba Gueye, Artur Ziviani, Mark Crovella, and Serge Fdida. 2004. Constraint-based geolocation of internet hosts. In Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement (Taormina, Sicily, Italy) (IMC ’04). Association for Computing Machinery, New York, NY, USA, 288–293. doi:10.1145/1028788.1028828

[20] Naohiro Hayashibara, Xavier Defago, Rami Yared, and Takuya Katayama. 2004. The 𝜑 accrual failure detector. In Proc. IEEE Symposium on Reliable Distributed Systems (SRDS). IEEE, 66–78.

[21] Zi Hu, John Heidemann, and Yuri Pradkin. 2012. Towards geolocation of millions of IP addresses. In Proceedings of the 2012 Internet Measurement Conference (Boston, Massachusetts, USA) (IMC ’12). Association for Computing Machinery, New York, NY, USA, 123–130. doi:10.1145/2398776.2398790

[22] Bradley Huffaker, Marina Fomenkov, and kc claffy. 2014. DRoP: DNS-based router positioning. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 5–13. doi:10.1145/2656877.2656879

[23] IP2Location. 2025. IP2Location Geolocation Data. https://www.ip2location.com/databases

[24] IPinfo Inc. 2025. Bogon IP Address Ranges. https://ipinfo.io/bogon. Accessed: 2025-11-05.

[25] IPinfo Inc. 2025. IPinfo: IP Data Intelligence for Developers & Enterprises. https://ipinfo.io. Accessed: 2025-11-05.

[27] Ethan Katz-Bassett, John P. John, Arvind Krishnamurthy, David Wetherall, Thomas Anderson, and Yatin Chawathe. 2006. Towards IP geolocation using delay and topology measurements. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement (Rio de Janeiro, Brazil) (IMC ’06). Association for Computing Machinery, New York, NY, USA, 71–84. doi:10.1145/1177080.1177090

[28] Ioana Livadariu, Kevin Vermeulen, Maxime Mouchet, and Vasilis Giotsas. 2024. Geofeeds: Revolutionizing IP Geolocation or Illusionary Promises? Proc. ACM Netw. 2, CoNEXT3, Article 15 (Aug. 2024), 21 pages. doi:10.1145/3676869

[29] Matthew Luckie, Bradley Huffaker, Alexander Marder, Zachary Bischof, Marianne Fletcher, and K Claffy. 2021. Learning to extract geographic information from internet router hostnames. In Proc. of CoNEXT.

[30] Matthew Luckie, Bradley Huffaker, Alexander Marder, Zachary Bischof, Marianne Fletcher, and K Claffy. 2021. Learning to extract geographic information from internet router hostnames. In Proceedings of the 17th International Conference on Emerging Networking EXperiments and Technologies (Virtual Event, Germany) (CoNEXT ’21). Association for Computing Mach... doi:10.1145/3485983.3494869

[31] Justin Ma, Kirill Levchenko, Christian Kreibich, Stefan Savage, and Geoffrey M. Voelker. 2006. Unexpected means of protocol inference. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement (Rio de Janeiro, Brazil) (IMC ’06). Association for Computing Machinery, New York, NY, USA, 313–326. doi:10.1145/1177080.1177123

[32] Lorenzo Ariemma, Massimo Candela, Emanuele Candela. 2025. Geofeeds Registry "Geolocate much?". https://geolocatemuch.com/

[33] Maxmind. 2025. Maxmind Geolocation Data. https://www.maxmind.com/en/geoip2-services-and-databases

[34] Maxime Mouchet, Sandrine Vaton, Thierry Chonavel, Emile Aben, and Jasper Den Hertog. 2020. Large-Scale Characterization and Segmentation of Internet Path Delays With Infinite HMMs. IEEE Access 8 (2020), 16771–16784. doi:10.1109/ACCESS.2020.2968380

[35] Gerhard Münz, Hui Dai, Lothar Braun, and Georg Carle. 2010. TCP Traffic Classification Using Markov Models. In Traffic Monitoring and Analysis, Fabio Ricciato, Marco Mellia, and Ernst Biersack (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 127–140.

[36] Venkata N. Padmanabhan and Lakshminarayanan Subramanian. 2001. An investigation of geographic mapping techniques for internet hosts. SIGCOMM Comput. Commun. Rev. 31, 4 (Aug. 2001), 173–185. doi:10.1145/964723.383073

[37] Vern Paxson. 1996. End-to-End Routing Behavior in the Internet. In Proc. of ACM SIGCOMM.

[38] PeeringDB. 2023. PeeringDB. https://www.peeringdb.com/. Dates used: 2023-12-08. Accessed: 2023-12-08.

[39] Ingmar Poese, Steve Uhlig, Mohamed Ali Kaafar, Benoit Donnet, and Bamba Gueye. 2011. IP Geolocation Databases: Unreliable? 41, 2 (April 2011).

[40] Ingmar Poese, Steve Uhlig, Mohamed Ali Kaafar, Benoit Donnet, and Bamba Gueye. 2011. IP geolocation databases: unreliable? SIGCOMM Comput. Commun. Rev. 41, 2 (April 2011), 53–56. doi:10.1145/1971162.1971171

[41] Lawrence R. Rabiner and Biing-Hwang Juang. 1986. An Introduction to Hidden Markov Models. IEEE ASSP Magazine 3, 1 (1986), 4–16. doi:10.1109/MASSP.1986.1165342

[42] Alagappan Ramanathan and Sangeetha Abdu Jyothi. 2023. Nautilus: A Framework for Cross-Layer Cartography of Submarine Cables and IP Links. Proc. ACM Meas. Anal. Comput. Syst. 7, 3 (Dec. 2023).

[43] Hugo Rimlinger, Olivier Fourmaux, Timur Friedman, and Kevin Vermeulen. 2025. GeoResolver: An Accurate, Scalable, and Explainable Geolocation Technique Using DNS Redirection. Proc. ACM Netw. 3, CoNEXT3, Article 19 (Sept. 2025), 21 pages. doi:10.1145/3749219

[44] Quirin Scheitle, Oliver Gasser, Patrick Sattler, and Georg Carle. 2017. HLOC: Hints-based geolocation leveraging multiple measurement frameworks. In Proc. of TMA.

[45] Ankit Singla, Balakrishnan Chandrasekaran, P Brighten Godfrey, and Bruce Maggs. 2014. The internet at the speed of light. In Proc. of HotNets.

[46] Joel Sommers, Paul Barford, and Brian Eriksson. 2011. On the prevalence and characteristics of MPLS deployments in the open internet. In Proc. of IMC.

[47] Kedar Thiagarajan, Esteban Carisimo, and Fabián E. Bustamante. 2025. The Aleph: Decoding DNS PTR Records With Large Language Models. In ACM CoNEXT.

[48] Kedar Thiagarajan, Esteban Carisimo, and Fabián E. Bustamante. 2025. The Aleph: Decoding Geographic Information from DNS PTR Records Using Large Language Models. Proc. ACM Netw. 3, CoNEXT1, Article 7 (March 2025), 20 pages. doi:10.1145/3709374

[49] Yves Vanaubel, Pascal Mérindol, Jean-Jacques Pansiot, and Benoit Donnet. 2017. Through the wormhole: tracking invisible MPLS tunnels. In Proc. of IMC.

[50] Caleb Wang, Ying Zhang, Qianli Dong, Esteban Carisimo, Ramakrishnan Durairajan, and Fabián E. Bustamante. 2025. Threading the Ocean: Mapping Digital Routes Across Submarine Cables using Calypso. In Proceedings of the ACM SIGCOMM 2025 Conference (São Francisco Convent, Coimbra, Portugal) (SIGCOMM ’25). Association for Computing Machinery, New York, NY,... doi:10.1145/3718958.3750512

[51] Wei Wei, Bing Wang, Don Towsley, and Jim Kurose. 2011. Model-Based Identification of Dominant Congested Links. IEEE/ACM Transactions on Networking 19, 2 (2011), 456–469. doi:10.1109/TNET.2010.2068058

A ETHICS

This work does not raise any ethical issues.

B MODEL PARAMETERS

We use one fixed parameter configuration for all validation and evaluation decodes. ...