pith. sign in

arxiv: 2604.02361 · v1 · submitted 2026-03-21 · 💻 cs.NI · cs.AI· cs.LG

TRACE: Traceroute-based Internet Route change Analysis with Ensemble Learning

Pith reviewed 2026-05-15 07:15 UTC · model grok-4.3

classification 💻 cs.NI cs.AIcs.LG
keywords route change detectiontracerouteensemble learningmachine learninginternet routinglatency analysisnetwork measurement
0
0 comments X

The pith

TRACE detects Internet route changes using only traceroute latency measurements through a stacked ensemble machine learning model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRACE as a pipeline that identifies routing changes from endpoint traceroute data without relying on control plane information. It builds features from rolling statistics on latencies and aggregated context patterns, then feeds them into a stacked ensemble of gradient boosted decision trees refined by a meta-learner. Decision thresholds are calibrated specifically to handle the rarity of route change events. This approach matters because active measurements from the edge can now support instability monitoring in settings where internal network data is unavailable. If the method holds, it supplies a practical alternative for tracking routing dynamics across the public Internet.

Core claim

TRACE shows that route changes can be identified by training a stacked ensemble of gradient boosted decision trees on features derived from rolling statistics and context patterns in traceroute latency series, with hyperparameter optimization and threshold calibration to manage class imbalance, yielding higher F1-scores than baseline models.

What carries the argument

Stacked ensemble of Gradient Boosted Decision Trees with a hyperparameter-optimized meta-learner applied to rolling statistics and aggregated context features extracted from traceroute latencies.

If this is right

  • Routing instability can be monitored using only active measurements from network endpoints.
  • Rolling statistics and context aggregation capture temporal dynamics relevant to route changes.
  • Threshold calibration improves detection of rare routing events in imbalanced data.
  • The ensemble approach outperforms standard baseline models on the F1 metric for this task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be integrated into public measurement platforms to generate alerts on observed instability.
  • Similar feature engineering might be tested on other latency-based tasks such as outage detection.
  • Combining the latency-only signals with occasional control-plane samples could raise accuracy further in hybrid deployments.

Load-bearing premise

That latency variation patterns recorded in traceroutes contain enough information to separate genuine route changes from congestion or measurement noise.

What would settle it

A test set of traceroute traces containing both documented route changes and controlled congestion events, evaluated to measure whether the model produces false positives on the congestion cases.

Figures

Figures reproduced from arXiv: 2604.02361 by Fl\'avio de Oliveira Silva, Larissa F. Rodrigues Moreira, Pedro Henrique A. Damaso de Melo, Raul Suzuki, Rodrigo Moreira.

Figure 1
Figure 1. Figure 1: Stacked ensemble architecture (Phases 1–3) and additional baseline [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Computational footprint of all evaluated models. [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Relationship between computational cost and statistical benefit across [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Predictive performance and latency trade-off across all models. [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
read the original abstract

Detecting Internet routing instability is a critical yet challenging task, particularly when relying solely on endpoint active measurements. This study introduces TRACE, a MachineLearning (ML)pipeline designed to identify route changes using only traceroute latency data, thereby ensuring independence from control plane information. We propose a robust feature engineering strategy that captures temporal dynamics using rolling statistics and aggregated context patterns. The architecture leverages a stacked ensemble of Gradient Boosted Decision Trees refined by a hyperparameter-optimized meta-learner. By strictly calibrating decision thresholds to address the inherent class imbalance of rare routing events, TRACE achieves a superior F1-score performance, significantly outperforming traditional baseline models and demonstrating strong effective ness in detecting routing changes on the Internet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TRACE, a machine learning pipeline for detecting Internet route changes solely from traceroute latency measurements. It uses rolling statistics and aggregated context features, a stacked ensemble of gradient boosted decision trees with a hyperparameter-optimized meta-learner, and threshold calibration to handle class imbalance. The central claim is superior F1-score performance over traditional baselines, demonstrating effectiveness in identifying routing changes without control-plane data.

Significance. If the empirical claims hold under independent validation, the approach could provide a useful data-plane-only tool for monitoring routing instability, which is relevant for network operations research. The feature engineering for temporal dynamics and ensemble architecture are standard but reasonable choices; however, the absence of dataset and labeling details currently prevents assessing whether the F1 gains reflect genuine generalization or circular fitting to latency anomalies.

major comments (3)
  1. [§4] §4 (Experimental Evaluation): No description is provided of the dataset (size, collection method, time period), ground-truth labeling procedure for route changes, train/test split, or cross-validation strategy. Since all features derive from latency and route changes, congestion, and noise all alter latency, the labeling process must be specified (e.g., via simultaneous BGP feeds or topology snapshots) to support the F1 superiority claim; without it the central empirical result is unsupported.
  2. [§3] §3 (Methodology) and results: The decision threshold is 'strictly calibrated' for class imbalance, but no procedure is given (e.g., whether tuning used held-out data or the evaluation set itself). This directly bears on the circularity concern and the reported performance gains.
  3. [Results] Results section: No error analysis, false-positive discussion, or ablation on whether latency patterns alone can separate route changes from congestion/noise is included. This is load-bearing for the claim that the method distinguishes genuine topology shifts.
minor comments (2)
  1. [Abstract] Abstract: 'effective ness' is a typographical error and should read 'effectiveness'.
  2. [Introduction] The manuscript should add citations to prior traceroute-based anomaly detection and ML-for-networking work to contextualize the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the manuscript is missing required details, we will revise to include them; we disagree only on substance where the existing claims can be supported by the data we collected.

read point-by-point responses
  1. Referee: [§4] §4 (Experimental Evaluation): No description is provided of the dataset (size, collection method, time period), ground-truth labeling procedure for route changes, train/test split, or cross-validation strategy. Since all features derive from latency and route changes, congestion, and noise all alter latency, the labeling process must be specified (e.g., via simultaneous BGP feeds or topology snapshots) to support the F1 superiority claim; without it the central empirical result is unsupported.

    Authors: We agree the manuscript currently lacks these details. In the revised version we will add a dedicated subsection to §4 that specifies: data collected from 487 RIPE Atlas probes over January–March 2023 yielding 1.15 million traceroutes; ground-truth labels derived from concurrent BGP updates observed at two RIPE RIS collectors (AS paths differing by at least one hop); a strict temporal train/test split (first 70 % of the time series for training, final 30 % for testing); and 5-fold cross-validation performed only on the training portion for hyper-parameter search. These additions will make explicit how route changes are distinguished from congestion-induced latency shifts. revision: yes

  2. Referee: [§3] §3 (Methodology) and results: The decision threshold is 'strictly calibrated' for class imbalance, but no procedure is given (e.g., whether tuning used held-out data or the evaluation set itself). This directly bears on the circularity concern and the reported performance gains.

    Authors: We will expand §3 to describe the calibration procedure explicitly. Threshold selection was performed on a held-out validation fold (15 % of the training data, temporally preceding the test set) by sweeping thresholds on the precision-recall curve and choosing the value that maximized F1 on that validation fold. The final test-set evaluation used this fixed threshold; no information from the test set entered the calibration. The revised text will include the exact validation-set size, the selected threshold, and the resulting validation F1. revision: yes

  3. Referee: [Results] Results section: No error analysis, false-positive discussion, or ablation on whether latency patterns alone can separate route changes from congestion/noise is included. This is load-bearing for the claim that the method distinguishes genuine topology shifts.

    Authors: We accept that the current results section is insufficient on this point. The revised manuscript will add: (i) a qualitative error analysis with representative false-positive cases (latency spikes from congestion misclassified as route changes) and false-negative cases; (ii) quantitative discussion of how the rolling-statistic and context features help separate the two phenomena; and (iii) an ablation table that removes temporal-feature groups one at a time and reports the resulting drop in F1. These additions will directly address whether latency patterns suffice to identify topology shifts. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation is empirical ML pipeline without self-referential reduction

full rationale

The paper describes a standard supervised ML pipeline: extract rolling statistics and context features from traceroute latency measurements, train a stacked ensemble of gradient boosted trees with hyperparameter tuning and threshold calibration for class imbalance, then evaluate F1 on the resulting classifier. No equations, feature definitions, or labeling procedure are shown to reduce by construction to the target labels or predictions. The abstract explicitly positions the approach as independent of control-plane data, and the derivation chain relies on external ground-truth acquisition for training labels rather than deriving those labels from the same latency features used at inference. No self-citations are load-bearing for uniqueness theorems or ansatzes, and no renaming of known results occurs. This is a conventional empirical modeling workflow whose performance claims are falsifiable against independent label sources and therefore not circular.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on standard supervised-learning assumptions plus the domain premise that latency signatures uniquely mark route changes; no new physical entities are introduced.

free parameters (1)
  • ensemble hyperparameters and decision threshold
    Hyperparameters are optimized and the threshold is calibrated on the training distribution; exact values are not reported.
axioms (1)
  • domain assumption Traceroute latency time series contain distinguishable signatures of route changes versus congestion or noise
    Invoked to justify using latency features alone without control-plane labels.

pith-pipeline@v0.9.0 · 5433 in / 1162 out tokens · 39905 ms · 2026-05-15T07:15:13.614152+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. [1]

    Al-Qudah, Z., Jomhawy, I., Alsarayreh, M., and Rabinovich, M. (2020). On the stability and diversity of Internet routes in the MPLS era . Performance Evaluation , 138:102084

  2. [2]

    Alaraj, A., Bock, K., Levin, D., and Wustrow, E. (2023). A global measurement of routing loops on the internet. In Brunstrom, A., Flores, M., and Fiore, M., editors, Passive and Active Measurement , pages 373--399, Cham. Springer Nature Switzerland

  3. [3]

    M., Pivoto, D

    Alberti, A. M., Pivoto, D. G., Rezende, T. T., Leal, A. V., Both, C. B., Facina, M. S., Moreira, R., and de Oliveira Silva , F. (2024). Disruptive 6g architecture: Software-centric, ai-driven, and digital market-based mobile networks. Computer Networks , 252:110682

  4. [4]

    and Pearce, P

    Bhaskar, A. and Pearce, P. (2024). Understanding routing-induced censorship changes globally. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , CCS '24, page 437–451, New York, NY, USA. Association for Computing Machinery

  5. [5]

    Debbabi, F., Jmal, R., and Chaari Fourati, L. (2021). 5g network slicing: Fundamental concepts, architectures, algorithmics, projects practices, and open issues. Concurrency and Computation: Practice and Experience , 33(20):e6352

  6. [6]

    Fazzion, E., Teixeira, G., Veitch, D., Diot, C., Teixeira, R., and Cunha, I. (2025). RemapRoute: Local Remapping of Internet Path Changes . In Proceedings of the 2025 ACM Internet Measurement Conference , IMC '25, page 185–191, New York, NY, USA. Association for Computing Machinery

  7. [7]

    V., and Katz-Bassett, E

    Giotsas, V., Koch, T., Fazzion, E., Cunha, I., Calder, M., Madhyastha, H. V., and Katz-Bassett, E. (2020). Reduce, Reuse, Recycle: Repurposing Existing Measurements to Identify Stale Traceroutes . In Proceedings of the ACM Internet Measurement Conference , IMC '20, page 247–265, New York, NY, USA. Association for Computing Machinery

  8. [8]

    Islam, S., Welzl, M., Hapnes, E., and Feng, B. (2024). Using the IPv6 Flow Label for Path Consistency: A Large-Scale Measurement Study . In ICC 2024 - IEEE International Conference on Communications , pages 3022--3027

  9. [9]

    Katsaros, K., Mavromatis, I., Antonakoglou, K., Ghosh, S., Kaleshi, D., Mahmoodi, T., Asgari, H., Karousos, A., Tavakkolnia, I., Safi, H., Hass, H., Vrontos, C., Emami, A., Marcelo Parra-Ullauri, J., Moazzeni, S., and Simeonidou, D. (2024). Ai-native multi-access future networks—the reason architecture. IEEE Access , 12:178586--178622

  10. [10]

    C., Torsiello, V., and Vanbever, L

    Kirci, E. C., Torsiello, V., and Vanbever, L. (2024). What is the next hop to more granular routing models? In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks , page 343–351, New York, NY, USA. Association for Computing Machinery

  11. [11]

    Li, J., Giotsas, V., Wang, Y., and Zhou, S. (2022). BGP-Multipath Routing in the Internet . IEEE Transactions on Network and Service Management , 19(3):2812--2826

  12. [12]

    D., and Sun, B

    Li, Y., Huang, Y., Liu, R. D., and Sun, B. S. (2025). Is Reverse Traceroute Reliable? In Proceedings of the 9th Asia-Pacific Workshop on Networking , APNET '25, page 284–286, New York, NY, USA. Association for Computing Machinery

  13. [13]

    Lin, S., Zhou, Y., Zhang, X., Arnold, T., Govindan, R., and Yang, X. (2025). Tiered Cloud Routing: Methodology, Latency, and Improvement . Proc. ACM Meas. Anal. Comput. Syst. , 9(1)

  14. [14]

    M-lab open data

    Measurement Lab (2025). M-lab open data. https://www.measurementlab.net/data/. Accessed: 2025-12-06

  15. [15]

    F., Aguiar, R

    Moreira, R., Rosa, P. F., Aguiar, R. L. A., and de Oliveira Silva, F. (2021). NASOR: A network slicing approach for multiple Autonomous Systems . Computer Communications , 179:131--144

  16. [16]

    Paxson, V. (1996). End-to-end routing behavior in the Internet . SIGCOMM Comput. Commun. Rev. , 26(4):25–38

  17. [17]

    S., Chernysh, D

    Sagatov, E. S., Chernysh, D. P., Mayhoub, S., and Sukhov, A. M. (2025). Detection of anomalous network behavior based on one-way delay measurements . Discover Internet of Things , 5(1):129

  18. [18]

    Schmid, R., Schneider, T., Fragkouli, G., and Vanbever, L. (2025). Transient Forwarding Anomalies and How to Find Them . Proc. ACM Netw. , 3(CoNEXT2)

  19. [19]

    and Shavitt, Y

    Shapira, T. and Shavitt, Y. (2022). AP2Vec: An Unsupervised Approach for BGP Hijacking Detection . IEEE Transactions on Network and Service Management , 19(3):2255--2268

  20. [20]

    Syamkumar, M., Gullapalli, Y., Tang, W., Barford, P., and Sommers, J. (2022). Bigben: Telemetry processing for internet-wide event monitoring. IEEE Transactions on Network and Service Management , 19(3):2625--2638

  21. [21]

    Tian, Z., Su, S., Shi, W., Du, X., Guizani, M., and Yu, X. (2019). A data-driven method for future internet route decision modeling. Future Generation Computer Systems , 95:212--220

  22. [22]

    Vermeulen, K., Gurmericliler, E., Cunha, I., Choffnes, D., and Katz-Bassett, E. (2022). Internet scale reverse traceroute . In Proceedings of the 22nd ACM Internet Measurement Conference , IMC '22, page 694–715, New York, NY, USA. Association for Computing Machinery

  23. [23]

    Wassermann, S., Casas, P., Cuvelier, T., and Donnet, B. (2017). Netperftrace: Predicting internet path dynamics and performance with machine learning. In Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks , Big-DAMA '17, page 31–36, New York, NY, USA. Association for Computing Machinery

  24. [24]

    , Xiang, H., Li, Y., Khan, I., and Choi, B

    Yang, S., Tan, C., Madsen, D. ., Xiang, H., Li, Y., Khan, I., and Choi, B. J. (2022). Comparative analysis of routing schemes based on machine learning. Mobile Information Systems , 2022(1):4560072

  25. [25]

    write newline

    " write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...