TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests
Pith reviewed 2026-05-18 05:21 UTC · model grok-4.3
The pith
TurboTest decouples throughput prediction from termination to stop speed tests early using transport features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TurboTest is a two-stage framework that sits on top of existing speed-test platforms. Stage 1 trains a regressor to estimate final throughput from partial measurements and richer transport signals. Stage 2 trains a classifier to decide when to terminate, exposing a tunable epsilon for accuracy tolerance plus a fallback for high-variability runs. On one million M-Lab NDT tests from 2024-2025 the method delivers 1.8-4.4 times higher data savings than a BBR-signal baseline while also lowering median error.
What carries the argument
Two-stage ML pipeline in which a regressor predicts final throughput from partial data and a classifier decides termination once evidence suffices, using RTT, retransmissions, and congestion window in addition to throughput.
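The control flow described above can be sketched as a loop over periodic snapshots of a running test. This is a minimal illustration, not the paper's implementation: `Snapshot`, `run_with_early_stop`, `toy_predict`, and `make_toy_stop` are hypothetical names, the two toy functions stand in for the trained regressor and classifier, and reporting the mean of all samples for a full-length run is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence


@dataclass
class Snapshot:
    """One periodic sample from a running test (field names are illustrative)."""
    throughput_mbps: float
    rtt_ms: float
    retransmits: int
    cwnd_bytes: int


def run_with_early_stop(
    snapshots: Sequence[Snapshot],
    predict: Callable[[Sequence[Snapshot]], float],
    should_stop: Callable[[Sequence[Snapshot], float], bool],
) -> tuple[float, int]:
    """Return (reported throughput, number of snapshots consumed)."""
    for t in range(1, len(snapshots) + 1):
        prefix = snapshots[:t]
        estimate = predict(prefix)         # Stage 1: estimate final throughput
        if should_stop(prefix, estimate):  # Stage 2: decide whether to terminate
            return estimate, t
    # Fallback: the stop rule never fired, so the test runs to completion.
    # Assumption: the full-length result is the mean of all samples.
    return mean(s.throughput_mbps for s in snapshots), len(snapshots)


def toy_predict(prefix: Sequence[Snapshot]) -> float:
    """Toy stand-in for the regressor: mean of the last three samples."""
    return mean(s.throughput_mbps for s in prefix[-3:])


def make_toy_stop(epsilon: float) -> Callable[[Sequence[Snapshot], float], bool]:
    """Toy stand-in for the classifier: stop once recent samples are steady."""
    def stop(prefix: Sequence[Snapshot], estimate: float) -> bool:
        if len(prefix) < 3 or estimate <= 0:
            return False
        recent = [s.throughput_mbps for s in prefix[-3:]]
        return pstdev(recent) / estimate <= epsilon
    return stop
```

Keeping `predict` and `should_stop` as separate callables mirrors the decoupling the page describes: either stage can be swapped or retrained without touching the loop.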
If this is right
- Average data volume per speed test drops sharply while accuracy stays comparable to full-length runs.
- A single tunable parameter lets operators choose how much accuracy to trade for savings.
- Existing platforms can adopt the approach without altering their core measurement engines.
- High-variability tests automatically run to completion to protect estimate quality.
Where Pith is reading between the lines
- Users could run speed tests more frequently if each test uses far less bandwidth.
- The same early-stopping logic might apply to other streaming measurement tasks that wait for stable estimates.
- Periodic retraining on fresh data would likely be needed to keep performance as networks evolve.
- Mobile-device tests could see meaningful battery savings from shorter active periods.
Load-bearing premise
Models trained on the 2024-2025 M-Lab dataset will continue to work on future networks and the added transport signals will be sufficient to avoid systematic bias in the final throughput estimates.
What would settle it
Apply the trained system to a new collection of speed tests gathered in 2026 or on a different measurement platform and check whether data savings remain above 1.8 times the BBR baseline while median error does not rise.
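The check proposed above reduces to two metrics and one comparison. The sketch below assumes that "data savings" means the fraction of bytes avoided relative to full-length runs and that error is relative to the full-length throughput; this page does not give the paper's exact definitions, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def data_savings(full_bytes: Sequence[float], early_bytes: Sequence[float]) -> float:
    """Fraction of bytes avoided by stopping early (assumed definition)."""
    return 1.0 - sum(early_bytes) / sum(full_bytes)


def median_relative_error(estimates: Sequence[float], finals: Sequence[float]) -> float:
    """Median of |estimate - final| / final across tests."""
    return median(abs(e - f) / f for e, f in zip(estimates, finals))


def claim_holds(
    candidate_savings: float,
    baseline_savings: float,
    new_error: float,
    reference_error: float,
    min_ratio: float = 1.8,
) -> bool:
    """True if savings stay at least min_ratio times the baseline and error did not rise."""
    return (candidate_savings >= min_ratio * baseline_savings
            and new_error <= reference_error)
```

Here `reference_error` would be the median error measured on the original 2024-2025 data, and `new_error` the same metric on the fresh collection.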
Original abstract
Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of MB, and collectively, platforms like Ookla, M-Lab, and Fast.com generate petabytes of traffic each month. Reducing this burden requires deciding when a test can be stopped early without sacrificing accuracy. We frame this as an optimal stopping problem and show that existing heuristics (static thresholds, BBR pipe-full signals, or throughput stability rules from Fast.com and FastBTS) capture only a narrow slice of the achievable accuracy-savings trade-off. This paper introduces TurboTest, a systematic framework for speed test termination that sits atop existing platforms. The key idea is to decouple throughput prediction (Stage 1) from test termination (Stage 2): Stage 1 trains a regressor to estimate final throughput from partial measurements, while Stage 2 trains a classifier to decide when sufficient evidence has accumulated to stop. Leveraging richer transport-level features (RTT, retransmissions, congestion window) alongside throughput, TurboTest exposes a single tunable parameter epsilon for accuracy tolerance and includes a fallback mechanism for high-variability cases. Evaluation on 1 million M-Lab NDT speed tests (2024-2025) shows that TurboTest achieves 1.8-4.4x higher data savings than an approach based on BBR signals while reducing median error. These results demonstrate that adaptive ML-based termination can deliver accurate, efficient, and deployable speed tests at scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TurboTest, a two-stage ML framework for early termination of internet speed tests. Stage 1 trains a regressor to predict final throughput from partial measurements using transport features (RTT, retransmissions, congestion window) in addition to throughput; Stage 2 trains a classifier to decide when to stop, controlled by a single tunable accuracy-tolerance parameter epsilon together with a fallback for high-variability cases. The central empirical claim is that, on 1 million M-Lab NDT tests collected in 2024-2025, TurboTest delivers 1.8-4.4x higher data savings than a BBR-signal baseline while also reducing median error.
Significance. If the reported accuracy-savings trade-off generalizes, the work addresses a practically important problem: speed-test platforms collectively generate petabytes of traffic monthly, and a deployable early-termination method could materially reduce this cost without sacrificing measurement fidelity. The scale of the real-world evaluation (1 M traces) is a clear strength and supplies concrete, falsifiable numbers for the accuracy-savings frontier.
major comments (2)
- [§5] §5 (Evaluation): The manuscript reports results on 1 million 2024-2025 M-Lab NDT tests but provides no description of training/validation splits, temporal or geographic hold-outs, hyperparameter search, or error-bar reporting. Because the headline claim is that the learned stopping policy generalizes to new network conditions, the absence of these controls is load-bearing; post-hoc choices on the same data could inflate the reported 1.8-4.4x savings and the reduction in median error.
- [§4] §4 (Framework) and Abstract: The assumption that the added transport features supply stable, unbiased signal for early termination is stated but not tested via any out-of-distribution or temporal-shift experiment. Without such a test, the claim that TurboTest simultaneously improves savings and reduces median error rests on an unverified stationarity assumption that is central to the practical significance of the result.
minor comments (2)
- [§4] The single tunable parameter is called epsilon in the abstract and framework but its precise definition (e.g., whether it bounds absolute or relative error) should be stated explicitly in the first paragraph of §4.
- [§5] Figure captions and axis labels in the evaluation section would benefit from explicit units (e.g., MB saved, Mbps error) to allow direct comparison with the BBR baseline numbers.
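To make the first minor comment concrete, the two readings of epsilon can be written side by side. The notation is assumed for illustration: $\hat{y}_t$ is the Stage 1 estimate after $t$ seconds and $y$ is the throughput a full-length test would report. This page does not say which reading the paper adopts.

```latex
% Absolute tolerance: epsilon carries units (e.g., Mbps)
\lvert \hat{y}_t - y \rvert \le \epsilon

% Relative tolerance: epsilon is dimensionless
\frac{\lvert \hat{y}_t - y \rvert}{y} \le \epsilon

% Worked example with y = 500 Mbps:
%   relative, epsilon = 0.1      =>  450 <= \hat{y}_t <= 550
%   absolute, epsilon = 10 Mbps  =>  490 <= \hat{y}_t <= 510
```

The distinction matters because a fixed absolute tolerance is strict on fast links and loose on slow ones, while a relative tolerance scales with the link.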
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and describe the revisions we will incorporate to strengthen the experimental rigor and generalization claims.
-
Referee: [§5] §5 (Evaluation): The manuscript reports results on 1 million 2024-2025 M-Lab NDT tests but provides no description of training/validation splits, temporal or geographic hold-outs, hyperparameter search, or error-bar reporting. Because the headline claim is that the learned stopping policy generalizes to new network conditions, the absence of these controls is load-bearing; post-hoc choices on the same data could inflate the reported 1.8-4.4x savings and the reduction in median error.
Authors: We agree that explicit details on splits, tuning, and statistical reporting are necessary to substantiate generalization. In the revised §5 we will add a subsection specifying a temporal hold-out (training on the first 8 months of 2024–2025 data and testing on the final 4 months), the hyperparameter search procedure (grid search with 5-fold cross-validation on the training portion), and error bars/confidence intervals on the savings and median-error metrics. These additions will confirm that the 1.8–4.4× savings figures are obtained under a forward-looking split rather than post-hoc selection on the full dataset.
Revision: yes
-
Referee: [§4] §4 (Framework) and Abstract: The assumption that the added transport features supply stable, unbiased signal for early termination is stated but not tested via any out-of-distribution or temporal-shift experiment. Without such a test, the claim that TurboTest simultaneously improves savings and reduces median error rests on an unverified stationarity assumption that is central to the practical significance of the result.
Authors: We acknowledge that an explicit temporal-shift or OOD test would further validate the stationarity of the transport features. We will add a new experiment that trains the models on 2024 data only and evaluates on 2025 data, directly measuring whether the accuracy–savings trade-off holds under temporal distribution shift. While the existing 1 M-trace evaluation already spans diverse real-world conditions, this additional controlled shift experiment will provide concrete evidence supporting the practical significance of the result.
Revision: yes
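Both responses above hinge on a split by time rather than a random split. A minimal sketch of such a hold-out follows; the record layout (a dict with a `date` key) and the function name are assumptions, and the cutoff date is chosen only to illustrate the "train on 2024, evaluate on 2025" experiment.

```python
from datetime import date
from typing import Sequence


def temporal_split(records: Sequence[dict], cutoff: date) -> tuple[list, list]:
    """Train on tests strictly before `cutoff`, evaluate on tests from `cutoff` on."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def has_no_leakage(train: Sequence[dict], test: Sequence[dict]) -> bool:
    """Every training test must predate every evaluation test."""
    if not train or not test:
        return True
    return max(r["date"] for r in train) < min(r["date"] for r in test)


# Illustrative records; real inputs would be per-test feature rows.
records = [
    {"id": 1, "date": date(2024, 3, 14)},
    {"id": 2, "date": date(2024, 11, 2)},
    {"id": 3, "date": date(2025, 1, 1)},
    {"id": 4, "date": date(2025, 6, 30)},
]
train, test = temporal_split(records, cutoff=date(2025, 1, 1))
```

Any hyperparameter search, such as the 5-fold cross-validation the authors mention, would then be confined to `train`, so that `test` remains untouched until the final evaluation.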
Circularity Check
No significant circularity; results are empirical ML evaluations on external data
The paper frames early termination as an optimal stopping problem but solves it via standard supervised learning: a regressor is trained to predict final throughput from partial traces, and a classifier decides when to stop, using features like RTT and cwnd. The headline metrics (1.8-4.4x data savings, lower median error) are computed by applying the trained models to a separate collection of 1 million M-Lab NDT tests and comparing against BBR-based and other baselines. These quantities are statistical outcomes of the evaluation procedure, not quantities that are algebraically or definitionally identical to the training inputs or fitted parameters. No self-definitional equations, fitted-input-as-prediction steps, or load-bearing self-citations appear in the derivation; the central claims remain independent of the model parameters once training is complete.
Axiom & Free-Parameter Ledger
free parameters (1)
- epsilon
axioms (1)
- domain assumption: The 2024-2025 M-Lab NDT traces are statistically representative of future speed-test traffic for training and evaluating the regressor and classifier.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Stage 1 trains a regressor to estimate final throughput from partial measurements, while Stage 2 trains a classifier to decide when sufficient evidence has accumulated to stop. Leveraging richer transport-level features (RTT, retransmissions, congestion window) alongside throughput, TurboTest exposes a single tunable parameter ε for accuracy tolerance.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Appendix
A.1 Analysis of Throughput Stability Heuristic (TSH)
We apply TSH on our test dataset of 40k samples and calculate metrics such as Median Relative Error and Data Transfer as visualized in Table 1. As one can see, by increasing the stability threshold, the amount of data being transferred decreases at the cost of relative error. ...
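The trade-off described for TSH can be illustrated with a toy stability rule. The excerpt does not define the heuristic, so the rule below is an assumption: stop once the relative spread of the last few throughput samples falls within the stability threshold. The function name and the `window` parameter are hypothetical.

```python
from typing import Sequence


def tsh_stop_index(throughputs: Sequence[float], threshold: float, window: int = 3) -> int:
    """Number of samples consumed before the assumed stability rule fires.

    Stops when (max - min) / max over the last `window` samples is at most
    `threshold`; returns len(throughputs) if the rule never fires.
    """
    for end in range(window, len(throughputs) + 1):
        recent = throughputs[end - window:end]
        hi, lo = max(recent), min(recent)
        if hi > 0 and (hi - lo) / hi <= threshold:
            return end
    return len(throughputs)


# A ramp-up trace in Mbps (illustrative values only).
trace = [20.0, 60.0, 85.0, 95.0, 99.0, 100.0, 100.0, 100.0]
loose = tsh_stop_index(trace, threshold=0.2)    # stops after 5 samples
strict = tsh_stop_index(trace, threshold=0.02)  # stops after 7 samples
```

Under this assumed rule a larger threshold tolerates more variation and therefore stops while the trace is still ramping, which matches the direction reported above: less data transferred, higher relative error.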