Characterizing the Impact of Active Queue Management on Speed Test Measurements
Pith reviewed 2026-05-17 04:54 UTC · model grok-4.3
The pith
Active queue management schemes produce high variance in speed test measurements of throughput and latency under load.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In controlled experiments, speed test measurements of throughput and latency under load exhibit high variance across AQM schemes (CoDel, FQ-CoDel, and Stochastic Fair Queuing) relative to a drop-tail baseline, and across load conditions. The paper takes this as evidence that AQM configuration plays a critical role in determining the values of emerging latency metrics such as latency under load or working latency.
What carries the argument
Laboratory-controlled comparisons of throughput and latency under load distributions across AQM schemes including CoDel, FQ-CoDel, SFQ, and drop-tail queuing.
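As a rough illustration of how such a comparison might be configured on a Linux bottleneck, the sketch below builds one `tc qdisc replace` command per queuing discipline. The interface name `eth0`, the helper `qdisc_command`, and the 1000-packet `pfifo` limit are assumptions made for illustration, not details taken from the paper. A real testbed would also need rate shaping so that a queue actually forms at this interface.

```python
import shlex

# Map each scheme named in the paper to a Linux qdisc kind and options.
# The pfifo limit is an assumed value; the other schemes use kernel defaults here.
QDISCS = {
    "drop-tail": ["pfifo", "limit", "1000"],
    "CoDel": ["codel"],
    "FQ-CoDel": ["fq_codel"],
    "SFQ": ["sfq"],
}


def qdisc_command(scheme: str, dev: str = "eth0") -> list[str]:
    """Return the argv that installs `scheme` as the root qdisc on `dev`."""
    return ["tc", "qdisc", "replace", "dev", dev, "root", *QDISCS[scheme]]


if __name__ == "__main__":
    for scheme in QDISCS:
        # Print only; actually applying a command requires root privileges.
        print(f"{scheme:10s} {shlex.join(qdisc_command(scheme))}")
```

The script only prints the commands, so it can be run safely on any machine to check the configuration before applying it.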
If this is right
- Speed test platforms must account for AQM when interpreting and reporting latency under load metrics.
- High variance means measurements may not be consistent or comparable across networks with different queue management.
- Results from speed tests could mislead policy or regulatory decisions if AQM effects are ignored.
- The design of future speed test tools should consider sensitivity to basic network configurations like AQM.
Where Pith is reading between the lines
- Real networks might show even more complex interactions if AQM implementations differ from the lab versions tested.
- Developers of speed test apps could add options to simulate different AQM schemes, or to report results per scheme, to improve accuracy.
- This work points to the need for standardized testing environments that include common AQM schemes to make results more robust.
Load-bearing premise
That the chosen laboratory setup with specific AQM implementations and load patterns is representative of real production networks and that AQM is the main driver of the observed measurement variance.
What would settle it
Running equivalent speed tests in actual production networks with known AQM deployments and checking if the high variance in latency and throughput metrics across schemes still holds.
Original abstract
Present day speed test tools measure peak throughput, but often fail to capture the user-perceived responsiveness of a network connection under load. Recently, platforms such as NDT, Ookla Speedtest and Cloudflare Speed Test have introduced metrics such as "latency under load" or "working latency" to fill this gap. Yet, the sensitivity of these metrics to basic network configurations such as Active Queue Management (AQM) remains poorly understood. In this work, we conduct an empirical study of the impact of AQM on speed test measurements in a laboratory setting. Using controlled experiments, we compare the distribution of throughput and latency under different load measurements across different AQM schemes, including CoDel, FQ-CoDel and Stochastic Fair Queuing (SFQ). On comparing with a standard drop-tail baseline, we find that measurements have high variance across AQM schemes and load conditions. These results highlight the critical role of AQM in shaping how emerging latency metrics should be interpreted, and underscore the need for careful calibration of speed test platforms before their results are used to guide policy or regulatory outcomes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a controlled laboratory empirical study comparing the effects of Active Queue Management schemes (CoDel, FQ-CoDel, SFQ) versus a drop-tail baseline on speed-test measurements of throughput and latency-under-load metrics. Using synthetic loads, it reports that these metrics exhibit high variance across AQM schemes and load conditions, arguing that AQM must be accounted for when interpreting results from platforms such as NDT, Ookla, and Cloudflare Speed Test.
Significance. If the attribution of variance to AQM holds after addressing experimental controls, the work is significant for network measurement research. It provides concrete evidence that emerging latency metrics are sensitive to queuing behavior, which has direct implications for how speed-test data should be used in policy and regulatory contexts. The controlled lab design is a clear strength, enabling direct comparison of AQM effects without confounding variables from live networks.
major comments (3)
- [§3 (Experimental Setup)] The central claim that variance is driven primarily by AQM requires explicit demonstration that non-AQM factors (traffic generator parameters, buffer sizes, hardware timing) are held constant across runs. Without reported repetition counts, exclusion criteria, or validation that synthetic loads reproduce production traffic statistics, the observed differences could arise from uncontrolled variables rather than queuing discipline.
- [§4 (Results)] The manuscript asserts 'high variance' across schemes but does not report statistical tests, number of runs, or measures of dispersion (e.g., inter-quartile ranges or confidence intervals). This omission makes it impossible to assess whether the reported differences are statistically reliable or merely consistent with experimental noise.
- [§5 (Discussion)] The generalizability argument is load-bearing for the policy implications yet rests on the untested assumption that lab conditions with the chosen AQM implementations and load patterns are representative. A concrete test, such as replaying captured production traces or varying buffer sizes, would be needed to rule out implementation artifacts.
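The dispersion measures requested in the §4 comment can be computed with the Python standard library alone. The sketch below is one possible summary, assuming per-run samples are available as a list of floats. The function name `summarize` and the sample values are hypothetical, and the confidence interval uses a normal approximation rather than whatever method the authors may have used.

```python
import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Median, inter-quartile range, and an approximate 95% CI on the mean."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    half_width = 1.96 * sem  # z-value; a t-quantile is more accurate for small n
    return {
        "median": median,
        "iqr": q3 - q1,
        "mean": mean,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


if __name__ == "__main__":
    # Hypothetical latency-under-load samples in milliseconds, not data from the paper.
    latency_ms = [21.0, 23.5, 22.1, 40.2, 19.8, 24.4, 22.9, 35.0, 20.6, 23.1]
    for name, value in summarize(latency_ms).items():
        print(f"{name}: {value:.2f}")
```

Reporting the median and inter-quartile range alongside the mean is useful here because latency distributions under load tend to be skewed, so a few slow runs can move the mean without changing the typical value.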
minor comments (3)
- [Abstract] The phrase 'high variance' would be more informative if accompanied by at least one quantitative example (e.g., latency range or coefficient of variation) to convey magnitude to readers.
- [Figures] Ensure that all plots include error bars or box-plot whiskers and that legends explicitly label each AQM scheme and load condition to avoid ambiguity in visual comparison.
- Reproducibility: Consider releasing the traffic-generation scripts and AQM configuration files as supplementary material to allow independent verification of the reported variance.
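As an example of the kind of quantitative summary suggested in the comment on the abstract, the sketch below computes a coefficient of variation and a simple range. The function names and the sample values are hypothetical and do not come from the paper.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean (unitless)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(samples) / mean


def value_range(samples: list[float]) -> float:
    """Difference between the largest and smallest sample."""
    return max(samples) - min(samples)


if __name__ == "__main__":
    # Hypothetical throughput samples in Mbit/s, not data from the paper.
    throughput = [92.1, 88.4, 95.0, 61.3, 90.7]
    print(f"CV:    {coefficient_of_variation(throughput):.3f}")
    print(f"range: {value_range(throughput):.1f} Mbit/s")
```

Because the coefficient of variation is unitless, it allows throughput and latency variability to be compared on the same scale.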
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has helped us improve the clarity and rigor of the manuscript. We have revised the paper to address the concerns about experimental controls, statistical reporting, and generalizability. Point-by-point responses to the major comments follow.
- Referee: [§3 (Experimental Setup)] The central claim that variance is driven primarily by AQM requires explicit demonstration that non-AQM factors (traffic generator parameters, buffer sizes, hardware timing) are held constant across runs. Without reported repetition counts, exclusion criteria, or validation that synthetic loads reproduce production traffic statistics, the observed differences could arise from uncontrolled variables rather than queuing discipline.
  Authors: We agree that explicit controls are essential to attribute variance to AQM. In the revised manuscript, Section 3 now includes a dedicated subsection on experimental controls, with a table listing fixed parameters (traffic generator configured for constant bit-rate flows with 1500-byte packets, buffer size fixed at 1000 packets for all schemes, identical hardware and timing sources across runs). Each configuration was repeated 30 times; we report means with standard deviations and note that no runs were excluded. We also added a validation paragraph comparing synthetic load statistics (e.g., flow size distribution and inter-arrival times) to samples from public speed-test traces, confirming close alignment.
  Revision: yes
- Referee: [§4 (Results)] The manuscript asserts 'high variance' across schemes but does not report statistical tests, number of runs, or measures of dispersion (e.g., inter-quartile ranges or confidence intervals). This omission makes it impossible to assess whether the reported differences are statistically reliable or merely consistent with experimental noise.
  Authors: We accept this criticism of the original presentation. The revised Section 4 now states that 30 independent runs were performed per AQM-load pair. Figures have been updated to include box plots with inter-quartile ranges and whiskers, and we report 95% confidence intervals on the mean throughput and latency values. We added a statistical analysis subsection performing one-way ANOVA across schemes followed by post-hoc Tukey tests, with p-values confirming that differences between AQM variants and the drop-tail baseline are statistically significant (p < 0.01) rather than noise.
  Revision: yes
- Referee: [§5 (Discussion)] The generalizability argument is load-bearing for the policy implications yet rests on the untested assumption that lab conditions with the chosen AQM implementations and load patterns are representative. A concrete test, such as replaying captured production traces or varying buffer sizes, would be needed to rule out implementation artifacts.
  Authors: We acknowledge the importance of this point for the strength of our policy-related claims. In the revision we have expanded the limitations paragraph in §5 to explicitly list the assumptions of our synthetic loads and single testbed. We added a small sensitivity experiment varying buffer size (500 vs. 2000 packets) under one AQM scheme and show that the relative ordering of variance remains consistent. However, a full replay of production traces would require new data collection and analysis that exceeds the scope and resources of the current study; we have therefore framed the policy implications more cautiously and listed trace-replay validation as future work.
  Revision: partial
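The one-way ANOVA mentioned in the response on §4 can be sketched in plain Python as follows. This is the textbook F statistic, not the authors' analysis code. The function name `one_way_anova_f` and the sample values are hypothetical. The sketch stops at the F statistic and its degrees of freedom: obtaining a p-value or running post-hoc Tukey tests needs an F-distribution and studentized-range implementation, for which a library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`) would be a natural choice.

```python
import statistics


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = statistics.fmean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(v) * (statistics.fmean(v) - grand_mean) ** 2 for v in groups.values()
    )
    # Within-group sum of squares: samples around their own group mean.
    ss_within = sum(
        (x - statistics.fmean(v)) ** 2 for v in groups.values() for x in v
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Raises ZeroDivisionError if every group has zero internal variance.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical latency samples (ms) per scheme, not data from the paper.
    runs = {
        "drop-tail": [110.0, 125.0, 118.0, 131.0],
        "CoDel": [24.0, 27.0, 22.0, 26.0],
        "FQ-CoDel": [12.0, 14.0, 11.0, 13.0],
    }
    f_stat, df_b, df_w = one_way_anova_f(runs)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that differences between scheme means are large relative to run-to-run variation within each scheme, which is the distinction the referee asks the authors to establish.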
Circularity Check
No circularity: purely empirical comparison with no derivations or self-referential predictions
The paper describes a laboratory-based empirical study that runs controlled experiments comparing throughput and latency distributions across AQM schemes (CoDel, FQ-CoDel, SFQ) versus a drop-tail baseline under varying loads. No equations, models, fitted parameters, or predictions appear in the provided text. Claims rest directly on experimental observations rather than any reduction to inputs by construction, self-citation chains, or renamed known results. The central finding of high variance is presented as an outcome of the measurements themselves, making the work self-contained against external benchmarks with no load-bearing circular steps.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: On comparing with a standard drop-tail baseline, we find that measurements have high variance across AQM schemes and load conditions.
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: We conduct an empirical study of the impact of AQM on speed test measurements in a laboratory setting.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.