Cardinality is Not Enough: Super Host Detection via Segmented Cardinality Estimation
Pith reviewed 2026-05-13 22:12 UTC · model grok-4.3
The pith
SegSketch detects super hosts by estimating distinct connections within inferred IP subnets rather than across full addresses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SegSketch introduces a segmented cardinality estimation scheme. It uses a halved-segment hashing strategy to infer the common prefix lengths of IP addresses, then computes flow cardinality inside each inferred subnet. The resulting per-subnet counts replace full-address counts, which the paper reports gives higher detection accuracy than flat sketches while avoiding the memory cost of hierarchical structures.
What carries the argument
Halved-segment hashing strategy that infers common IP prefix lengths to partition addresses into subnets for localized cardinality estimation.
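To make the notion of a common prefix concrete, the sketch below computes the longest prefix shared by a set of IPv4 peers and counts distinct peers per subnet. It is an illustration under stated assumptions, not the paper's method: the function names are hypothetical, the /24 grouping is fixed by hand, and exact Python sets stand in for the halved-segment hashing and the sketch, whose details are not given on this page.

```python
import ipaddress
from collections import defaultdict


def common_prefix_len(peers):
    """Length in bits of the longest prefix shared by all IPv4 peers."""
    values = [int(ipaddress.IPv4Address(p)) for p in peers]
    if not values:
        return 0
    # XOR against the first address marks every bit where some peer differs.
    diff = 0
    for v in values:
        diff |= v ^ values[0]
    # The shared prefix ends just before the highest differing bit.
    return 32 - diff.bit_length()


def per_subnet_cardinality(peers, prefix_len=24):
    """Exact distinct-peer count per subnet (hypothetical stand-in for a sketch)."""
    groups = defaultdict(set)
    for p in peers:
        net = ipaddress.ip_network(f"{p}/{prefix_len}", strict=False)
        groups[str(net)].add(p)
    return {net: len(members) for net, members in groups.items()}


# Toy peer sets, invented for illustration.
local_peers = ["10.1.2.3", "10.1.2.77", "10.1.2.200"]
scattered_peers = ["10.1.2.3", "172.16.9.1", "203.0.113.5"]

print(common_prefix_len(local_peers))       # 24
print(common_prefix_len(scattered_peers))   # 0
print(per_subnet_cardinality(local_peers))  # {'10.1.2.0/24': 3}
```

A real deployment could not afford exact sets per subnet; the point here is only what "cardinality within an inferred subnet" measures.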
If this is right
- Super-host detection becomes practical on routers with only a few megabytes of fast memory.
- False-positive rates drop because normal cross-subnet traffic no longer inflates global cardinality counts.
- Attack mitigation systems can act on the same memory budget that previously produced unreliable results.
- The same segmented counting idea can be swapped into other sketch-based tasks that currently ignore address locality.
Where Pith is reading between the lines
- The approach may generalize to detecting other locality-sensitive anomalies such as distributed scanners or botnet command channels.
- Router vendors could embed the halved-segment logic in hardware hash tables without increasing on-chip SRAM.
- Combining SegSketch with existing heavy-hitter detectors could produce a single low-memory pipeline for multiple security signals.
Load-bearing premise
Super hosts that matter for detection usually talk to many hosts inside the same subnet rather than scattering connections across unrelated addresses.
What would settle it
A traffic trace containing super hosts whose peers are scattered across unrelated networks, so that no peer set shares a common prefix of useful length, on which SegSketch shows no F1 improvement over a plain full-address sketch of equal size.
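A trace of this kind could be approximated synthetically. The sketch below is one hypothetical way to generate a super host's peer list with a controllable share of peers inside a single subnet. The function names, the `locality` parameter, and the documentation subnet `192.0.2.0/24` are assumptions made for illustration; nothing here comes from the paper's evaluation setup.

```python
import ipaddress
import random


def synthetic_peers(n, locality, subnet="192.0.2.0/24", seed=0):
    """Return n peer addresses (duplicates possible).

    Each peer falls inside `subnet` with probability `locality`;
    otherwise it is drawn uniformly from the whole IPv4 space.
    """
    rng = random.Random(seed)
    net = ipaddress.ip_network(subnet)
    peers = []
    for _ in range(n):
        if rng.random() < locality:
            offset = rng.randrange(net.num_addresses)
            peers.append(str(net.network_address + offset))
        else:
            peers.append(str(ipaddress.IPv4Address(rng.getrandbits(32))))
    return peers


def share_in_subnet(peers, subnet="192.0.2.0/24"):
    """Fraction of peers whose address lies inside `subnet`."""
    net = ipaddress.ip_network(subnet)
    return sum(ipaddress.ip_address(p) in net for p in peers) / len(peers)


print(share_in_subnet(synthetic_peers(1000, locality=1.0)))  # 1.0
print(share_in_subnet(synthetic_peers(1000, locality=0.5)))
print(share_in_subnet(synthetic_peers(1000, locality=0.0)))
```

Sweeping `locality` from 1.0 down to 0.0 would give a family of traces on which a subnet-aware detector and a flat sketch of equal size can be compared.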
Abstract
Accurately detecting super hosts, which establish connections to a large number of distinct peers, is important for mitigating web attacks and ensuring high-quality web service. Existing sketch-based approaches estimate the number of distinct connections, called flow cardinality, over full IP addresses. They ignore the fact that a malicious or victim super host often communicates with hosts within the same subnet, which results in high false-positive rates and low accuracy. Hierarchical-structure-based approaches can capture flow cardinality per subnet, but they inherently suffer from high memory usage. To address these limitations, we propose SegSketch, a segmented cardinality estimation approach that employs a lightweight halved-segment hashing strategy to infer the common prefix lengths of IP addresses and estimates cardinality within each subnet to improve detection accuracy under constrained memory. Experiments driven by real-world traces demonstrate that SegSketch improves F1-score by up to 8.04x compared to state-of-the-art solutions, particularly under small memory budgets.
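As background on what estimating flow cardinality involves, here is a minimal sketch of one classical technique, linear counting over a bitmap. It is not SegSketch and not any of the paper's baselines as implemented; the class name, the bitmap size of 1024 bits, and the use of BLAKE2b as the hash are assumptions chosen for the example.

```python
import hashlib
import math


class BitmapCounter:
    """Linear-counting estimator: each item sets one of m bits."""

    def __init__(self, m=1024):
        self.m = m
        self.bits = [False] * m

    def add(self, item):
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        self.bits[int.from_bytes(digest, "big") % self.m] = True

    def estimate(self):
        zeros = self.bits.count(False)
        if zeros == 0:
            return float("inf")  # bitmap saturated, estimate undefined
        # n_hat = -m * ln(fraction of bits still zero)
        return -self.m * math.log(zeros / self.m)


counter = BitmapCounter(m=1024)
for i in range(300):
    peer = f"10.0.{i // 256}.{i % 256}"
    counter.add(peer)
    counter.add(peer)  # repeated peers leave the bitmap unchanged

print(round(counter.estimate()))  # an approximation of 300, not an exact count
```

The estimator uses 1024 bits regardless of how many peers arrive, which is the memory-versus-accuracy trade that sketch-based super-host detectors build on.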
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SegSketch, a segmented cardinality estimation method for super-host detection. It introduces a halved-segment hashing strategy to infer common IP prefix lengths from traffic, then estimates flow cardinality inside the inferred subnets rather than over full addresses. The approach is positioned as addressing high false-positive rates in standard sketch methods (which ignore subnet locality) and high memory use in hierarchical methods. Experiments on real-world traces are claimed to yield up to an 8.04x F1-score improvement over state-of-the-art baselines, especially under tight memory budgets.
Significance. If the central performance claim holds after validation, the work would be significant for practical network monitoring and attack mitigation. It offers a lightweight way to exploit the common observation that super hosts (malicious or victim) often communicate inside the same subnet, achieving better accuracy than flat sketches without the memory cost of full hierarchical sketches. The method is presented as a direct extension of existing cardinality sketches rather than a parameter-heavy invention.
major comments (2)
- §3.2 (halved-segment hashing): the accuracy of prefix-length inference is not supported by any error analysis, correctness argument, or ablation that isolates the hashing step from the subnet-locality property of the evaluation traces. If the inferred groupings are incorrect on traces lacking strong /24 locality, the reported F1 gain collapses to the performance of the underlying sketch.
- §4 (experimental evaluation): the headline 8.04x F1 improvement is stated without tabulated baselines, memory budgets, error bars, or an ablation that quantifies the contribution of subnet estimation versus the hashing strategy. This makes the central claim impossible to assess from the provided evidence.
minor comments (1)
- Abstract: the phrase 'particularly under small memory budgets' is not quantified; the manuscript should state the exact memory sizes (e.g., 1 MB, 2 MB) at which the 8.04x figure is observed.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to strengthen the analysis and experimental presentation.
- Referee: §3.2 (halved-segment hashing): the accuracy of prefix-length inference is not supported by any error analysis, correctness argument, or ablation that isolates the hashing step from the subnet-locality property of the evaluation traces. If the inferred groupings are incorrect on traces lacking strong /24 locality, the reported F1 gain collapses to the performance of the underlying sketch.
  Authors: We agree that an explicit error analysis and ablation isolating the halved-segment hashing would strengthen the paper. The hashing strategy is designed to probabilistically group addresses sharing common prefixes by splitting the hash space, but we acknowledge the need to separate this from trace-specific locality. In the revision, we add a probabilistic correctness argument for prefix inference accuracy (based on collision probabilities) and an ablation study evaluating the hashing step independently. We also include results on synthetic traces with controlled locality levels to show that gains diminish without strong subnet structure, as expected, rather than collapsing entirely.
  Revision: yes
- Referee: §4 (experimental evaluation): the headline 8.04x F1 improvement is stated without tabulated baselines, memory budgets, error bars, or an ablation that quantifies the contribution of subnet estimation versus the hashing strategy. This makes the central claim impossible to assess from the provided evidence.
  Authors: We apologize for the lack of detailed tabular and ablation data in the original submission. The revised manuscript includes a new table reporting F1-scores for all baselines (HyperLogLog, PCSA, and hierarchical sketches) at explicit memory budgets (0.5 MB to 8 MB). We add error bars as standard deviations over 10 independent runs. A dedicated ablation subsection quantifies the separate contributions of subnet cardinality estimation and the halved-segment hashing, confirming that both are necessary for the peak gains (e.g., the 8.04x figure occurs at 1 MB on the CAIDA trace).
  Revision: yes
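Since the dispute centers on an F1 ratio, it may help to spell out how such a ratio is computed. The sketch below scores a detected set of super hosts against ground truth and divides two F1 values. The host labels and both detector outputs are invented toy data, not results from the paper, and the function name is hypothetical.

```python
def f1_score(detected, truth):
    """F1 of a detected host set against the ground-truth super hosts."""
    detected, truth = set(detected), set(truth)
    true_pos = len(detected & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(detected)
    recall = true_pos / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
truth = {"h1", "h2", "h3", "h4"}
flat_detected = {"h1", "h5", "h6", "h7", "h8", "h9"}  # 1 hit, 5 false alarms
segmented_detected = {"h1", "h2", "h3", "h5"}         # 3 hits, 1 false alarm

f1_flat = f1_score(flat_detected, truth)
f1_segmented = f1_score(segmented_detected, truth)
print(f"flat={f1_flat:.2f} segmented={f1_segmented:.2f} "
      f"ratio={f1_segmented / f1_flat:.2f}x")
```

In this toy case the ratio is 3.75x, and it comes almost entirely from the weak baseline: a ratio can be large when the baseline F1 is small, which is why the referee asks for absolute F1 values at each memory budget.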
Circularity Check
No circularity: SegSketch introduces independent halved-segment hashing on top of existing sketches
The paper describes SegSketch as a new segmented cardinality estimation method that adds a lightweight halved-segment hashing strategy to infer common IP prefix lengths and then estimates cardinality within those subnets. This construction is presented as an engineering extension of prior sketch techniques rather than any self-referential equation, fitted parameter renamed as prediction, or self-citation chain that carries the central claim. No equations or derivations in the provided text reduce the reported F1 improvement to the input data by construction; the accuracy gains are asserted via empirical evaluation on real-world traces. The approach therefore remains self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- halved-segment hash parameters
axioms (1)
- domain assumption: malicious or victim super hosts communicate with hosts within the same subnet
invented entities (1)
- SegSketch: no independent evidence
Paper venue and authors: WWW '26, April 13–17, 2026, Dubai, United Arab Emirates; Yilin Zhao et al.
Appendix A.1 (Cardinality Estimation Error Bound), excerpt: for a flow, the probability that it is hashed to a specific bit j ∈ {1, ..., M} in a bitmap of size M is 1/M.