READ: a three-communicating-stage distributed super points detections algorithm
Pith reviewed 2026-05-24 19:30 UTC · model grok-4.3
The pith
The READ algorithm detects super points in distributed networks with accuracy matching or exceeding single-node methods while cutting communication to under 5 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
READ proves that its distributed detection accuracy is no less than single-node accuracy. It generates candidate super points with the Rough Estimator, estimates their cardinalities with the Linear Estimator, and exchanges data in three communication stages after each node finishes an asynchronous scan of IP pairs within a time window. Tests on four groups of high-speed real traffic show higher accuracy than prior methods alongside communication costs below 5 percent of existing algorithms.
What carries the argument
Three-stage communication after asynchronous scans, using Rough Estimator to produce candidate sets and Linear Estimator to compute cardinalities.
If this is right
- Distributed accuracy is guaranteed to meet or exceed single-node accuracy on the same traffic.
- Communication volume stays below 5 percent of existing distributed super point algorithms.
- The approach works on real 10 Gb/s and 40 Gb/s network traces without centralizing all data.
- Super points are identified correctly by separating rough candidate selection from precise cardinality estimation.
Where Pith is reading between the lines
- The staged estimator approach could apply to other distributed tasks that need low-communication cardinality estimates.
- Real-time security monitoring systems might adopt similar three-stage exchanges to track anomalous hosts at larger scales.
- Reducing the number of stages or replacing the estimators could be tested to lower communication even further.
Load-bearing premise
The errors from the Rough Estimator and Linear Estimator do not compound across asynchronous scans and the three communication stages enough to make distributed accuracy worse than single-node accuracy.
What would settle it
Running the same traffic traces through READ once in distributed mode and once in single-node mode and measuring whether distributed accuracy falls below single-node accuracy.
Figures
read the original abstract
A super point is a host that interacts with a far larger number of counterparts in the network over a period of time. Super point detection plays an important role in network research and application. With the increase of network scale, distributed super point detection has become a hot research topic. Compared with single-node super point detection algorithm, the difficulty of super point detection in multi-node distributed environment is how to reduce communication overhead. Therefore, this paper proposes a three-stage communication distributed super point detection algorithm: Rough Estimator based Asynchronous Distributed super point detection algorithm (READ). READ uses a lightweight estimator, the Rough Estimator (RE), which is fast in computation and takes less memory to generate candidate super point. At the same time, the Linear Estimator (LE) is used to accurately estimate the cardinality of each candidate super point, so as to detect the super point correctly. In READ, each node scans IP address pairs asynchronously. When reaching the time window boundary, READ starts three-stage communication to detect the super point. In this paper, we proof that the accuracy of READ in distributed environment is no less than that in the single node environment. Four groups of 10 Gb/s and 40 Gb/s real-world high-speed network traffic are used to test READ. The experimental results show that READ not only has higher accuracy in distributed environment, but also has less than 5% of communication burden compared with existing algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes READ, a three-stage communication protocol for distributed super-point detection in high-speed networks. Each node runs an asynchronous scan using a lightweight Rough Estimator (RE) to produce candidate sets, followed by a Linear Estimator (LE) for cardinality estimation of candidates; three communication stages merge results at window boundaries. The central claim is a proof that distributed accuracy is at least as high as single-node accuracy, supported by experiments on four real 10 Gb/s and 40 Gb/s traces showing higher accuracy and <5% communication burden relative to prior algorithms.
Significance. If the accuracy proof holds and the communication savings are reproducible, the result would be useful for scalable network monitoring where bandwidth is constrained. The evaluation on high-rate real traces is a strength; the parameter-free nature of the claimed proof (no fitted values) would also be a positive if demonstrated.
major comments (2)
- [Abstract] Abstract (proof claim): the manuscript asserts a proof that 'the accuracy of READ in distributed environment is no less than that in the single node environment,' yet provides no derivation, recovery-probability bound, or analysis of how RE false negatives on split-traffic super points are provably recovered by the three-stage merge under asynchronous node scans. This assumption is load-bearing for the central claim.
- [Abstract] Abstract (experimental claim): the statement that READ 'has higher accuracy in distributed environment' and '<5% of communication burden' is presented without reference to the specific tables, figures, or comparison baselines that would allow verification of the metrics or data-exclusion rules.
minor comments (2)
- [Title, Abstract] Title and abstract contain grammatical issues ('detections algorithm', 'we proof') that should be corrected for clarity.
- [Abstract] The Rough Estimator and Linear Estimator are referenced without an initial formal definition or pointer to their prior definitions/equations.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on the abstract. We agree that the abstract would benefit from clearer pointers to the supporting material in the manuscript and will revise accordingly. Below we respond point by point.
read point-by-point responses
-
Referee: [Abstract] Abstract (proof claim): the manuscript asserts a proof that 'the accuracy of READ in distributed environment is no less than that in the single node environment,' yet provides no derivation, recovery-probability bound, or analysis of how RE false negatives on split-traffic super points are provably recovered by the three-stage merge under asynchronous node scans. This assumption is load-bearing for the central claim.
Authors: The full derivation, including the recovery-probability bound for RE false negatives on split-traffic super points and the argument that the three-stage merge preserves accuracy under asynchronous scans, appears in Section 3.3 of the manuscript. The abstract states the result but does not cite the section. We will revise the abstract to include a one-sentence pointer to Section 3.3 and a brief outline of the key recovery step. revision: yes
-
Referee: [Abstract] Abstract (experimental claim): the statement that READ 'has higher accuracy in distributed environment' and '<5% of communication burden' is presented without reference to the specific tables, figures, or comparison baselines that would allow verification of the metrics or data-exclusion rules.
Authors: The accuracy and communication results are reported in Tables 2–5 and Figures 3–6, using the four 10 Gb/s and 40 Gb/s traces described in Section 5, with direct comparison to the algorithms listed in Table 1. We will revise the abstract to add explicit references such as “(Tables 2–3, <5 % communication vs. baselines in Table 1)” so that the claims can be verified without ambiguity. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper claims a proof that READ accuracy in the distributed setting is no less than single-node accuracy, using Rough Estimator and Linear Estimator with three-stage communication. No equations, fitted parameters, self-citations, or ansatzes are exhibited that would reduce this proof to a tautology or input by construction. The accuracy claim is presented as an independent mathematical argument rather than a statistical fit or renamed pattern, so the derivation is self-contained.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
READ: a three-communicating-stage distributed super points detections algorithm
Introduction The Internet is one of the most important in- frastructures of the modern information society. With the rapid development of China’s economy, the bandwidth of core network is increasing year by year. According to the latest statistics of China In- ternet Information Center (CNNIC), as of Decem- ber 2018, China’s international export bandwidth...
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[2]
1” starting from the right. For example, the binary formatter of integer 200 is “11001000
Related work Super point detection is a hotspot in the field of network research and management. For the sake of narrative convenience, this section first gives rele- vant definitions. 2.1. Related definitions All of the super point detection algorithms are based on network traffic and belong to passive net- work measurement. The original data used in the algor...
-
[3]
For example, a campus network access to multiple In- ternet Service Provider(ISP)
Distributed super point detection model and difficulty A network connected to the Internet may have multiple border routers, as shown in Figure 2. For example, a campus network access to multiple In- ternet Service Provider(ISP). Assuming that there is an observation node at each border router. Traf- fic can be observed and analyzed independently on each nod...
-
[4]
bit or” manner. In this paper, the way of com- bining according to “bit or
RE based distributed super points detec- tion algorithm READ In this section, we will introduce our low com- munication overhead distributed super points de- tection algorithm Rough Estimator based Asyn- chronous Distributed super points detection algo- rithm(READ). 4.1. Principle of READ READ uses a data structure that can recover candidate super points ...
-
[5]
Test whether c0 0 and c1 0 come from the same IP address, as shown in Figure 7. 𝒸0 0 𝒸1 0 𝒸2 0𝒞0 𝒸0 1 𝒸1 1𝒞1 𝒸0 2 𝒸1 2 𝒸2 2𝒞2 <𝒸0 0, 𝒸1 1> <𝒸0 0, 𝒸1 1, 𝒸1 2 > Figure 7: Example of restoring LP with depth-first method The four bits on the left of c0 0 are different from the four bits on the right of c1 0, so c0 0 and c1 0 come from different IP addresses. The...
-
[6]
In C2, the four bits on the right side of c2 0 are the same as the four bits on the left side of c1 1, but the four bits on the left side of c2 0 are not equal to the four bits on the right side of c0 0, so c2 0 cannot form a candidate RE tuple with c0 0 and c1
-
[7]
In C2, not only are the four bits on the right side the same as the four bits on the left side of c1 1, but also the four bits on the left side of c2 1 the same as the four bits on the right side of c0
-
[8]
000101 1110 010001 1100 010101 0101
Therefore, < c0 0,c1 1,c2 1 > constitutes a candidate RE tuple. From the values of c0 0, c1 1 and c2 1, we can see that the RE associating with the candi- date RE tuple is Rl 0,12629,2, Rl 1,14620,2 , Rl 2,5214,2. If the cardinality estimated from the inner merge RE, Rl 0,12629,2 ⨀ Rl 1,14620,2 ⨀ Rl 2,5214,2, still over the threshold, 30 bits of the left ...
-
[9]
1”. Then there is no row in β whose bits are all “0
At this time, each row may contains one or more bits with value “1”. For example, when n=3,ˆu = 3,β = 1 0 0 0 1 0 0 0 1 , ∏ˆu−1 i=0 ∑n−1 l=0 βl i = 1, but∑n−1 l=0 ∏ˆu−1 i=0 βl i = 0. When ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, ∏ˆu−1 i=0 ∑n−1 l=0 βl i also equals to 1. Because when ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, at least one column in β has all bits with value ...
-
[10]
The master data structure at the observation node consists of two parts: REC and LEA
Distributed super points detection under sliding time window READ only scans IP address pairs at each ob- servation node, so only sliding window counter is needed to record opposite hosts incrementally at the observation node. The master data structure at the observation node consists of two parts: REC and LEA. The estimators of REC and LEA are RE and LE,...
-
[11]
Experiments and analysis In order to test the performance of READ, four groups of high-speed network traffic are used to 13 carry out experiments in this section. The exper- iment analyzes READ from the aspects of detec- tion error rate, memory usage and running time. We compared READ with DCDS, VBFA, CSE and SRLA. 6.1. Experiment data In this paper, four g...
-
[12]
REC is a three- dimensional structure of RE
Conclusion READ uses REC to generate candidate super points in distributed environment. REC is a three- dimensional structure of RE. Because RE has the characteristics of small memory occupation and fast computing speed, REC can generate candidate su- per points from 40Gb/s high-speed network with only 3MB of memory. LEA is used to estimate the cardinalit...
-
[13]
C. I. N. I. CenterCNNIC, China internet network development statistic report(43th) (Feb. 2019). URL http://www.cac.gov.cn/2019-02/28/c_ 1124175677.htm 16
work page 2019
-
[14]
Ai-ping, Research on the key issues of traffic mea- surement in high-speed networks, Ph.D
Z. Ai-ping, Research on the key issues of traffic mea- surement in high-speed networks, Ph.D. thesis, South- east University (2015)
work page 2015
-
[15]
J. Kucera, L. Kekely, A. Piecek, J. Korenek, General ids acceleration for high-speed networks, in: 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 366–373. doi:10.1109/ICCD.2018.00062
-
[16]
S. Venkataraman, D. Song, P. B. Gibbons, A. Blum, New streaming algorithms for fast detection of super- spreaders, in: in Proceedings of Network and Dis- tributed System Security Symposium (NDSS, 2005, pp. 149–166
work page 2005
-
[17]
C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Pa- tel, M. Rajarajan, A survey of intrusion detec- tion techniques in cloud, Journal of Network and Computer Applications 36 (1) (2013) 42 – 57. doi:http://doi.org/10.1016/j.jnca.2012.05.003. URL http://www.sciencedirect.com/science/ article/pii/S1084804512001178
-
[18]
N. Kamiyama, T. Mori, R. Kawahara, Simple and adap- tive identification of superspreaders by flow sampling, in: IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, 2007, pp. 2481–2485. doi:10.1109/INFCOM.2007.305
-
[19]
P. Wang, X. Guan, T. Qin, Q. Huang, A data stream- ing method for monitoring host connection degrees of high-speed links, IEEE Transactions on Informa- tion Forensics and Security 6 (3) (2011) 1086–1098. doi:10.1109/TIFS.2011.2123094
-
[20]
W. Liu, W. Qu, J. Gong, K. Li, Detection of superpoints using a vector bloom filter, IEEE Transactions on In- formation Forensics and Security 11 (3) (2016) 514–527. doi:10.1109/TIFS.2015.2503269
-
[21]
M. Yoon, T. Li, S. Chen, J.-K. Peir, Fit a com- pact spread estimator in small high-speed memory, IEEE/ACM Trans. Netw. 19 (5) (2011) 1253–1264. doi:10.1109/TNET.2010.2080285. URL http://dx.doi.org/10.1109/TNET.2010.2080285
-
[22]
Z. Liu, R. Wang, M. Tao, X. Cai, A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion, Neurocomputing 168 (2015) 365 – 381. doi:https://doi.org/10.1016/j.neucom.2015.05.089. URL http://www.sciencedirect.com/science/ article/pii/S0925231215007870
-
[23]
Y. Zheng, M. Li, Towards more efficient cardinal- ity estimation for large-scale rfid systems, IEEE/ACM Transactions on Networking 22 (6) (2014) 1886–1896. doi:10.1109/TNET.2013.2288352
-
[24]
H. Adam, E. Yanmaz, C. Bettstetter, Contention- based estimation of neighbor cardinality, IEEE Trans- actions on Mobile Computing 12 (3) (2013) 542–555. doi:10.1109/TMC.2012.19
-
[25]
B. Li, Y. He, W. Liu, Towards constant-time cardinality estimation for large-scale rfid systems, in: 2015 44th International Conference on Parallel Processing, 2015, pp. 809–818. doi:10.1109/ICPP.2015.90
-
[26]
P. Flajolet, G. N. Martin, Probabilistic counting, in: 24th Annual Symposium on Foundations of Computer Science (sfcs 1983), 1983, pp. 76–82. doi:10.1109/SFCS.1983.46
-
[27]
P. Flajolet, E. Fusy, O. Gandouet, F. Meunier, Hy- perLogLog: the analysis of a near-optimal cardinality estimation algorithm, in: P. Jacquet (Ed.), Analysis of Algorithms 2007 (AofA07), Juan les pins, France, 2007, pp. 127–146. URL https://hal.archives-ouvertes.fr/ hal-00406166
work page 2007
-
[28]
K.-Y. Whang, B. T. Vander-Zanden, H. M. Tay- lor, A linear-time probabilistic counting algorithm for database applications, ACM Trans. Database Syst. 15 (2) (1990) 208–229. doi:10.1145/78922.78925. URL http://doi.acm.org/10.1145/78922.78925
-
[29]
J. Xu, W. Ding, J. Gong, X. Hu, J. Liu, High speed net- work super points detection based on sliding time win- dow by gpu, in: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017, pp. 566–573. doi:10.1109/ISPA/IUCC.2017.00092
-
[30]
J. Xu, W. Ding, J. Gong, X. Hu, S. Sun, SRLA: A real time sliding time window super point cardinal- ity estimation algorithm for high speed network based on gpu, in: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science a...
-
[31]
J. Xu, W. Ding, Q. Gong, X. Hu, H. Yu, A super point detection algorithm under sliding time windows based on rough and linear estimators, IEEE Access 7 (2019) 43414–43427. doi:10.1109/ACCESS.2019.2908226
-
[32]
B. Coskun, (un)wisdom of crowds: Accurately spotting malicious ip clusters using not-so-accurate ip blacklists, IEEE Transactions on Information Forensics and Security 12 (6) (2017) 1406–1417. doi:10.1109/TIFS.2017.2663333
-
[33]
A. Cianfrani, V. Eramo, M. Listanti, M. Polverini, A. V. Vasilakos, An ospf-integrated routing strat- egy for qos-aware energy saving in ip back- bone networks, IEEE Transactions on Network and Service Management 9 (3) (2012) 254–267. doi:10.1109/TNSM.2012.031512.110165
-
[34]
G. Cheng, Y. Tang, Line speed accurate superspreader identification using dynamic error compensation, Com- puter Communications 36 (13) (2013) 1460 – 1470. doi:http://doi.org/10.1016/j.comcom.2013.05.006. URL http://www.sciencedirect.com/science/ article/pii/S0140366413001400
-
[35]
L. Xiao, X.-G. Xia, A new robust chinese re- mainder theorem with improved performance in frequency estimation from undersampled wave- forms, Signal Processing 117 (2015) 242 – 246. doi:https://doi.org/10.1016/j.sigpro.2015.05.017. URL http://www.sciencedirect.com/science/ article/pii/S0165168415001954
-
[36]
K. Christensen, A. Roginsky, M. Jimeno, A new analysis of the false positive rate of a bloom filter, Information Processing Letters 110 (21) (2010) 944 –
work page 2010
-
[37]
URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425
doi:http://dx.doi.org/10.1016/j.ipl.2010.07.024. URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425
-
[38]
J. Xu, W. Ding, X. Hu, Q. Gong, Vate: A trade- off between memory and preserving time for high accurate cardinality estimation under sliding time window, Computer Communications 138 (2019) 20 –
work page 2019
-
[39]
URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X
doi:https://doi.org/10.1016/j.comcom.2019.02.005. URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X
-
[40]
C. for Applied Internet Data Analysis, The caida anonymized internet traces, online;accessed 2017 (2017). URL \url{http://www.caida.org/data/passive}
work page 2017
-
[41]
N. technology key labratory of Jiangsu Province(Southeast University), Ip trace and ser- vice (iptas), http://iptas.edu.cn/src/system.php, Online;accessed 2017 (2017). URL \url{http://iptas.edu.cn/src/system.php} 18
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.