READ: a three-communicating-stage distributed super points detections algorithm

Jie Xu

arxiv: 1907.08057 · v1 · pith:36C33U6Xnew · submitted 2019-07-18 · 💻 cs.DC · cs.NI

READ: a three-communicating-stage distributed super points detections algorithm

Jie Xu This is my paper

Pith reviewed 2026-05-24 19:30 UTC · model grok-4.3

classification 💻 cs.DC cs.NI

keywords super point detectiondistributed algorithmnetwork traffic analysiscardinality estimationcommunication overheadrough estimatorlinear estimatorasynchronous scanning

0 comments

The pith

The READ algorithm detects super points in distributed networks with accuracy matching or exceeding single-node methods while cutting communication to under 5 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents READ as a three-stage communication method for finding super points across multiple network nodes. Each node scans traffic asynchronously using a Rough Estimator to flag candidate hosts and a Linear Estimator to count their distinct connections accurately. The authors prove that the distributed accuracy stays at least as high as a centralized single-node run. This matters for large-scale networks because sending all raw traffic to one place creates prohibitive communication costs. Experiments on real 10 Gb/s and 40 Gb/s traces confirm both the accuracy claim and the communication savings.

Core claim

READ proves that its distributed detection accuracy is no less than single-node accuracy. It generates candidate super points with the Rough Estimator, estimates their cardinalities with the Linear Estimator, and exchanges data in three communication stages after each node finishes an asynchronous scan of IP pairs within a time window. Tests on four groups of high-speed real traffic show higher accuracy than prior methods alongside communication costs below 5 percent of existing algorithms.

What carries the argument

Three-stage communication after asynchronous scans, using Rough Estimator to produce candidate sets and Linear Estimator to compute cardinalities.

If this is right

Distributed accuracy is guaranteed to meet or exceed single-node accuracy on the same traffic.
Communication volume stays below 5 percent of existing distributed super point algorithms.
The approach works on real 10 Gb/s and 40 Gb/s network traces without centralizing all data.
Super points are identified correctly by separating rough candidate selection from precise cardinality estimation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The staged estimator approach could apply to other distributed tasks that need low-communication cardinality estimates.
Real-time security monitoring systems might adopt similar three-stage exchanges to track anomalous hosts at larger scales.
Reducing the number of stages or replacing the estimators could be tested to lower communication even further.

Load-bearing premise

The errors from the Rough Estimator and Linear Estimator do not compound across asynchronous scans and the three communication stages enough to make distributed accuracy worse than single-node accuracy.

What would settle it

Running the same traffic traces through READ once in distributed mode and once in single-node mode and measuring whether distributed accuracy falls below single-node accuracy.

Figures

Figures reproduced from arXiv: 1907.08057 by Jie Xu.

**Figure 1.** Figure 1: The observation node on network boarder 2.2. Cardinality Estimation Cardinality is an important attribute in network research[11]. At the same time, the calculation of cardinality is also the basis of super point detection[12]. Therefore, this sub section introduces the algorithm of hosts cardinality estimating[13]. There are many cardinality estimating algorithms, such as PCSA algorithm[14], HyperLog… view at source ↗

**Figure 2.** Figure 2: Super point detection in distributed environmentr [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Structure of RE cube The u RE associating with a are located in the same REA. READ divides A into two parts: the first part is r bits on the right (Right Part, RP), and the second part is 32-r bits on the left (Left Part, LP). READ selects a REA in the REC based on the IP of a. REC has 2r REA, so the RP of a can determine only one REA in the REC. READ divides A into 2r subsets according to r bits on the r… view at source ↗

**Figure 4.** Figure 4: Locate RE by left part of IP address When selecting bits from La as I i a , READ first determines which bit in La is I i a [0], and then calculates the other bits in I i a . Let bi denote the index of the 0th bit of I i a in La, i.e. I i a [0]=La [bi ]. Each bit of I i a is calculated according to the following formula: I i a [j] = La[(bi+j)mod(32−r)], 0 ≤ j ≤ vi−1 (3) bi0iu−1is a parameter of READ, which… view at source ↗

**Figure 6.** Figure 6: Collect REC from observation nodes After each observation node has scanned all IP address pair in a time window, only the REC needs to be sent to the global server. The global server merges all the collected REC. The merging method is to merge the RE of different observation nodes in a “bit or” manner. In this paper, the way of combining according to “bit or” is called external merging, and the way of co… view at source ↗

**Figure 7.** Figure 7: Example of restoring LP with depth-first method [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: Super point detection in distributed environmentr [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: shows the time of IP address pairs scanning (GScanT). The graph shows that the GScanT of READ is slightly higher than that of SRLA algorithm. However, the GScanT of each algorithm is not more than 4 seconds, which can process 40 Gb/s of high-speed network traffic in real time. 0.4 0.5 0.6 0.7 13:00 13:05 13:10 13:15 13:20 13:25 13:30 13:35 13:40 13:45 13:50 13:55 Time window starting minute GScanT(s) DCD… view at source ↗

**Figure 10.** Figure 10: shows the time of candidate super point cardinality estimation (GEstT). The graph shows that GEstT of READ is close to DCDS, VBFA and SRLA algorithm, much lower than CSE, and GEstT of READ is not higher than 2.5 seconds. Therefore, READ can detect super points in real-time from 40Gb/s high-speed network. 0.0 2.5 5.0 7.5 10.0 13:00 13:05 13:10 13:15 13:20 13:25 13:30 13:35 13:40 13:45 13:50 13:55 Time wind… view at source ↗

read the original abstract

A super point is a host that interacts with a far larger number of counterparts in the network over a period of time. Super point detection plays an important role in network research and application. With the increase of network scale, distributed super point detection has become a hot research topic. Compared with single-node super point detection algorithm, the difficulty of super point detection in multi-node distributed environment is how to reduce communication overhead. Therefore, this paper proposes a three-stage communication distributed super point detection algorithm: Rough Estimator based Asynchronous Distributed super point detection algorithm (READ). READ uses a lightweight estimator, the Rough Estimator (RE), which is fast in computation and takes less memory to generate candidate super point. At the same time, the Linear Estimator (LE) is used to accurately estimate the cardinality of each candidate super point, so as to detect the super point correctly. In READ, each node scans IP address pairs asynchronously. When reaching the time window boundary, READ starts three-stage communication to detect the super point. In this paper, we proof that the accuracy of READ in distributed environment is no less than that in the single node environment. Four groups of 10 Gb/s and 40 Gb/s real-world high-speed network traffic are used to test READ. The experimental results show that READ not only has higher accuracy in distributed environment, but also has less than 5% of communication burden compared with existing algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

READ presents a concrete three-stage protocol for distributed super-point detection that claims single-node accuracy at under 5% communication cost, but the proof needs full verification.

read the letter

READ gives a practical three-stage protocol for distributed superpoint detection but its accuracy proof needs the full details checked. The paper introduces READ, which runs a Rough Estimator on each node to find candidate super points from asynchronous IP pair scans, then uses a Linear Estimator for accurate cardinality after three communication stages at window boundaries. The authors prove that this keeps accuracy no lower than a single-node version and show experiments on real 10 and 40 Gb/s traces where communication drops below 5% of prior methods. What works is the concrete design for reducing communication in a multi-node setup while testing on actual high-speed traffic. That combination is useful for the network monitoring community. The main soft spot is the accuracy guarantee. The claim that distributed accuracy matches or exceeds single-node rests on the three stages correctly merging candidates without missing split-traffic super points or letting errors add up from async scans. The abstract states the proof but does not show the steps or bounds, so that part needs verification from the full manuscript. The experimental results would also be stronger with more visible methodology on how accuracy was computed and what the baselines were. This is for people building or studying distributed systems for traffic analysis who want a low-communication detector. It has a clear new protocol and real data tests, so it deserves a serious referee to go over the proof and the numbers. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes READ, a three-stage communication protocol for distributed super-point detection in high-speed networks. Each node runs an asynchronous scan using a lightweight Rough Estimator (RE) to produce candidate sets, followed by a Linear Estimator (LE) for cardinality estimation of candidates; three communication stages merge results at window boundaries. The central claim is a proof that distributed accuracy is at least as high as single-node accuracy, supported by experiments on four real 10 Gb/s and 40 Gb/s traces showing higher accuracy and <5% communication burden relative to prior algorithms.

Significance. If the accuracy proof holds and the communication savings are reproducible, the result would be useful for scalable network monitoring where bandwidth is constrained. The evaluation on high-rate real traces is a strength; the parameter-free nature of the claimed proof (no fitted values) would also be a positive if demonstrated.

major comments (2)

[Abstract] Abstract (proof claim): the manuscript asserts a proof that 'the accuracy of READ in distributed environment is no less than that in the single node environment,' yet provides no derivation, recovery-probability bound, or analysis of how RE false negatives on split-traffic super points are provably recovered by the three-stage merge under asynchronous node scans. This assumption is load-bearing for the central claim.
[Abstract] Abstract (experimental claim): the statement that READ 'has higher accuracy in distributed environment' and '<5% of communication burden' is presented without reference to the specific tables, figures, or comparison baselines that would allow verification of the metrics or data-exclusion rules.

minor comments (2)

[Title, Abstract] Title and abstract contain grammatical issues ('detections algorithm', 'we proof') that should be corrected for clarity.
[Abstract] The Rough Estimator and Linear Estimator are referenced without an initial formal definition or pointer to their prior definitions/equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the abstract. We agree that the abstract would benefit from clearer pointers to the supporting material in the manuscript and will revise accordingly. Below we respond point by point.

read point-by-point responses

Referee: [Abstract] Abstract (proof claim): the manuscript asserts a proof that 'the accuracy of READ in distributed environment is no less than that in the single node environment,' yet provides no derivation, recovery-probability bound, or analysis of how RE false negatives on split-traffic super points are provably recovered by the three-stage merge under asynchronous node scans. This assumption is load-bearing for the central claim.

Authors: The full derivation, including the recovery-probability bound for RE false negatives on split-traffic super points and the argument that the three-stage merge preserves accuracy under asynchronous scans, appears in Section 3.3 of the manuscript. The abstract states the result but does not cite the section. We will revise the abstract to include a one-sentence pointer to Section 3.3 and a brief outline of the key recovery step. revision: yes
Referee: [Abstract] Abstract (experimental claim): the statement that READ 'has higher accuracy in distributed environment' and '<5% of communication burden' is presented without reference to the specific tables, figures, or comparison baselines that would allow verification of the metrics or data-exclusion rules.

Authors: The accuracy and communication results are reported in Tables 2–5 and Figures 3–6, using the four 10 Gb/s and 40 Gb/s traces described in Section 5, with direct comparison to the algorithms listed in Table 1. We will revise the abstract to add explicit references such as “(Tables 2–3, <5 % communication vs. baselines in Table 1)” so that the claims can be verified without ambiguity. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper claims a proof that READ accuracy in the distributed setting is no less than single-node accuracy, using Rough Estimator and Linear Estimator with three-stage communication. No equations, fitted parameters, self-citations, or ansatzes are exhibited that would reduce this proof to a tautology or input by construction. The accuracy claim is presented as an independent mathematical argument rather than a statistical fit or renamed pattern, so the derivation is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, no fitted constants, and no explicit background assumptions; the algorithm introduces the Rough Estimator and Linear Estimator but gives no information on whether they contain free parameters or rest on unstated statistical assumptions about traffic.

pith-pipeline@v0.9.0 · 5776 in / 1165 out tokens · 18526 ms · 2026-05-24T19:30:19.801403+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 1 internal anchor

[1]

READ: a three-communicating-stage distributed super points detections algorithm

Introduction The Internet is one of the most important in- frastructures of the modern information society. With the rapid development of China’s economy, the bandwidth of core network is increasing year by year. According to the latest statistics of China In- ternet Information Center (CNNIC), as of Decem- ber 2018, China’s international export bandwidth...

work page internal anchor Pith review Pith/arXiv arXiv 2018
[2]

1” starting from the right. For example, the binary formatter of integer 200 is “11001000

Related work Super point detection is a hotspot in the ﬁeld of network research and management. For the sake of narrative convenience, this section ﬁrst gives rele- vant deﬁnitions. 2.1. Related deﬁnitions All of the super point detection algorithms are based on network traﬃc and belong to passive net- work measurement. The original data used in the algor...

work page
[3]

For example, a campus network access to multiple In- ternet Service Provider(ISP)

Distributed super point detection model and diﬃculty A network connected to the Internet may have multiple border routers, as shown in Figure 2. For example, a campus network access to multiple In- ternet Service Provider(ISP). Assuming that there is an observation node at each border router. Traf- ﬁc can be observed and analyzed independently on each nod...

work page
[4]

bit or” manner. In this paper, the way of com- bining according to “bit or

RE based distributed super points detec- tion algorithm READ In this section, we will introduce our low com- munication overhead distributed super points de- tection algorithm Rough Estimator based Asyn- chronous Distributed super points detection algo- rithm(READ). 4.1. Principle of READ READ uses a data structure that can recover candidate super points ...

work page
[5]

Test whether c0 0 and c1 0 come from the same IP address, as shown in Figure 7. 𝒸0 0 𝒸1 0 𝒸2 0𝒞0 𝒸0 1 𝒸1 1𝒞1 𝒸0 2 𝒸1 2 𝒸2 2𝒞2 <𝒸0 0, 𝒸1 1> <𝒸0 0, 𝒸1 1, 𝒸1 2 > Figure 7: Example of restoring LP with depth-ﬁrst method The four bits on the left of c0 0 are diﬀerent from the four bits on the right of c1 0, so c0 0 and c1 0 come from diﬀerent IP addresses. The...

work page
[6]

In C2, the four bits on the right side of c2 0 are the same as the four bits on the left side of c1 1, but the four bits on the left side of c2 0 are not equal to the four bits on the right side of c0 0, so c2 0 cannot form a candidate RE tuple with c0 0 and c1

work page
[7]

In C2, not only are the four bits on the right side the same as the four bits on the left side of c1 1, but also the four bits on the left side of c2 1 the same as the four bits on the right side of c0

work page
[8]

000101 1110 010001 1100 010101 0101

Therefore, < c0 0,c1 1,c2 1 > constitutes a candidate RE tuple. From the values of c0 0, c1 1 and c2 1, we can see that the RE associating with the candi- date RE tuple is Rl 0,12629,2, Rl 1,14620,2 , Rl 2,5214,2. If the cardinality estimated from the inner merge RE, Rl 0,12629,2 ⨀ Rl 1,14620,2 ⨀ Rl 2,5214,2, still over the threshold, 30 bits of the left ...

work page
[9]

1”. Then there is no row in β whose bits are all “0

At this time, each row may contains one or more bits with value “1”. For example, when n=3,ˆu = 3,β =   1 0 0 0 1 0 0 0 1  , ∏ˆu−1 i=0 ∑n−1 l=0 βl i = 1, but∑n−1 l=0 ∏ˆu−1 i=0 βl i = 0. When ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, ∏ˆu−1 i=0 ∑n−1 l=0 βl i also equals to 1. Because when ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, at least one column in β has all bits with value ...

work page
[10]

The master data structure at the observation node consists of two parts: REC and LEA

Distributed super points detection under sliding time window READ only scans IP address pairs at each ob- servation node, so only sliding window counter is needed to record opposite hosts incrementally at the observation node. The master data structure at the observation node consists of two parts: REC and LEA. The estimators of REC and LEA are RE and LE,...

work page
[11]

The exper- iment analyzes READ from the aspects of detec- tion error rate, memory usage and running time

Experiments and analysis In order to test the performance of READ, four groups of high-speed network traﬃc are used to 13 carry out experiments in this section. The exper- iment analyzes READ from the aspects of detec- tion error rate, memory usage and running time. We compared READ with DCDS, VBFA, CSE and SRLA. 6.1. Experiment data In this paper, four g...

work page arXiv 2015
[12]

REC is a three- dimensional structure of RE

Conclusion READ uses REC to generate candidate super points in distributed environment. REC is a three- dimensional structure of RE. Because RE has the characteristics of small memory occupation and fast computing speed, REC can generate candidate su- per points from 40Gb/s high-speed network with only 3MB of memory. LEA is used to estimate the cardinalit...

work page
[13]

C. I. N. I. CenterCNNIC, China internet network development statistic report(43th) (Feb. 2019). URL http://www.cac.gov.cn/2019-02/28/c_ 1124175677.htm 16

work page 2019
[14]

Ai-ping, Research on the key issues of traﬃc mea- surement in high-speed networks, Ph.D

Z. Ai-ping, Research on the key issues of traﬃc mea- surement in high-speed networks, Ph.D. thesis, South- east University (2015)

work page 2015
[15]

Kucera, L

J. Kucera, L. Kekely, A. Piecek, J. Korenek, General ids acceleration for high-speed networks, in: 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 366–373. doi:10.1109/ICCD.2018.00062

work page doi:10.1109/iccd.2018.00062 2018
[16]

Venkataraman, D

S. Venkataraman, D. Song, P. B. Gibbons, A. Blum, New streaming algorithms for fast detection of super- spreaders, in: in Proceedings of Network and Dis- tributed System Security Symposium (NDSS, 2005, pp. 149–166

work page 2005
[17]

C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Pa- tel, M. Rajarajan, A survey of intrusion detec- tion techniques in cloud, Journal of Network and Computer Applications 36 (1) (2013) 42 – 57. doi:http://doi.org/10.1016/j.jnca.2012.05.003. URL http://www.sciencedirect.com/science/ article/pii/S1084804512001178

work page doi:10.1016/j.jnca.2012.05.003 2013
[18]

Kamiyama, T

N. Kamiyama, T. Mori, R. Kawahara, Simple and adap- tive identiﬁcation of superspreaders by ﬂow sampling, in: IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, 2007, pp. 2481–2485. doi:10.1109/INFCOM.2007.305

work page doi:10.1109/infcom.2007.305 2007
[19]

P. Wang, X. Guan, T. Qin, Q. Huang, A data stream- ing method for monitoring host connection degrees of high-speed links, IEEE Transactions on Informa- tion Forensics and Security 6 (3) (2011) 1086–1098. doi:10.1109/TIFS.2011.2123094

work page doi:10.1109/tifs.2011.2123094 2011
[20]

W. Liu, W. Qu, J. Gong, K. Li, Detection of superpoints using a vector bloom ﬁlter, IEEE Transactions on In- formation Forensics and Security 11 (3) (2016) 514–527. doi:10.1109/TIFS.2015.2503269

work page doi:10.1109/tifs.2015.2503269 2016
[21]

M. Yoon, T. Li, S. Chen, J.-K. Peir, Fit a com- pact spread estimator in small high-speed memory, IEEE/ACM Trans. Netw. 19 (5) (2011) 1253–1264. doi:10.1109/TNET.2010.2080285. URL http://dx.doi.org/10.1109/TNET.2010.2080285

work page doi:10.1109/tnet.2010.2080285 2011
[22]

Z. Liu, R. Wang, M. Tao, X. Cai, A class-oriented feature selection approach for multi-class imbalanced network traﬃc datasets based on local and global metrics fusion, Neurocomputing 168 (2015) 365 – 381. doi:https://doi.org/10.1016/j.neucom.2015.05.089. URL http://www.sciencedirect.com/science/ article/pii/S0925231215007870

work page doi:10.1016/j.neucom.2015.05.089 2015
[23]

Zheng, M

Y. Zheng, M. Li, Towards more eﬃcient cardinal- ity estimation for large-scale rﬁd systems, IEEE/ACM Transactions on Networking 22 (6) (2014) 1886–1896. doi:10.1109/TNET.2013.2288352

work page doi:10.1109/tnet.2013.2288352 2014
[24]

H. Adam, E. Yanmaz, C. Bettstetter, Contention- based estimation of neighbor cardinality, IEEE Trans- actions on Mobile Computing 12 (3) (2013) 542–555. doi:10.1109/TMC.2012.19

work page doi:10.1109/tmc.2012.19 2013
[25]

B. Li, Y. He, W. Liu, Towards constant-time cardinality estimation for large-scale rﬁd systems, in: 2015 44th International Conference on Parallel Processing, 2015, pp. 809–818. doi:10.1109/ICPP.2015.90

work page doi:10.1109/icpp.2015.90 2015
[26]

Flajolet, G

P. Flajolet, G. N. Martin, Probabilistic counting, in: 24th Annual Symposium on Foundations of Computer Science (sfcs 1983), 1983, pp. 76–82. doi:10.1109/SFCS.1983.46

work page doi:10.1109/sfcs.1983.46 1983
[27]

Flajolet, E

P. Flajolet, E. Fusy, O. Gandouet, F. Meunier, Hy- perLogLog: the analysis of a near-optimal cardinality estimation algorithm, in: P. Jacquet (Ed.), Analysis of Algorithms 2007 (AofA07), Juan les pins, France, 2007, pp. 127–146. URL https://hal.archives-ouvertes.fr/ hal-00406166

work page 2007
[28]

Whang, B

K.-Y. Whang, B. T. Vander-Zanden, H. M. Tay- lor, A linear-time probabilistic counting algorithm for database applications, ACM Trans. Database Syst. 15 (2) (1990) 208–229. doi:10.1145/78922.78925. URL http://doi.acm.org/10.1145/78922.78925

work page doi:10.1145/78922.78925 1990
[29]

J. Xu, W. Ding, J. Gong, X. Hu, J. Liu, High speed net- work super points detection based on sliding time win- dow by gpu, in: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017, pp. 566–573. doi:10.1109/ISPA/IUCC.2017.00092

work page doi:10.1109/ispa/iucc.2017.00092 2017
[30]

J. Xu, W. Ding, J. Gong, X. Hu, S. Sun, SRLA: A real time sliding time window super point cardinal- ity estimation algorithm for high speed network based on gpu, in: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science a...

work page doi:10.1109/hpcc/smartcity/dss.2018.00156 2018
[31]

J. Xu, W. Ding, Q. Gong, X. Hu, H. Yu, A super point detection algorithm under sliding time windows based on rough and linear estimators, IEEE Access 7 (2019) 43414–43427. doi:10.1109/ACCESS.2019.2908226

work page doi:10.1109/access.2019.2908226 2019
[32]

B. Coskun, (un)wisdom of crowds: Accurately spotting malicious ip clusters using not-so-accurate ip blacklists, IEEE Transactions on Information Forensics and Security 12 (6) (2017) 1406–1417. doi:10.1109/TIFS.2017.2663333

work page doi:10.1109/tifs.2017.2663333 2017
[33]

Cianfrani, V

A. Cianfrani, V. Eramo, M. Listanti, M. Polverini, A. V. Vasilakos, An ospf-integrated routing strat- egy for qos-aware energy saving in ip back- bone networks, IEEE Transactions on Network and Service Management 9 (3) (2012) 254–267. doi:10.1109/TNSM.2012.031512.110165

work page doi:10.1109/tnsm.2012.031512.110165 2012
[34]

Cheng, Y

G. Cheng, Y. Tang, Line speed accurate superspreader identiﬁcation using dynamic error compensation, Com- puter Communications 36 (13) (2013) 1460 – 1470. doi:http://doi.org/10.1016/j.comcom.2013.05.006. URL http://www.sciencedirect.com/science/ article/pii/S0140366413001400

work page doi:10.1016/j.comcom.2013.05.006 2013
[35]

Xiao, X.-G

L. Xiao, X.-G. Xia, A new robust chinese re- mainder theorem with improved performance in frequency estimation from undersampled wave- forms, Signal Processing 117 (2015) 242 – 246. doi:https://doi.org/10.1016/j.sigpro.2015.05.017. URL http://www.sciencedirect.com/science/ article/pii/S0165168415001954

work page doi:10.1016/j.sigpro.2015.05.017 2015
[36]

Christensen, A

K. Christensen, A. Roginsky, M. Jimeno, A new analysis of the false positive rate of a bloom ﬁlter, Information Processing Letters 110 (21) (2010) 944 –

work page 2010
[37]

URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425

doi:http://dx.doi.org/10.1016/j.ipl.2010.07.024. URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425

work page doi:10.1016/j.ipl.2010.07.024 2010
[38]

J. Xu, W. Ding, X. Hu, Q. Gong, Vate: A trade- oﬀ between memory and preserving time for high accurate cardinality estimation under sliding time window, Computer Communications 138 (2019) 20 –

work page 2019
[39]

URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X

doi:https://doi.org/10.1016/j.comcom.2019.02.005. URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X

work page doi:10.1016/j.comcom.2019.02.005 2019
[40]

for Applied Internet Data Analysis, The caida anonymized internet traces, online;accessed 2017 (2017)

C. for Applied Internet Data Analysis, The caida anonymized internet traces, online;accessed 2017 (2017). URL \url{http://www.caida.org/data/passive}

work page 2017
[41]

technology key labratory of Jiangsu Province(Southeast University), Ip trace and ser- vice (iptas), http://iptas.edu.cn/src/system.php, Online;accessed 2017 (2017)

N. technology key labratory of Jiangsu Province(Southeast University), Ip trace and ser- vice (iptas), http://iptas.edu.cn/src/system.php, Online;accessed 2017 (2017). URL \url{http://iptas.edu.cn/src/system.php} 18

work page 2017

[1] [1]

READ: a three-communicating-stage distributed super points detections algorithm

Introduction The Internet is one of the most important in- frastructures of the modern information society. With the rapid development of China’s economy, the bandwidth of core network is increasing year by year. According to the latest statistics of China In- ternet Information Center (CNNIC), as of Decem- ber 2018, China’s international export bandwidth...

work page internal anchor Pith review Pith/arXiv arXiv 2018

[2] [2]

1” starting from the right. For example, the binary formatter of integer 200 is “11001000

Related work Super point detection is a hotspot in the ﬁeld of network research and management. For the sake of narrative convenience, this section ﬁrst gives rele- vant deﬁnitions. 2.1. Related deﬁnitions All of the super point detection algorithms are based on network traﬃc and belong to passive net- work measurement. The original data used in the algor...

work page

[3] [3]

For example, a campus network access to multiple In- ternet Service Provider(ISP)

Distributed super point detection model and diﬃculty A network connected to the Internet may have multiple border routers, as shown in Figure 2. For example, a campus network access to multiple In- ternet Service Provider(ISP). Assuming that there is an observation node at each border router. Traf- ﬁc can be observed and analyzed independently on each nod...

work page

[4] [4]

bit or” manner. In this paper, the way of com- bining according to “bit or

RE based distributed super points detec- tion algorithm READ In this section, we will introduce our low com- munication overhead distributed super points de- tection algorithm Rough Estimator based Asyn- chronous Distributed super points detection algo- rithm(READ). 4.1. Principle of READ READ uses a data structure that can recover candidate super points ...

work page

[5] [5]

Test whether c0 0 and c1 0 come from the same IP address, as shown in Figure 7. 𝒸0 0 𝒸1 0 𝒸2 0𝒞0 𝒸0 1 𝒸1 1𝒞1 𝒸0 2 𝒸1 2 𝒸2 2𝒞2 <𝒸0 0, 𝒸1 1> <𝒸0 0, 𝒸1 1, 𝒸1 2 > Figure 7: Example of restoring LP with depth-ﬁrst method The four bits on the left of c0 0 are diﬀerent from the four bits on the right of c1 0, so c0 0 and c1 0 come from diﬀerent IP addresses. The...

work page

[6] [6]

In C2, the four bits on the right side of c2 0 are the same as the four bits on the left side of c1 1, but the four bits on the left side of c2 0 are not equal to the four bits on the right side of c0 0, so c2 0 cannot form a candidate RE tuple with c0 0 and c1

work page

[7] [7]

In C2, not only are the four bits on the right side the same as the four bits on the left side of c1 1, but also the four bits on the left side of c2 1 the same as the four bits on the right side of c0

work page

[8] [8]

000101 1110 010001 1100 010101 0101

Therefore, < c0 0,c1 1,c2 1 > constitutes a candidate RE tuple. From the values of c0 0, c1 1 and c2 1, we can see that the RE associating with the candi- date RE tuple is Rl 0,12629,2, Rl 1,14620,2 , Rl 2,5214,2. If the cardinality estimated from the inner merge RE, Rl 0,12629,2 ⨀ Rl 1,14620,2 ⨀ Rl 2,5214,2, still over the threshold, 30 bits of the left ...

work page

[9] [9]

1”. Then there is no row in β whose bits are all “0

At this time, each row may contains one or more bits with value “1”. For example, when n=3,ˆu = 3,β =   1 0 0 0 1 0 0 0 1  , ∏ˆu−1 i=0 ∑n−1 l=0 βl i = 1, but∑n−1 l=0 ∏ˆu−1 i=0 βl i = 0. When ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, ∏ˆu−1 i=0 ∑n−1 l=0 βl i also equals to 1. Because when ∑n−1 l=0 ∏ˆu−1 i=0 βl i = 1, at least one column in β has all bits with value ...

work page

[10] [10]

The master data structure at the observation node consists of two parts: REC and LEA

Distributed super points detection under sliding time window READ only scans IP address pairs at each ob- servation node, so only sliding window counter is needed to record opposite hosts incrementally at the observation node. The master data structure at the observation node consists of two parts: REC and LEA. The estimators of REC and LEA are RE and LE,...

work page

[11] [11]

The exper- iment analyzes READ from the aspects of detec- tion error rate, memory usage and running time

Experiments and analysis In order to test the performance of READ, four groups of high-speed network traﬃc are used to 13 carry out experiments in this section. The exper- iment analyzes READ from the aspects of detec- tion error rate, memory usage and running time. We compared READ with DCDS, VBFA, CSE and SRLA. 6.1. Experiment data In this paper, four g...

work page arXiv 2015

[12] [12]

REC is a three- dimensional structure of RE

Conclusion READ uses REC to generate candidate super points in distributed environment. REC is a three- dimensional structure of RE. Because RE has the characteristics of small memory occupation and fast computing speed, REC can generate candidate su- per points from 40Gb/s high-speed network with only 3MB of memory. LEA is used to estimate the cardinalit...

work page

[13] [13]

C. I. N. I. CenterCNNIC, China internet network development statistic report(43th) (Feb. 2019). URL http://www.cac.gov.cn/2019-02/28/c_ 1124175677.htm 16

work page 2019

[14] [14]

Ai-ping, Research on the key issues of traﬃc mea- surement in high-speed networks, Ph.D

Z. Ai-ping, Research on the key issues of traﬃc mea- surement in high-speed networks, Ph.D. thesis, South- east University (2015)

work page 2015

[15] [15]

Kucera, L

J. Kucera, L. Kekely, A. Piecek, J. Korenek, General ids acceleration for high-speed networks, in: 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 366–373. doi:10.1109/ICCD.2018.00062

work page doi:10.1109/iccd.2018.00062 2018

[16] [16]

Venkataraman, D

S. Venkataraman, D. Song, P. B. Gibbons, A. Blum, New streaming algorithms for fast detection of super- spreaders, in: in Proceedings of Network and Dis- tributed System Security Symposium (NDSS, 2005, pp. 149–166

work page 2005

[17] [17]

C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Pa- tel, M. Rajarajan, A survey of intrusion detec- tion techniques in cloud, Journal of Network and Computer Applications 36 (1) (2013) 42 – 57. doi:http://doi.org/10.1016/j.jnca.2012.05.003. URL http://www.sciencedirect.com/science/ article/pii/S1084804512001178

work page doi:10.1016/j.jnca.2012.05.003 2013

[18] [18]

Kamiyama, T

N. Kamiyama, T. Mori, R. Kawahara, Simple and adap- tive identiﬁcation of superspreaders by ﬂow sampling, in: IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, 2007, pp. 2481–2485. doi:10.1109/INFCOM.2007.305

work page doi:10.1109/infcom.2007.305 2007

[19] [19]

P. Wang, X. Guan, T. Qin, Q. Huang, A data stream- ing method for monitoring host connection degrees of high-speed links, IEEE Transactions on Informa- tion Forensics and Security 6 (3) (2011) 1086–1098. doi:10.1109/TIFS.2011.2123094

work page doi:10.1109/tifs.2011.2123094 2011

[20] [20]

W. Liu, W. Qu, J. Gong, K. Li, Detection of superpoints using a vector bloom ﬁlter, IEEE Transactions on In- formation Forensics and Security 11 (3) (2016) 514–527. doi:10.1109/TIFS.2015.2503269

work page doi:10.1109/tifs.2015.2503269 2016

[21] [21]

M. Yoon, T. Li, S. Chen, J.-K. Peir, Fit a com- pact spread estimator in small high-speed memory, IEEE/ACM Trans. Netw. 19 (5) (2011) 1253–1264. doi:10.1109/TNET.2010.2080285. URL http://dx.doi.org/10.1109/TNET.2010.2080285

work page doi:10.1109/tnet.2010.2080285 2011

[22] [22]

Z. Liu, R. Wang, M. Tao, X. Cai, A class-oriented feature selection approach for multi-class imbalanced network traﬃc datasets based on local and global metrics fusion, Neurocomputing 168 (2015) 365 – 381. doi:https://doi.org/10.1016/j.neucom.2015.05.089. URL http://www.sciencedirect.com/science/ article/pii/S0925231215007870

work page doi:10.1016/j.neucom.2015.05.089 2015

[23] [23]

Zheng, M

Y. Zheng, M. Li, Towards more eﬃcient cardinal- ity estimation for large-scale rﬁd systems, IEEE/ACM Transactions on Networking 22 (6) (2014) 1886–1896. doi:10.1109/TNET.2013.2288352

work page doi:10.1109/tnet.2013.2288352 2014

[24] [24]

H. Adam, E. Yanmaz, C. Bettstetter, Contention- based estimation of neighbor cardinality, IEEE Trans- actions on Mobile Computing 12 (3) (2013) 542–555. doi:10.1109/TMC.2012.19

work page doi:10.1109/tmc.2012.19 2013

[25] [25]

B. Li, Y. He, W. Liu, Towards constant-time cardinality estimation for large-scale rﬁd systems, in: 2015 44th International Conference on Parallel Processing, 2015, pp. 809–818. doi:10.1109/ICPP.2015.90

work page doi:10.1109/icpp.2015.90 2015

[26] [26]

Flajolet, G

P. Flajolet, G. N. Martin, Probabilistic counting, in: 24th Annual Symposium on Foundations of Computer Science (sfcs 1983), 1983, pp. 76–82. doi:10.1109/SFCS.1983.46

work page doi:10.1109/sfcs.1983.46 1983

[27] [27]

Flajolet, E

P. Flajolet, E. Fusy, O. Gandouet, F. Meunier, Hy- perLogLog: the analysis of a near-optimal cardinality estimation algorithm, in: P. Jacquet (Ed.), Analysis of Algorithms 2007 (AofA07), Juan les pins, France, 2007, pp. 127–146. URL https://hal.archives-ouvertes.fr/ hal-00406166

work page 2007

[28] [28]

Whang, B

K.-Y. Whang, B. T. Vander-Zanden, H. M. Tay- lor, A linear-time probabilistic counting algorithm for database applications, ACM Trans. Database Syst. 15 (2) (1990) 208–229. doi:10.1145/78922.78925. URL http://doi.acm.org/10.1145/78922.78925

work page doi:10.1145/78922.78925 1990

[29] [29]

J. Xu, W. Ding, J. Gong, X. Hu, J. Liu, High speed net- work super points detection based on sliding time win- dow by gpu, in: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017, pp. 566–573. doi:10.1109/ISPA/IUCC.2017.00092

work page doi:10.1109/ispa/iucc.2017.00092 2017

[30] [30]

J. Xu, W. Ding, J. Gong, X. Hu, S. Sun, SRLA: A real time sliding time window super point cardinal- ity estimation algorithm for high speed network based on gpu, in: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science a...

work page doi:10.1109/hpcc/smartcity/dss.2018.00156 2018

[31] [31]

J. Xu, W. Ding, Q. Gong, X. Hu, H. Yu, A super point detection algorithm under sliding time windows based on rough and linear estimators, IEEE Access 7 (2019) 43414–43427. doi:10.1109/ACCESS.2019.2908226

work page doi:10.1109/access.2019.2908226 2019

[32] [32]

B. Coskun, (un)wisdom of crowds: Accurately spotting malicious ip clusters using not-so-accurate ip blacklists, IEEE Transactions on Information Forensics and Security 12 (6) (2017) 1406–1417. doi:10.1109/TIFS.2017.2663333

work page doi:10.1109/tifs.2017.2663333 2017

[33] [33]

Cianfrani, V

A. Cianfrani, V. Eramo, M. Listanti, M. Polverini, A. V. Vasilakos, An ospf-integrated routing strat- egy for qos-aware energy saving in ip back- bone networks, IEEE Transactions on Network and Service Management 9 (3) (2012) 254–267. doi:10.1109/TNSM.2012.031512.110165

work page doi:10.1109/tnsm.2012.031512.110165 2012

[34] [34]

Cheng, Y

G. Cheng, Y. Tang, Line speed accurate superspreader identiﬁcation using dynamic error compensation, Com- puter Communications 36 (13) (2013) 1460 – 1470. doi:http://doi.org/10.1016/j.comcom.2013.05.006. URL http://www.sciencedirect.com/science/ article/pii/S0140366413001400

work page doi:10.1016/j.comcom.2013.05.006 2013

[35] [35]

Xiao, X.-G

L. Xiao, X.-G. Xia, A new robust chinese re- mainder theorem with improved performance in frequency estimation from undersampled wave- forms, Signal Processing 117 (2015) 242 – 246. doi:https://doi.org/10.1016/j.sigpro.2015.05.017. URL http://www.sciencedirect.com/science/ article/pii/S0165168415001954

work page doi:10.1016/j.sigpro.2015.05.017 2015

[36] [36]

Christensen, A

K. Christensen, A. Roginsky, M. Jimeno, A new analysis of the false positive rate of a bloom ﬁlter, Information Processing Letters 110 (21) (2010) 944 –

work page 2010

[37] [37]

URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425

doi:http://dx.doi.org/10.1016/j.ipl.2010.07.024. URL http://www.sciencedirect.com/science/ article/pii/S0020019010002425

work page doi:10.1016/j.ipl.2010.07.024 2010

[38] [38]

J. Xu, W. Ding, X. Hu, Q. Gong, Vate: A trade- oﬀ between memory and preserving time for high accurate cardinality estimation under sliding time window, Computer Communications 138 (2019) 20 –

work page 2019

[39] [39]

URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X

doi:https://doi.org/10.1016/j.comcom.2019.02.005. URL http://www.sciencedirect.com/science/ 17 article/pii/S014036641830625X

work page doi:10.1016/j.comcom.2019.02.005 2019

[40] [40]

for Applied Internet Data Analysis, The caida anonymized internet traces, online;accessed 2017 (2017)

C. for Applied Internet Data Analysis, The caida anonymized internet traces, online;accessed 2017 (2017). URL \url{http://www.caida.org/data/passive}

work page 2017

[41] [41]

technology key labratory of Jiangsu Province(Southeast University), Ip trace and ser- vice (iptas), http://iptas.edu.cn/src/system.php, Online;accessed 2017 (2017)

N. technology key labratory of Jiangsu Province(Southeast University), Ip trace and ser- vice (iptas), http://iptas.edu.cn/src/system.php, Online;accessed 2017 (2017). URL \url{http://iptas.edu.cn/src/system.php} 18

work page 2017