Counting Butterflies over Streaming Bipartite Graphs with Duplicate Edges

Chengjie Li; Kai Wang; Lingkai Meng; Long Yuan; Wenjie Zhang; Xuemin Lin

arxiv: 2412.11488 · v2 · submitted 2024-12-16 · 💻 cs.DS

Counting Butterflies over Streaming Bipartite Graphs with Duplicate Edges

Lingkai Meng , Long Yuan , Xuemin Lin , Chengjie Li , Kai Wang , Wenjie Zhang This is my paper

Pith reviewed 2026-05-23 07:19 UTC · model grok-4.3

classification 💻 cs.DS

keywords butterfly countingstreaming bipartite graphsduplicate edgespriority samplingsubgraph estimationmemory efficiency

0 comments

The pith

DEABC uses bucket-based priority sampling to estimate butterfly counts in streaming bipartite graphs with duplicate edges while using less memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a bucket-based priority sampling approach can accurately estimate the number of butterflies in bipartite graph streams that contain duplicate edges. Existing methods either fail on duplicates or use too much memory and have high variance. If the claims hold, this enables efficient monitoring of real-world streams like user interactions without sacrificing accuracy. The method stores only essential sampled data and comes with proofs of unbiasedness.

Core claim

DEABC introduces bucket-based priority sampling that handles duplicate edges by organizing them into buckets, allowing accurate butterfly estimation with reduced memory compared to priority queue methods, while maintaining unbiased estimates and bounded variance as proven rigorously.

What carries the argument

Bucket-based priority sampling, which buckets edges to manage duplicates and samples priorities to estimate butterfly counts without storing all data.

If this is right

Real-time applications like fraud detection can process larger streams with duplicate edges.
The approach achieves higher throughput and lower memory than FABLE.
Provides theoretical guarantees that previous methods lacked in this setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This sampling technique could extend to counting other motifs in streaming graphs with duplicates.
May apply to non-bipartite graphs if adapted for general subgraphs.

Load-bearing premise

That the bucket-based priority sampling produces unbiased estimates and the stated variance bounds even when duplicate edges appear in arbitrary patterns and frequencies.

What would settle it

Apply DEABC to a synthetic stream with a known exact butterfly count and controlled duplicate rates, then check if the estimates fall within the proven variance bounds on average.

Figures

Figures reproduced from arXiv: 2412.11488 by Chengjie Li, Kai Wang, Lingkai Meng, Long Yuan, Wenjie Zhang, Xuemin Lin.

**Figure 1.** Figure 1: Butterflies in A Bipartite Graph and three-paths respectively. This measurement reveals how tightly entities are grouped, making it particularly valuable in social network analysis [2], [5], recommended systems [6], [7], and clustering [8]. Moreover, butterfly counting is crucial for identifying k-bitruss in bipartite graphs, which represents the largest subgraph where each edge is contained in at least k … view at source ↗

**Figure 2.** Figure 2: Relative Error under Different Sampling Sizes [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Throughput over All Datasets Exp-1: Accuracy. We evaluate the accuracy of estimating butterfly counts across all datasets. The relative error was computed for different sample sizes, which ranged from 2 12 to 2 20 edges. For convenience, we represent the sample size using powers of 2. The experiments were repeated 100 times and the average was calculated. The results are shown in [PITH_FULL_IMAGE:figures/… view at source ↗

**Figure 4.** Figure 4: Throughput under Different Sampling Sizes [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 7.** Figure 7: The relative error of ABACUS, FLEET3, and CAS−R increases rapidly as the edge duplication ratio grows, especially when the duplication ratio rises from 0% to 50%. In contrast, the algorithms, FABLE and DEABC, demonstrate significantly better performance, and they are almost unaffected by changes in the edge duplication ratio, while our DEABC still performs better than FABLE. Furthermore, even when handlin… view at source ↗

**Figure 5.** Figure 5: The Estimated Number of Butterflies over Time [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Memory Usage over All Datasets require loading the entire graph into memory, making it difficult to handle large-scale bipartite graphs. The Matrix-Based Butterfly Count method, proposed by Jay A. Acosta et al. [23], is efficient but suffers from unacceptable memory overhead due to storing the adjacency matrix. Although the I/O-efficient butterfly counting algorithm [18], [24] addresses the memory issue, … view at source ↗

**Figure 7.** Figure 7: Relative Error across Varying Edge Duplication Ratios [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

read the original abstract

Bipartite graphs are commonly used to model relationships between two distinct entities in real-world applications, such as user-product interactions, user-movie ratings and collaborations between authors and publications. A butterfly (a 2x2 bi-clique) is a critical substructure in bipartite graphs, playing a significant role in tasks like community detection, fraud detection, and link prediction. As more real-world data is presented in a streaming format, efficiently counting butterflies in streaming bipartite graphs has become increasingly important. However, most existing algorithms typically assume that duplicate edges are absent, which is hard to hold in real-world graph streams, as a result, they tend to sample edges that appear multiple times, leading to inaccurate results. The only algorithm designed to handle duplicate edges is FABLE, but it suffers from significant limitations, including high variance, substantial time complexity, and memory inefficiency due to its reliance on a priority queue. To overcome these limitations, we introduce DEABC (Duplicate-Edge-Aware Butterfly Counting), an innovative method that uses bucket-based priority sampling to accurately estimate the number of butterflies, accounting for duplicate edges. Compared to existing methods, DEABC significantly reduces memory usage by storing only the essential sampled edge data while maintaining high accuracy. We provide rigorous proofs of the unbiasedness and variance bounds for DEABC, ensuring they achieve high accuracy. We compare DEABC with state-of-the-art algorithms on real-world streaming bipartite graphs. The results show that our DEABC outperforms existing methods in memory efficiency and accuracy, while also achieving significantly higher throughput.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DEABC introduces bucket-based sampling to handle duplicates in streaming butterfly counting, but the unbiasedness claim for repeated edges rests on unshown multiplicity handling.

read the letter

The paper's core contribution is DEABC, which replaces FABLE's priority queue with bucket-based priority sampling to estimate butterflies while trying to account for duplicate edges in bipartite streams. It claims lower memory use by keeping only essential sampled data, plus proofs of unbiasedness and variance bounds, and better throughput and accuracy on real graphs than prior methods that either ignore duplicates or suffer high variance.

Referee Report

2 major / 2 minor

Summary. The paper introduces DEABC, a bucket-based priority sampling algorithm for estimating the number of butterflies (2x2 bicliques) in streaming bipartite graphs that may contain duplicate edges. It claims to overcome limitations of prior work (especially FABLE) by storing only essential sampled edge data, achieving lower memory usage while maintaining high accuracy, and supplies proofs of unbiasedness together with variance bounds; experimental results on real-world streams are reported to show superior memory efficiency, accuracy, and throughput.

Significance. If the unbiasedness and variance claims are substantiated, the result would be a practical advance for streaming graph analytics in applications such as fraud detection and community detection, where duplicate edges are common. The explicit handling of duplicates via bucket-based sampling and the provision of formal proofs (if complete) constitute a clear strength over heuristic approaches; the reported experimental superiority on real streams further supports utility.

major comments (2)

[Section 3] Sampling procedure (Section 3, around the definition of bucket-based priority sampling): the central unbiasedness claim for arbitrary duplicate-edge streams requires an explicit derivation of the inclusion probability that accounts for every occurrence of a repeated edge; the manuscript asserts rigorous proofs but does not display the adjusted sampling probability or the expectation calculation (e.g., for a two-edge duplicate stream), which is load-bearing for the claim that DEABC overcomes FABLE's variance issues.
[Section 4] Variance analysis (Section 4, variance-bound theorem): the stated variance bounds are presented without the intermediate steps linking the bucket priority mechanism to the final variance expression; without these steps it is impossible to confirm that the bounds remain valid once multiplicity is incorporated, undermining the accuracy guarantees.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a short statement of the precise time and space complexity of DEABC relative to FABLE.
[Experiments] Figure captions and experimental tables should include error bars or standard deviations for the accuracy metrics to allow direct comparison of variance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the presentation of our proofs can be strengthened. We address each major comment below and agree to expand the derivations in the revised manuscript.

read point-by-point responses

Referee: [Section 3] Sampling procedure (Section 3, around the definition of bucket-based priority sampling): the central unbiasedness claim for arbitrary duplicate-edge streams requires an explicit derivation of the inclusion probability that accounts for every occurrence of a repeated edge; the manuscript asserts rigorous proofs but does not display the adjusted sampling probability or the expectation calculation (e.g., for a two-edge duplicate stream), which is load-bearing for the claim that DEABC overcomes FABLE's variance issues.

Authors: We agree that the main text does not display the full step-by-step derivation of the inclusion probability under arbitrary multiplicities. In the revision we will add an explicit calculation of the per-edge inclusion probability that sums over all occurrences of a repeated edge, together with the expectation computation for a two-edge duplicate stream. This will directly link the bucket-based priority sampling rule to the unbiased estimator and clarify the variance reduction relative to FABLE. revision: yes
Referee: [Section 4] Variance analysis (Section 4, variance-bound theorem): the stated variance bounds are presented without the intermediate steps linking the bucket priority mechanism to the final variance expression; without these steps it is impossible to confirm that the bounds remain valid once multiplicity is incorporated, undermining the accuracy guarantees.

Authors: We acknowledge that the intermediate algebraic steps connecting the bucket-priority sampling probabilities to the final variance bound are omitted from the current text. The revision will insert these steps, showing how the multiplicity-adjusted inclusion probabilities propagate through the variance expression and confirming that the stated bounds continue to hold. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on standard sampling proofs without self-referential reduction

full rationale

The provided abstract and context present DEABC as using bucket-based priority sampling with claimed rigorous proofs of unbiasedness and variance bounds for duplicate-edge streams. No equations, fitted parameters, or self-citations are exhibited that reduce any estimator or prediction to its inputs by construction. The approach is described as extending existing streaming methods to handle duplicates, with no evidence of self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against external sampling theory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the correctness of an unspecified bucket sampling procedure and on standard assumptions about streaming edge arrival; no explicit free parameters, new entities, or ad-hoc axioms are named in the abstract.

axioms (1)

domain assumption Bucket-based priority sampling yields unbiased butterfly count estimates in the presence of duplicate edges
This is the load-bearing premise asserted when the abstract claims rigorous proofs of unbiasedness.

pith-pipeline@v0.9.0 · 5817 in / 1301 out tokens · 29266 ms · 2026-05-23T07:19:26.194144+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Counting Small Balanced (p,q)-bicliques in Signed Bipartite Graphs
cs.DS 2026-05 unverdicted novelty 6.0

BBVP, a vertex-based pruning algorithm, counts balanced (p,q)-bicliques in signed bipartite graphs with an average 636x speedup over the adapted SBCList++ baseline for p=q=3 on large real-world data.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper

[1]

Rectangle counting in large bipartite graphs,

J. Wang, A. W.-C. Fu, and J. Cheng, “Rectangle counting in large bipartite graphs,” in 2014 IEEE International Congress on Big Data . IEEE, 2014, pp. 17–24

work page 2014
[2]

Measuring and modeling bipartite graphs with community structure,

S. Aksoy, T. G. Kolda, and A. Pinar, “Measuring and modeling bipartite graphs with community structure,” J. Complex Networks , vol. 5, no. 4, pp. 581–603, 2017. [Online]. Available: https: //doi.org/10.1093/comnet/cnx001

work page doi:10.1093/comnet/cnx001 2017
[3]

Cycles and clustering in bipartite networks,

P. G. Lind, M. C. Gonzalez, and H. J. Herrmann, “Cycles and clustering in bipartite networks,” Physical Review E—Statistical, Nonlinear, and Soft Matter Physics , vol. 72, no. 5, p. 056127, 2005

work page 2005
[4]

Small worlds among interlocking direc- tors: Network structure and distance in bipartite graphs,

G. Robins and M. Alexander, “Small worlds among interlocking direc- tors: Network structure and distance in bipartite graphs,” Computational & Mathematical Organization Theory , vol. 10, pp. 69–94, 2004

work page 2004
[5]

Clustering coefficient and community structure of bipartite networks,

P. Zhang, J. Wang, X. Li, M. Li, Z. Di, and Y . Fan, “Clustering coefficient and community structure of bipartite networks,” Physica A: Statistical Mechanics and its Applications , vol. 387, no. 27, pp. 6869– 6875, 2008

work page 2008
[6]

Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks,

F. Zhang, S. Qi, Q. Liu, M. Mao, and A. Zeng, “Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks,” Expert Systems with Applications , vol. 149, p. 113346, 2020

work page 2020
[7]

Bipartite graph partitioning and data clustering,

H. Zha, X. He, C. H. Q. Ding, M. Gu, and H. D. Simon, “Bipartite graph partitioning and data clustering,” in Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, November 5-10, 2001 . ACM, 2001, pp. 25–32

work page 2001
[8]

Optimal bipartite network clustering,

Z. Zhou and A. A. Amini, “Optimal bipartite network clustering,” Journal of Machine Learning Research, vol. 21, no. 40, pp. 1–68, 2020

work page 2020
[9]

Efficient bitruss decomposition for large-scale bipartite graphs,

K. Wang, X. Lin, L. Qin, W. Zhang, and Y . Zhang, “Efficient bitruss decomposition for large-scale bipartite graphs,” in 2020 IEEE 36th International Conference on Data Engineering (ICDE) . IEEE, 2020, pp. 661–672

work page 2020
[10]

Community detection in bipartite network: a modified coarsening approach,

A. Valejo, V . Ferreira, M. C. de Oliveira, and A. de Andrade Lopes, “Community detection in bipartite network: a modified coarsening approach,” in Information Management and Big Data: 4th Annual International Symposium, SIMBig 2017, Lima, Peru, September 4-6, 2017, Revised Selected Papers 4 . Springer, 2018, pp. 123–136

work page 2017
[11]

Searching personalized k- wing in bipartite graphs,

A. Abidi, L. Chen, R. Zhou, and C. Liu, “Searching personalized k- wing in bipartite graphs,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 5727–5728

work page 2024
[12]

Butterfly- core community search over labeled graphs,

Z. Dong, X. Huang, G. Yuan, H. Zhu, and H. Xiong, “Butterfly- core community search over labeled graphs,” arXiv preprint arXiv:2105.08628, 2021

work page arXiv 2021
[13]

Distributed approaches to butterfly analysis on large dynamic bipartite graphs,

T. Weng, X. Zhou, K. Li, K.-L. Tan, and K. Li, “Distributed approaches to butterfly analysis on large dynamic bipartite graphs,” IEEE Transac- tions on Parallel and Distributed Systems , vol. 34, no. 2, pp. 431–445, 2022

work page 2022
[14]

Butterfly counting and bitruss de- composition on uncertain bipartite graphs,

A. Zhou, Y . Wang, and L. Chen, “Butterfly counting and bitruss de- composition on uncertain bipartite graphs,” The VLDB Journal, vol. 32, no. 5, pp. 1013–1036, 2023

work page 2023
[15]

Butterfly counting in bipartite networks,

S.-V . Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura, “Butterfly counting in bipartite networks,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , 2018, pp. 2150–2159

work page 2018
[16]

Efficient load-balanced butterfly counting on gpu,

Q. Xu, F. Zhang, Z. Yao, L. Lu, X. Du, D. Deng, and B. He, “Efficient load-balanced butterfly counting on gpu,” Proceedings of the VLDB Endowment, vol. 15, no. 11, pp. 2450–2462, 2022

work page 2022
[17]

Cohesive subgraph search over big heterogeneous information networks: Applications, challenges, and solutions,

Y . Fang, K. Wang, X. Lin, and W. Zhang, “Cohesive subgraph search over big heterogeneous information networks: Applications, challenges, and solutions,” in Proceedings of the 2021 International Conference on Management of Data , 2021, pp. 2829–2838

work page 2021
[18]

I/o-efficient butterfly counting at scale,

Z. Wang, L. Lai, Y . Liu, B. Shui, C. Tian, and S. Zhong, “I/o-efficient butterfly counting at scale,” Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1–27, 2023

work page 2023
[19]

A hybrid deep learning and modified butterfly optimization based feature selection for transaction credit card fraud detection,

N. Geetha and G. Dheepa, “A hybrid deep learning and modified butterfly optimization based feature selection for transaction credit card fraud detection,” Journal of Positive School Psychology , vol. 6, no. 7, pp. 5328–5345, 2022

work page 2022
[20]

Fraud detection: Comparison of traditional methods, hybrid methods, monarch butterfly optimization, and temporal convo- lutional networks,

P. G. Samanvita, A. Balivada, S. K. Satapathy, B. M. R. Kumar, and M. Bhargavi, “Fraud detection: Comparison of traditional methods, hybrid methods, monarch butterfly optimization, and temporal convo- lutional networks,” in 2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) . IEEE, 2024, pp. 110– 116

work page 2024
[21]

Vertex priority based butterfly counting for large-scale bipartite networks

K. Wang, X. Lin, L. Qin, W. Zhang, and Y . Zhang, “Vertex priority based butterfly counting for large-scale bipartite networks.” PVLDB, 2019

work page 2019
[22]

Parallel algorithms for butterfly computations,

J. Shi and J. Shun, “Parallel algorithms for butterfly computations,” in Massive Graph Analytics. Chapman and Hall/CRC, 2022, pp. 287–330

work page 2022
[23]

Families of butterfly counting algorithms for bipartite graphs,

J. A. Acosta, T. M. Low, and D. N. Parikh, “Families of butterfly counting algorithms for bipartite graphs,” in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) . IEEE, 2022, pp. 304–313

work page 2022
[24]

Parallelization of butterfly counting on hierarchical memory,

Z. Wang, L. Lai, Y . Liu, B. Shui, C. Tian, and S. Zhong, “Parallelization of butterfly counting on hierarchical memory,” The VLDB Journal , pp. 1–32, 2024. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 12

work page 2024
[25]

Gpu-based butterfly counting,

Y . Xia, F. Zhang, Q. Xu, M. Zhang, Z. Yao, L. Lu, X. Du, D. Deng, B. He, and S. Ma, “Gpu-based butterfly counting,” The VLDB Journal , vol. 33, no. 5, pp. 1543–1567, 2024

work page 2024
[26]

Toward accurate butterfly counting with edge privacy preserving in bipartite networks,

M. Wang, H. Jiang, P. Peng, Y . Li, and W. Huang, “Toward accurate butterfly counting with edge privacy preserving in bipartite networks,” in IEEE INFOCOM 2024-IEEE Conference on Computer Communications. IEEE, 2024, pp. 2289–2298

work page 2024
[27]

Butterfly counting over bipartite graphs with local differential privacy,

Y . He, K. Wang, W. Zhang, X. Lin, W. Ni, and Y . Zhang, “Butterfly counting over bipartite graphs with local differential privacy,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 2351–2364

work page 2024
[28]

Fleet: Butterfly estimation from a bipartite graph stream,

S.-V . Sanei-Mehri, Y . Zhang, A. E. Sariy ¨uce, and S. Tirthapura, “Fleet: Butterfly estimation from a bipartite graph stream,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1201–1210

work page 2019
[29]

Approximately counting butterflies in large bipartite graph streams,

R. Li, P. Wang, P. Jia, X. Zhang, J. Zhao, J. Tao, Y . Yuan, and X. Guan, “Approximately counting butterflies in large bipartite graph streams,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5621–5635, 2021

work page 2021
[30]

sgrapp: Butterfly approximation in streaming graphs,

A. Sheshbolouki and M. T. ¨Ozsu, “sgrapp: Butterfly approximation in streaming graphs,” ACM Trans. Knowl. Discov. Data, vol. 16, no. 4, pp. 76:1–76:43, 2022. [Online]. Available: https://doi.org/10.1145/3495011

work page doi:10.1145/3495011 2022
[31]

Counting butterflies in fully dynamic bipartite graph streams,

S. Papadias, Z. Kaoudi, V . Pandey, J.-A. Quian ´e-Ruiz, and V . Markl, “Counting butterflies in fully dynamic bipartite graph streams,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 2917–2930

work page 2024
[32]

Scalable approximate butterfly and bi-triangle counting for large bipartite networks,

F. Zhang, D. Chen, S. Wang, Y . Yang, and J. Gan, “Scalable approximate butterfly and bi-triangle counting for large bipartite networks,” Proc. ACM Manag. Data , vol. 1, no. 4, pp. 259:1–259:26,

work page
[33]

Available: https://doi.org/10.1145/3626753

[Online]. Available: https://doi.org/10.1145/3626753

work page doi:10.1145/3626753
[34]

Sampling algorithms for butterfly counting on temporal bipartite graphs,

J. Pu, Y . Wang, Y . Li, and X. Zhou, “Sampling algorithms for butterfly counting on temporal bipartite graphs,” arXiv preprint arXiv:2310.11886, 2023

work page arXiv 2023
[35]

Scalable distributed stream join processing,

Q. Lin, B. C. Ooi, Z. Wang, and C. Yu, “Scalable distributed stream join processing,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , 2015, pp. 811–825

work page 2015
[36]

Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage,

P. Wang, Y . Qi, Y . Sun, X. Zhang, J. Tao, and X. Guan, “Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage,” Proceedings of the VLDB Endowment , vol. 11, no. 2, pp. 162–175, 2017

work page 2017
[37]

Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams,

M. Jung, Y . Lim, S. Lee, and U. Kang, “Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams,” Data Mining and Knowledge Discovery , vol. 33, pp. 1225–1253, 2019

work page 2019
[38]

An adaptive query execution system for data integration,

Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld, “An adaptive query execution system for data integration,” ACM SIGMOD Record, vol. 28, no. 2, pp. 299–310, 1999

work page 1999
[39]

Adaptive duplicate detection using learnable string similarity measures,

M. Bilenko and R. J. Mooney, “Adaptive duplicate detection using learnable string similarity measures,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003, pp. 39–48

work page 2003
[40]

What’s going on? learning communication rules in edge networks,

S. Kandula, R. Chandra, and D. Katabi, “What’s going on? learning communication rules in edge networks,” in Proceedings of the ACM SIGCOMM 2008 conference on Data communication , 2008, pp. 87–98

work page 2008
[41]

The impact of duplicate orders on demand estimation and capacity investment,

M. Armony and E. L. Plambeck, “The impact of duplicate orders on demand estimation and capacity investment,” Management science, vol. 51, no. 10, pp. 1505–1518, 2005

work page 2005
[42]

Exploring duplicate orders in a single-manufacturer multi-distributor supply chain,

S. V . Betancur, G. U. Casta ˜no, and P. Gonc ¸alves, “Exploring duplicate orders in a single-manufacturer multi-distributor supply chain,” in Pro- ceedings of System Dynamics Conference , 2014

work page 2014
[43]

Fable: Approximate butterfly counting in bipartite graph stream with duplicate edges,

G. Sun, Y . Zhao, and Y . Li, “Fable: Approximate butterfly counting in bipartite graph stream with duplicate edges,” in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 2158–2167

work page 2024
[44]

Counting distinct elements in a data stream,

Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan, “Counting distinct elements in a data stream,” in Randomization and Ap- proximation Techniques, 6th International Workshop, RANDOM 2002, Cambridge, MA, USA, September 13-15, 2002, Proceedings . Springer, 2002, pp. 1–10

work page 2002
[45]

On synopses for distinct-value estimation under multiset operations,

K. Beyer, P. J. Haas, B. Reinwald, Y . Sismanis, and R. Gemulla, “On synopses for distinct-value estimation under multiset operations,” in Proceedings of the 2007 ACM SIGMOD international conference on Management of data , 2007, pp. 199–210

work page 2007
[46]

Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm,

P. Flajolet, ´E. Fusy, O. Gandouet, and F. Meunier, “Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm,” Discrete mathematics & theoretical computer science , no. Proceedings, 2007

work page 2007
[47]

Probabilistic counting algorithms for data base applications,

P. Flajolet and G. N. Martin, “Probabilistic counting algorithms for data base applications,” Journal of computer and system sciences , vol. 31, no. 2, pp. 182–209, 1985

work page 1985
[48]

Streamed approximate counting of distinct elements: Beating optimal batch methods,

D. Ting, “Streamed approximate counting of distinct elements: Beating optimal batch methods,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , 2014, pp. 442–451

work page 2014
[49]

Efficient temporal butterfly counting and enumeration on temporal bipartite graphs,

X. Cai, X. Ke, K. Wang, L. Chen, T. Zhang, Q. Liu, and Y . Gao, “Efficient temporal butterfly counting and enumeration on temporal bipartite graphs,” Proceedings of the VLDB Endowment , vol. 17, no. 4, pp. 657–670, 2023

work page 2023

[1] [1]

Rectangle counting in large bipartite graphs,

J. Wang, A. W.-C. Fu, and J. Cheng, “Rectangle counting in large bipartite graphs,” in 2014 IEEE International Congress on Big Data . IEEE, 2014, pp. 17–24

work page 2014

[2] [2]

Measuring and modeling bipartite graphs with community structure,

S. Aksoy, T. G. Kolda, and A. Pinar, “Measuring and modeling bipartite graphs with community structure,” J. Complex Networks , vol. 5, no. 4, pp. 581–603, 2017. [Online]. Available: https: //doi.org/10.1093/comnet/cnx001

work page doi:10.1093/comnet/cnx001 2017

[3] [3]

Cycles and clustering in bipartite networks,

P. G. Lind, M. C. Gonzalez, and H. J. Herrmann, “Cycles and clustering in bipartite networks,” Physical Review E—Statistical, Nonlinear, and Soft Matter Physics , vol. 72, no. 5, p. 056127, 2005

work page 2005

[4] [4]

Small worlds among interlocking direc- tors: Network structure and distance in bipartite graphs,

G. Robins and M. Alexander, “Small worlds among interlocking direc- tors: Network structure and distance in bipartite graphs,” Computational & Mathematical Organization Theory , vol. 10, pp. 69–94, 2004

work page 2004

[5] [5]

Clustering coefficient and community structure of bipartite networks,

P. Zhang, J. Wang, X. Li, M. Li, Z. Di, and Y . Fan, “Clustering coefficient and community structure of bipartite networks,” Physica A: Statistical Mechanics and its Applications , vol. 387, no. 27, pp. 6869– 6875, 2008

work page 2008

[6] [6]

Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks,

F. Zhang, S. Qi, Q. Liu, M. Mao, and A. Zeng, “Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks,” Expert Systems with Applications , vol. 149, p. 113346, 2020

work page 2020

[7] [7]

Bipartite graph partitioning and data clustering,

H. Zha, X. He, C. H. Q. Ding, M. Gu, and H. D. Simon, “Bipartite graph partitioning and data clustering,” in Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, November 5-10, 2001 . ACM, 2001, pp. 25–32

work page 2001

[8] [8]

Optimal bipartite network clustering,

Z. Zhou and A. A. Amini, “Optimal bipartite network clustering,” Journal of Machine Learning Research, vol. 21, no. 40, pp. 1–68, 2020

work page 2020

[9] [9]

Efficient bitruss decomposition for large-scale bipartite graphs,

K. Wang, X. Lin, L. Qin, W. Zhang, and Y . Zhang, “Efficient bitruss decomposition for large-scale bipartite graphs,” in 2020 IEEE 36th International Conference on Data Engineering (ICDE) . IEEE, 2020, pp. 661–672

work page 2020

[10] [10]

Community detection in bipartite network: a modified coarsening approach,

A. Valejo, V . Ferreira, M. C. de Oliveira, and A. de Andrade Lopes, “Community detection in bipartite network: a modified coarsening approach,” in Information Management and Big Data: 4th Annual International Symposium, SIMBig 2017, Lima, Peru, September 4-6, 2017, Revised Selected Papers 4 . Springer, 2018, pp. 123–136

work page 2017

[11] [11]

Searching personalized k- wing in bipartite graphs,

A. Abidi, L. Chen, R. Zhou, and C. Liu, “Searching personalized k- wing in bipartite graphs,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 5727–5728

work page 2024

[12] [12]

Butterfly- core community search over labeled graphs,

Z. Dong, X. Huang, G. Yuan, H. Zhu, and H. Xiong, “Butterfly- core community search over labeled graphs,” arXiv preprint arXiv:2105.08628, 2021

work page arXiv 2021

[13] [13]

Distributed approaches to butterfly analysis on large dynamic bipartite graphs,

T. Weng, X. Zhou, K. Li, K.-L. Tan, and K. Li, “Distributed approaches to butterfly analysis on large dynamic bipartite graphs,” IEEE Transac- tions on Parallel and Distributed Systems , vol. 34, no. 2, pp. 431–445, 2022

work page 2022

[14] [14]

Butterfly counting and bitruss de- composition on uncertain bipartite graphs,

A. Zhou, Y . Wang, and L. Chen, “Butterfly counting and bitruss de- composition on uncertain bipartite graphs,” The VLDB Journal, vol. 32, no. 5, pp. 1013–1036, 2023

work page 2023

[15] [15]

Butterfly counting in bipartite networks,

S.-V . Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura, “Butterfly counting in bipartite networks,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , 2018, pp. 2150–2159

work page 2018

[16] [16]

Efficient load-balanced butterfly counting on gpu,

Q. Xu, F. Zhang, Z. Yao, L. Lu, X. Du, D. Deng, and B. He, “Efficient load-balanced butterfly counting on gpu,” Proceedings of the VLDB Endowment, vol. 15, no. 11, pp. 2450–2462, 2022

work page 2022

[17] [17]

Cohesive subgraph search over big heterogeneous information networks: Applications, challenges, and solutions,

Y . Fang, K. Wang, X. Lin, and W. Zhang, “Cohesive subgraph search over big heterogeneous information networks: Applications, challenges, and solutions,” in Proceedings of the 2021 International Conference on Management of Data , 2021, pp. 2829–2838

work page 2021

[18] [18]

I/o-efficient butterfly counting at scale,

Z. Wang, L. Lai, Y . Liu, B. Shui, C. Tian, and S. Zhong, “I/o-efficient butterfly counting at scale,” Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1–27, 2023

work page 2023

[19] [19]

A hybrid deep learning and modified butterfly optimization based feature selection for transaction credit card fraud detection,

N. Geetha and G. Dheepa, “A hybrid deep learning and modified butterfly optimization based feature selection for transaction credit card fraud detection,” Journal of Positive School Psychology , vol. 6, no. 7, pp. 5328–5345, 2022

work page 2022

[20] [20]

Fraud detection: Comparison of traditional methods, hybrid methods, monarch butterfly optimization, and temporal convo- lutional networks,

P. G. Samanvita, A. Balivada, S. K. Satapathy, B. M. R. Kumar, and M. Bhargavi, “Fraud detection: Comparison of traditional methods, hybrid methods, monarch butterfly optimization, and temporal convo- lutional networks,” in 2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) . IEEE, 2024, pp. 110– 116

work page 2024

[21] [21]

Vertex priority based butterfly counting for large-scale bipartite networks

K. Wang, X. Lin, L. Qin, W. Zhang, and Y . Zhang, “Vertex priority based butterfly counting for large-scale bipartite networks.” PVLDB, 2019

work page 2019

[22] [22]

Parallel algorithms for butterfly computations,

J. Shi and J. Shun, “Parallel algorithms for butterfly computations,” in Massive Graph Analytics. Chapman and Hall/CRC, 2022, pp. 287–330

work page 2022

[23] [23]

Families of butterfly counting algorithms for bipartite graphs,

J. A. Acosta, T. M. Low, and D. N. Parikh, “Families of butterfly counting algorithms for bipartite graphs,” in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) . IEEE, 2022, pp. 304–313

work page 2022

[24] [24]

Parallelization of butterfly counting on hierarchical memory,

Z. Wang, L. Lai, Y . Liu, B. Shui, C. Tian, and S. Zhong, “Parallelization of butterfly counting on hierarchical memory,” The VLDB Journal , pp. 1–32, 2024. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 12

work page 2024

[25] [25]

Gpu-based butterfly counting,

Y . Xia, F. Zhang, Q. Xu, M. Zhang, Z. Yao, L. Lu, X. Du, D. Deng, B. He, and S. Ma, “Gpu-based butterfly counting,” The VLDB Journal , vol. 33, no. 5, pp. 1543–1567, 2024

work page 2024

[26] [26]

Toward accurate butterfly counting with edge privacy preserving in bipartite networks,

M. Wang, H. Jiang, P. Peng, Y . Li, and W. Huang, “Toward accurate butterfly counting with edge privacy preserving in bipartite networks,” in IEEE INFOCOM 2024-IEEE Conference on Computer Communications. IEEE, 2024, pp. 2289–2298

work page 2024

[27] [27]

Butterfly counting over bipartite graphs with local differential privacy,

Y . He, K. Wang, W. Zhang, X. Lin, W. Ni, and Y . Zhang, “Butterfly counting over bipartite graphs with local differential privacy,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 2351–2364

work page 2024

[28] [28]

Fleet: Butterfly estimation from a bipartite graph stream,

S.-V . Sanei-Mehri, Y . Zhang, A. E. Sariy ¨uce, and S. Tirthapura, “Fleet: Butterfly estimation from a bipartite graph stream,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1201–1210

work page 2019

[29] [29]

Approximately counting butterflies in large bipartite graph streams,

R. Li, P. Wang, P. Jia, X. Zhang, J. Zhao, J. Tao, Y . Yuan, and X. Guan, “Approximately counting butterflies in large bipartite graph streams,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5621–5635, 2021

work page 2021

[30] [30]

sgrapp: Butterfly approximation in streaming graphs,

A. Sheshbolouki and M. T. ¨Ozsu, “sgrapp: Butterfly approximation in streaming graphs,” ACM Trans. Knowl. Discov. Data, vol. 16, no. 4, pp. 76:1–76:43, 2022. [Online]. Available: https://doi.org/10.1145/3495011

work page doi:10.1145/3495011 2022

[31] [31]

Counting butterflies in fully dynamic bipartite graph streams,

S. Papadias, Z. Kaoudi, V . Pandey, J.-A. Quian ´e-Ruiz, and V . Markl, “Counting butterflies in fully dynamic bipartite graph streams,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 2024, pp. 2917–2930

work page 2024

[32] [32]

Scalable approximate butterfly and bi-triangle counting for large bipartite networks,

F. Zhang, D. Chen, S. Wang, Y . Yang, and J. Gan, “Scalable approximate butterfly and bi-triangle counting for large bipartite networks,” Proc. ACM Manag. Data , vol. 1, no. 4, pp. 259:1–259:26,

work page

[33] [33]

Available: https://doi.org/10.1145/3626753

[Online]. Available: https://doi.org/10.1145/3626753

work page doi:10.1145/3626753

[34] [34]

Sampling algorithms for butterfly counting on temporal bipartite graphs,

J. Pu, Y . Wang, Y . Li, and X. Zhou, “Sampling algorithms for butterfly counting on temporal bipartite graphs,” arXiv preprint arXiv:2310.11886, 2023

work page arXiv 2023

[35] [35]

Scalable distributed stream join processing,

Q. Lin, B. C. Ooi, Z. Wang, and C. Yu, “Scalable distributed stream join processing,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , 2015, pp. 811–825

work page 2015

[36] [36]

Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage,

P. Wang, Y . Qi, Y . Sun, X. Zhang, J. Tao, and X. Guan, “Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage,” Proceedings of the VLDB Endowment , vol. 11, no. 2, pp. 162–175, 2017

work page 2017

[37] [37]

Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams,

M. Jung, Y . Lim, S. Lee, and U. Kang, “Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams,” Data Mining and Knowledge Discovery , vol. 33, pp. 1225–1253, 2019

work page 2019

[38] [38]

An adaptive query execution system for data integration,

Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld, “An adaptive query execution system for data integration,” ACM SIGMOD Record, vol. 28, no. 2, pp. 299–310, 1999

work page 1999

[39] [39]

Adaptive duplicate detection using learnable string similarity measures,

M. Bilenko and R. J. Mooney, “Adaptive duplicate detection using learnable string similarity measures,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003, pp. 39–48

work page 2003

[40] [40]

What’s going on? learning communication rules in edge networks,

S. Kandula, R. Chandra, and D. Katabi, “What’s going on? learning communication rules in edge networks,” in Proceedings of the ACM SIGCOMM 2008 conference on Data communication , 2008, pp. 87–98

work page 2008

[41] [41]

The impact of duplicate orders on demand estimation and capacity investment,

M. Armony and E. L. Plambeck, “The impact of duplicate orders on demand estimation and capacity investment,” Management science, vol. 51, no. 10, pp. 1505–1518, 2005

work page 2005

[42] [42]

Exploring duplicate orders in a single-manufacturer multi-distributor supply chain,

S. V . Betancur, G. U. Casta ˜no, and P. Gonc ¸alves, “Exploring duplicate orders in a single-manufacturer multi-distributor supply chain,” in Pro- ceedings of System Dynamics Conference , 2014

work page 2014

[43] [43]

Fable: Approximate butterfly counting in bipartite graph stream with duplicate edges,

G. Sun, Y . Zhao, and Y . Li, “Fable: Approximate butterfly counting in bipartite graph stream with duplicate edges,” in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 2158–2167

work page 2024

[44] [44]

Counting distinct elements in a data stream,

Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan, “Counting distinct elements in a data stream,” in Randomization and Ap- proximation Techniques, 6th International Workshop, RANDOM 2002, Cambridge, MA, USA, September 13-15, 2002, Proceedings . Springer, 2002, pp. 1–10

work page 2002

[45] [45]

On synopses for distinct-value estimation under multiset operations,

K. Beyer, P. J. Haas, B. Reinwald, Y . Sismanis, and R. Gemulla, “On synopses for distinct-value estimation under multiset operations,” in Proceedings of the 2007 ACM SIGMOD international conference on Management of data , 2007, pp. 199–210

work page 2007

[46] [46]

Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm,

P. Flajolet, ´E. Fusy, O. Gandouet, and F. Meunier, “Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm,” Discrete mathematics & theoretical computer science , no. Proceedings, 2007

work page 2007

[47] [47]

Probabilistic counting algorithms for data base applications,

P. Flajolet and G. N. Martin, “Probabilistic counting algorithms for data base applications,” Journal of computer and system sciences , vol. 31, no. 2, pp. 182–209, 1985

work page 1985

[48] [48]

Streamed approximate counting of distinct elements: Beating optimal batch methods,

D. Ting, “Streamed approximate counting of distinct elements: Beating optimal batch methods,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , 2014, pp. 442–451

work page 2014

[49] [49]

Efficient temporal butterfly counting and enumeration on temporal bipartite graphs,

X. Cai, X. Ke, K. Wang, L. Chen, T. Zhang, Q. Liu, and Y . Gao, “Efficient temporal butterfly counting and enumeration on temporal bipartite graphs,” Proceedings of the VLDB Endowment , vol. 17, no. 4, pp. 657–670, 2023

work page 2023