An Efficient Streaming Algorithm for Approximating Graphlet Distributions
Pith reviewed 2026-05-07 15:17 UTC · model grok-4.3
The pith
A streaming algorithm approximates k-graphlet distributions in O(1/c) passes using Õ(n^{1+c}) memory for any fixed c>0.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give a streaming algorithm that, for any fixed c>0, makes O(1/c) passes using Õ(n^{1+c}) memory to sample k-graphlets. We use this sampler to approximate the distribution of k-graphlets. By a known lower bound, the construction is optimal in memory usage up to an Õ(n^c) factor.
What carries the argument
The multi-pass streaming sampling procedure that produces unbiased k-graphlet samples while achieving a tunable tradeoff between number of passes and memory.
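The paper's multi-pass sampler is not reproduced here, but the single-pass reservoir-sampling primitive that streaming samplers of this kind typically build on can be sketched as follows (a minimal illustration; the function name and interface are our own, not the paper's):

```python
import random

def reservoir_sample_edges(edge_stream, k):
    """Maintain a uniform random sample of k edges in one pass over the stream.

    Illustrative only: the paper's multi-pass k-graphlet sampler is more
    involved; this shows the basic memory-bounded primitive.
    """
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            reservoir.append(edge)  # fill the reservoir with the first k edges
        else:
            j = random.randint(0, i)  # edge i+1 enters with probability k/(i+1)
            if j < k:
                reservoir[j] = edge
    return reservoir
```

Each edge seen so far ends up in the reservoir with equal probability k/i after i edges, which is why such primitives give unbiased samples under bounded memory.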
If this is right
- Graphlet distribution approximation becomes feasible for graphs that exceed main memory capacity.
- The pass count can be driven to any small constant while memory remains sub-quadratic in n.
- External-memory analysis of large networks incurs only a constant number of sequential scans.
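To see the pass/memory tradeoff concretely, a toy calculation under the stated bounds (dropping all hidden O(·) and Õ(·) factors, so the numbers are only indicative, not the paper's exact costs):

```python
import math

def tradeoff(n, c):
    """Indicative pass count and memory footprint from the claimed bounds.

    Constants and polylog factors are dropped: only the effect of the
    exponent c on the passes-vs-memory balance is shown.
    """
    passes = math.ceil(1 / c)   # O(1/c) passes
    memory = n ** (1 + c)       # ~n^{1+c} memory units
    return passes, memory
```

For n = 10^6, c = 0.5 gives 2 passes and about 10^9 memory units, while c = 0.25 gives 4 passes and about 3.2 × 10^7: halving c doubles the passes but shrinks memory polynomially.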
Where Pith is reading between the lines
- The same sampling trade-off might extend to counting or sampling other fixed-size induced subgraphs in streams.
- On graphs of moderate density the reduction in passes could yield large practical speedups compared with the logarithmic-pass baseline.
- Adjusting the exponent 1+c to depend explicitly on k would be a natural next parameter to optimize.
Load-bearing premise
The sampling procedure produces unbiased estimates sufficient for distribution approximation, and the prior lower bound applies directly to this procedure.
What would settle it
A family of graphs on which the algorithm either requires more than O(1/c) passes, uses more than Õ(n^{1+c}) memory, or yields distribution estimates whose total variation distance to the true distribution exceeds the claimed bound.
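Total variation distance, the metric in which such a refutation would be measured, is straightforward to compute for empirical graphlet distributions (a minimal sketch; the dictionary representation of a distribution over graphlet types is our own choice):

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Distributions are given as dicts mapping graphlet type -> probability;
    missing keys are treated as probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

A counterexample family would exhibit estimates whose total variation distance to the true distribution stays above the claimed error bound.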
Original abstract
In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for $k$-graphlet sampling require loading the entire graph into main memory, making them impractical for massive graphs. To bypass this limitation, Bourreau et al. (NeurIPS 2024) introduced a \emph{streaming} algorithm that through nontrivial techniques makes only $O(\log n)$ passes using $O(n \log n)$ memory. In this work we break their $O(\log n)$-pass bound by giving an algorithm that, for any fixed $c>0$, makes $O(1/c)$ passes using $\tilde O(n^{1+c})$ memory. As a consequence of their lower bound, our algorithm is optimal up to a factor of $\tilde{O}(n^c)$ in the memory usage. We use this sampling algorithm to obtain an efficient method of approximating $k$-graphlet distributions. Experiments on real-world and synthetic graphs show that our algorithm is always at least as good as the one of Bourreau et al., and outperforms it by orders of magnitude on mildly dense graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a new streaming algorithm for sampling k-graphlets that, for any fixed c>0, uses O(1/c) passes and Õ(n^{1+c}) memory. This improves on Bourreau et al.'s O(log n)-pass, O(n log n)-memory algorithm. The new sampler is used to approximate k-graphlet distributions, with a claim of near-optimality (up to Õ(n^c) in memory) via the prior lower bound, and experiments on real/synthetic graphs showing it matches or outperforms the baseline, especially on mildly dense graphs.
Significance. If the sampling procedure is unbiased and the lower bound applies in the constant-pass regime, the result gives a practical constant-pass tradeoff for graphlet sampling on massive graphs and tightens the pass-memory frontier for this problem. The experimental comparison adds evidence of utility, but the primary value is the theoretical improvement in passes.
major comments (2)
- [Abstract] Abstract: The optimality claim ('our algorithm is optimal up to a factor of Õ(n^c) in the memory usage') rests on Bourreau et al.'s lower bound carrying over unchanged to the O(1/c)-pass regime. The abstract provides no argument showing that the lower-bound construction (originally stated for O(log n) passes) rules out better constant-pass algorithms or that the new multi-pass sampling procedure inherits the same hardness.
- [Abstract] Abstract: The claim that the algorithm yields an 'efficient method of approximating k-graphlet distributions' requires that the produced samples give unbiased (or sufficiently low-bias) estimates of the induced k-graphlet frequencies. No proof sketch is supplied showing that the new sampling distribution matches the uniform distribution over k-graphlets needed for unbiased frequency estimation.
minor comments (1)
- [Abstract] Abstract: The range of k for which the algorithm and bounds hold is not stated; this should be clarified in the introduction or theorem statements.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for recognizing the practical and theoretical value of our constant-pass streaming algorithm. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract] Abstract: The optimality claim ('our algorithm is optimal up to a factor of Õ(n^c) in the memory usage') rests on Bourreau et al.'s lower bound carrying over unchanged to the O(1/c)-pass regime. The abstract provides no argument showing that the lower-bound construction (originally stated for O(log n) passes) rules out better constant-pass algorithms or that the new multi-pass sampling procedure inherits the same hardness.
Authors: We agree that the abstract would be strengthened by an explicit justification. The lower bound in Bourreau et al. is obtained via a communication-complexity reduction on a family of hard instances whose communication cost cannot be reduced below Ω(n) bits even with any constant number of passes; their O(log n) upper bound arises only from their particular recursive sampling technique. Our O(1/c)-pass algorithm matches this bound up to the Õ(n^c) factor for any fixed c>0. We will add a concise sentence to the abstract and a short paragraph in the introduction (or a new remark in Section 2) that cites the relevant lemma from Bourreau et al. and explains why the hardness persists in the constant-pass regime. revision: yes
-
Referee: [Abstract] Abstract: The claim that the algorithm yields an 'efficient method of approximating k-graphlet distributions' requires that the produced samples give unbiased (or sufficiently low-bias) estimates of the induced k-graphlet frequencies. No proof sketch is supplied showing that the new sampling distribution matches the uniform distribution over k-graphlets needed for unbiased frequency estimation.
Authors: The referee correctly notes that the abstract omits a proof sketch. In the full manuscript, Section 3 presents the multi-pass sampler and proves (via induction on the number of passes and careful accounting of the reservoir sampling probabilities) that each k-graphlet is sampled with probability (1±ε)/|G_k| where G_k is the set of all k-graphlets; Section 4 then shows that the resulting Monte-Carlo estimator for the graphlet frequency vector is unbiased up to the (1+ε) multiplicative error and gives variance bounds that yield the stated approximation guarantees. We will insert a one-sentence outline of this argument into the abstract and add a short proof sketch (or a pointer to the relevant theorem) in the introduction. revision: yes
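In the uniform-sampling limit the estimator described in the rebuttal reduces to empirical frequencies over the sampled graphlet types. A minimal sketch (our own illustration, not the paper's Section 4 estimator; variance bounds are omitted):

```python
from collections import Counter

def estimate_graphlet_frequencies(samples):
    """Empirical frequency of each graphlet type among the samples.

    If each k-graphlet is drawn with probability within (1±eps) of uniform,
    these frequencies estimate the true distribution up to the same
    multiplicative error.
    """
    counts = Counter(samples)
    total = len(samples)
    return {g: c / total for g, c in counts.items()}
```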
Circularity Check
No circularity; new algorithm and optimality rest on external lower bound
Full rationale
The paper introduces a new multi-pass streaming sampler for k-graphlet distributions that improves the pass complexity of Bourreau et al. (NeurIPS 2024) from O(log n) to O(1/c). Optimality up to Õ(n^c) memory is explicitly derived from the lower bound in that external prior work rather than from any self-referential construction, fitted parameter, or self-citation chain. No equations reduce a claimed prediction to its own inputs by definition, no ansatz is smuggled via self-citation, and the sampling procedure is presented as a novel construction whose unbiasedness is asserted independently of the lower-bound citation. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The input graph is presented as a stream of edges in the standard one-way or multi-pass streaming model.
Reference graph
Works this paper leans on
-
[1]
Mixing time bounds for graphlet random walks
Matteo Agostini, Marco Bressan, and Shahrzad Haddadan. Mixing time bounds for graphlet random walks. Information Processing Letters, 152:105851, 2019.
2019
-
[2]
Graph sample and hold: a framework for big-graph analytics
Nesreen K. Ahmed, Nick G. Duffield, Jennifer Neville, and Ramana Rao Kompella. Graph sample and hold: a framework for big-graph analytics. In KDD, pages 1446–1455. ACM, 2014.
2014
-
[3]
Densest subgraph in streaming and MapReduce
Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment, 5(5):454–465, January 2012.
2012
-
[4]
Reductions in streaming algorithms, with an application to counting triangles in graphs
Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623–632. ACM/SIAM, 2002.
2002
-
[5]
Efficient semi-streaming algorithms for local triangle counting in massive graphs
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 16–24, Las Vegas, Nevada, USA, August 2008. ACM.
2008
-
[6]
Counting subgraphs in degenerate graphs
Suman K. Bera, Lior Gishboliner, Yevgeny Levanzov, C. Seshadhri, and Asaf Shapira. Counting subgraphs in degenerate graphs. J. ACM, 69(3):23:1–23:21, 2022.
2022
-
[7]
Guise: Uniform sampling of graphlets for large graph analysis
Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. Guise: Uniform sampling of graphlets for large graph analysis. In Proc. of IEEE ICDM 2012, pages 91–100, 2012.
2012
-
[8]
Efficient streaming algorithms for graphlet sampling
Yann Bourreau, Marco Bressan, T.-H. Hubert Chan, Qipeng Kuang, and Mauro Sozio. Efficient streaming algorithms for graphlet sampling. In NeurIPS, 2024.
2024
-
[9]
Efficient and near-optimal algorithms for sampling connected subgraphs
Marco Bressan. Efficient and near-optimal algorithms for sampling connected subgraphs. In Proc. of ACM STOC, pages 1132–1143, 2021.
2021
-
[10]
Efficient and near-optimal algorithms for sampling connected subgraphs, 2021
Marco Bressan. Efficient and near-optimal algorithms for sampling connected subgraphs, 2021.
2021
-
[11]
Faster algorithms for counting subgraphs in sparse graphs
Marco Bressan. Faster algorithms for counting subgraphs in sparse graphs. Algorithmica, 2021.
2021
-
[12]
Efficient and near-optimal algorithms for sampling small connected subgraphs
Marco Bressan. Efficient and near-optimal algorithms for sampling small connected subgraphs. ACM Trans. Algorithms, 19(3), June 2023.
2023
-
[13]
Counting graphlets: Space vs time
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Counting graphlets: Space vs time. In Proc. of ACM WSDM, pages 557–566, 2017.
2017
-
[14]
Motif counting beyond five nodes
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Motif counting beyond five nodes. ACM Trans. Knowl. Discov. Data, 12(4), April 2018.
2018
-
[15]
Motivo: Fast motif counting via succinct color coding and adaptive sampling
Marco Bressan, Stefano Leucci, and Alessandro Panconesi. Motivo: Fast motif counting via succinct color coding and adaptive sampling. Proc. VLDB Endow., 12(11):1651–1663, July 2019.
2019
-
[16]
Faster motif counting via succinct color coding and adaptive sampling
Marco Bressan, Stefano Leucci, and Alessandro Panconesi. Faster motif counting via succinct color coding and adaptive sampling. ACM Trans. Knowl. Discov. Data, 15(6), May 2021.
2021
-
[17]
Measuring user influence in Twitter: The million follower fallacy
Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, and P. Krishna Gummadi. Measuring user influence in Twitter: The million follower fallacy. In ICWSM. The AAAI Press, 2010.
2010
-
[18]
Strong computational lower bounds via parameterized complexity
Jianer Chen, Xiuzhen Huang, Iyad A. Kanj, and Ge Xia. Strong computational lower bounds via parameterized complexity. Journal of Computer and System Sciences, 72(8):1346–1367, 2006.
2006
-
[19]
A general framework for estimating graphlet statistics via random walk
Xiaowei Chen, Yongkun Li, Pinghui Wang, and John C. S. Lui. A general framework for estimating graphlet statistics via random walk. Proc. VLDB Endow., 10(3):253–264, November 2016.
2016
-
[20]
Arboricity and subgraph listing algorithms
Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210–223, 1985.
1985
-
[21]
On graph problems in a semi-streaming model
Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, December 2005.
2005
-
[22]
Improved parallel algorithms for density-based network clustering
Mohsen Ghaffari, Silvio Lattanzi, and Slobodan Mitrovic. Improved parallel algorithms for density-based network clustering. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 2201–2210. PMLR, 2019.
2019
-
[23]
An efficient framework for clustered federated learning
Avishek Ghosh, Jichan Chung, Dong Yin, and Kannan Ramchandran. An efficient framework for clustered federated learning. IEEE Trans. Inf. Theory, 68(12):8076–8091, 2022.
2022
-
[24]
Waddling random walk: Fast and accurate mining of motif statistics in large graphs
Guyue Han and Harish Sethu. Waddling random walk: Fast and accurate mining of motif statistics in large graphs. In Proc. of IEEE ICDM, pages 181–190, 2016.
2016
-
[25]
A generalization of sampling without replacement from a finite universe
Daniel G. Horvitz and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
1952
-
[26]
The parameterised complexity of counting connected subgraphs and graph motifs
Mark Jerrum and Kitty Meeks. The parameterised complexity of counting connected subgraphs and graph motifs. J. Comput. Syst. Sci., 81(4):702–716, 2015.
2015
-
[27]
A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox
Madhav Jha, C. Seshadhri, and Ali Pinar. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. ACM Trans. Knowl. Discov. Data, 9(3):15:1–15:21, 2015.
2015
-
[28]
Counting arbitrary subgraphs in data streams
Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. Counting arbitrary subgraphs in data streams. In ICALP (2), volume 7392 of Lecture Notes in Computer Science, pages 598–609. Springer, 2012.
2012
-
[29]
KONECT: the Koblenz network collection
Jérôme Kunegis. KONECT: the Koblenz network collection. In WWW (Companion Volume), pages 1343–1350. International World Wide Web Conferences Steering Committee / ACM, 2013.
2013
-
[30]
What is Twitter, a social network or a news media?
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue B. Moon. What is Twitter, a social network or a news media? In WWW, pages 591–600. ACM, 2010.
2010
-
[31]
Memory-efficient and accurate sampling for counting local triangles in graph streams: From simple to multigraphs
Yongsub Lim, Minsoo Jung, and U. Kang. Memory-efficient and accurate sampling for counting local triangles in graph streams: From simple to multigraphs. ACM Transactions on Knowledge Discovery from Data, 12(1):1–28, February 2018.
2018
-
[32]
Improved mixing time for k-subgraph sampling
Ryuta Matsuno and Aristides Gionis. Improved mixing time for k-subgraph sampling. In Proc. of SIAM SDM, pages 568–576, 2020.
2020
-
[33]
Smallest-last ordering and clustering and graph coloring algorithms
David W. Matula and Leland L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM, 30(3):417–427, July 1983.
1983
-
[34]
Network motifs: Simple building blocks of complex networks
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
2002
-
[35]
Estimating graphlet statistics via lifting
Kirill Paramonov, Dmitry Shemetov, and James Sharpnack. Estimating graphlet statistics via lifting. In Proc. of ACM KDD, pages 587–595, 2019.
2019
-
[36]
Counting and sampling triangles from a graph stream
A. Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment, 6(14):1870–1881, September 2013.
2013
-
[37]
Motif-matching based subgraph-level attentional convolutional network for graph classification
Hao Peng, Jianxin Li, Qiran Gong, Yuanxin Ning, Senzhang Wang, and Lifang He. Motif-matching based subgraph-level attentional convolutional network for graph classification. Proc. of AAAI, 34(04):5387–5394, April 2020.
2020
-
[38]
Finding network motifs using MCMC sampling
Tanay Kumar Saha and Mohammad Al Hasan. Finding network motifs using MCMC sampling. In Proc. of CompleNet, pages 13–24, 2015.
2015
-
[39]
Streaming algorithms for k-core decomposition
Ahmet Erdem Sarıyüce, Buğra Gedik, Gabriela Jacques-Silva, Kun-Lung Wu, and Ümit V. Çatalyürek. Streaming algorithms for k-core decomposition. Proceedings of the VLDB Endowment, 6(6):433–444, April 2013.
2013
-
[40]
Efficient graphlet kernels for large graph comparison
Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. Efficient graphlet kernels for large graph comparison. In AISTATS, volume 5 of JMLR Proceedings, pages 488–495. JMLR.org, 2009.
2009
-
[41]
Triest: Counting local and global triangles in fully dynamic streams with fixed memory size
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM TKDD, 11(4):43, 2017.
2017
-
[42]
Tiered sampling: An efficient method for approximate counting sparse motifs in massive graph streams
Lorenzo De Stefani, Erisa Terolli, and Eli Upfal. Tiered sampling: An efficient method for approximate counting sparse motifs in massive graph streams. In IEEE BigData, pages 776–. IEEE Computer Society, 2017.
2017
-
[43]
Gl2vec: Learning feature representation using graphlets for directed networks
Kun Tu, Jian Li, Don Towsley, Dave Braines, and Liam D. Turner. Gl2vec: Learning feature representation using graphlets for directed networks. In Proc. of IEEE/ACM ASONAM, pages 216–221, 2019.
2019
-
[44]
Efficiently estimating motif statistics of large networks
Pinghui Wang, John C. S. Lui, Bruno Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. Efficiently estimating motif statistics of large networks. ACM TKDD, 9(2):8:1–8:27, 2014.
2014

A Additional Figures
Figures 8 and 9 show the figures omitted from the experiments in Section 5.
[Figure 8: Empirical distribution of ε_v under different c, using ApproxDD-ES. Figure 9: number of passes (left: preprocessing; right: sampling). Plot data not recoverable from the extraction.]