Pruned Landmark Labeling Meets Vertex Centric Computation: A Surprisingly Happy Marriage!
Pith reviewed 2026-05-25 13:49 UTC · model grok-4.3
The pith
BVC-PLL adapts pruned landmark labeling to vertex-centric computation and achieves lower computational and memory costs than the original sequential PLL under a reasonable assumption.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing the sequential traversal of PLL with a vertex-centric computation model and introducing batch execution, BVC-PLL produces exactly the same landmark labels yet exhibits strictly lower computational and memory-access costs than the original PLL under a reasonable assumption, and therefore can serve as a faster sequential algorithm while also admitting efficient parallel execution.
What carries the argument
The batch execution model inside the vertex-centric computation of pruned landmark labeling, which groups vertex updates to remove redundant distance and label checks.
If this is right
- BVC-PLL extends without modification to directed graphs and to weighted graphs while preserving the same correctness and cost advantages.
- The batch vertex-centric formulation admits direct use of hierarchical parallelism on modern multicore architectures with SIMD units.
- The sequential BVC-PLL implementation runs more than twice as fast as the original PLL on real-world graphs.
- The parallel version of BVC-PLL demonstrates both high efficiency and good scalability as the number of cores increases.
Where Pith is reading between the lines
- The same batching technique could reduce redundant work in other label-propagation or distance-oracle algorithms that currently rely on sequential traversals.
- Lower memory-access costs may translate into better cache behavior on graphs too large to fit in main memory, allowing single-machine processing of larger instances.
- If the reasonable assumption turns out to hold for the majority of real-world graphs, practitioners could replace existing PLL implementations with BVC-PLL for both sequential and parallel use without changing query results.
Load-bearing premise
An unspecified reasonable assumption must hold for the claimed reduction in computational and memory-access costs of BVC-PLL relative to PLL.
What would settle it
Any concrete graph on which the total number of distance comparisons or random memory accesses performed by BVC-PLL exceeds the corresponding count for the original sequential PLL.
Figures
read the original abstract
In this paper, we study how the Pruned Landmark Labeling (PPL) algorithm can be parallelized in a scalable fashion, producing the same results as the sequential algorithm. More specifically, we parallelize using a Vertex-Centric (VC) computational model on a modern SIMD powered multicore architecture. We design a new VC-PLL algorithm that resolves the apparent mismatch between the inherent sequential dependence of the PLL algorithm and the Vertex- Centric (VC) computing model. Furthermore, we introduce a novel batch execution model for VC computation and the BVC-PLL algorithm to reduce the computational inefficiency in VC-PLL. Quite surprisingly, the theoretical analysis reveals that under a reasonable assumption, BVC-PLL has lower computational and memory access costs than PLL and indicates it may run faster than PLL as a sequential algorithm. We also demonstrate how BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how it can utilize the hierarchical parallelism on a modern parallel computing architecture. Extensive experiments on real-world graphs not only show the sequential BVC-PLL can run more than two times faster than the original PLL, but also demonstrates its parallel efficiency and scalability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents VC-PLL, a vertex-centric reformulation of the sequential Pruned Landmark Labeling (PLL) algorithm for reachability queries that is claimed to produce identical results, and BVC-PLL, a batched variant. Under an unspecified 'reasonable assumption,' BVC-PLL is shown to have strictly lower computational and memory-access costs than PLL (potentially enabling faster sequential execution). The work extends the approach to directed and weighted graphs, exploits hierarchical parallelism, and reports >2× sequential speedup plus good parallel scalability on real-world graphs.
Significance. If the cost bounds hold under the stated assumption and the observed speedups are attributable to the algorithmic redesign rather than implementation details, the result would improve the practical construction of landmark labels for large-scale reachability queries on multicore hardware while preserving exactness.
major comments (2)
- [Abstract / theoretical analysis] Abstract and the theoretical cost analysis (likely §4 or §5): the central claim that 'under a reasonable assumption, BVC-PLL has lower computational and memory access costs than PLL' is load-bearing for both the sequential speedup assertion and the motivation for BVC-PLL. The assumption must be stated explicitly, shown to hold on the evaluated graphs (or on the graph families where the 2× speedup is reported), and the cost inequality must be derived without circularity. If the assumption encodes a property (e.g., label-growth rate or access pattern) that fails on the test instances, the theoretical advantage disappears and the experimental result requires a different explanation.
- [Experiments] Experimental section (likely §6): the reported >2× sequential speedup of BVC-PLL over PLL must be accompanied by controls that isolate the effect of the batch execution model from other implementation choices (SIMD tuning, memory layout, etc.). Without such controls, it is unclear whether the observed runtime improvement validates the cost analysis or arises from orthogonal engineering factors.
minor comments (2)
- [Algorithm description] Notation for the batch size parameter and the precise definition of 'batch execution model' should be introduced earlier and used consistently when comparing VC-PLL and BVC-PLL.
- [VC-PLL definition] The paper should clarify whether the vertex-centric formulation preserves the exact pruning behavior of the original PLL or only the final label sets; a short proof sketch or invariant would strengthen the 'identical results' claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments point-by-point below and commit to revisions that strengthen the presentation of the theoretical claims and experimental validation.
read point-by-point responses
-
Referee: [Abstract / theoretical analysis] Abstract and the theoretical cost analysis (likely §4 or §5): the central claim that 'under a reasonable assumption, BVC-PLL has lower computational and memory access costs than PLL' is load-bearing for both the sequential speedup assertion and the motivation for BVC-PLL. The assumption must be stated explicitly, shown to hold on the evaluated graphs (or on the graph families where the 2× speedup is reported), and the cost inequality must be derived without circularity. If the assumption encodes a property (e.g., label-growth rate or access pattern) that fails on the test instances, the theoretical advantage disappears and the experimental result requires a different explanation.
Authors: We agree that the assumption and its verification are central and currently underspecified. In the revised manuscript we will (i) state the assumption explicitly and formally in the cost-analysis section, (ii) provide a self-contained derivation of the strict inequality that avoids any circular reference to experimental outcomes, and (iii) add a short subsection (or appendix table) that checks the assumption on every graph used for the reported speedups. If the assumption is shown to hold, the theoretical advantage is directly linked to the observed results; if any graph violates it, we will note the discrepancy and discuss alternative explanations for the runtime improvement. revision: yes
-
Referee: [Experiments] Experimental section (likely §6): the reported >2× sequential speedup of BVC-PLL over PLL must be accompanied by controls that isolate the effect of the batch execution model from other implementation choices (SIMD tuning, memory layout, etc.). Without such controls, it is unclear whether the observed runtime improvement validates the cost analysis or arises from orthogonal engineering factors.
Authors: We accept that the current experiments do not fully isolate the batch-execution contribution. In the revision we will add an ablation-style comparison (or additional micro-benchmark) in which the only systematic difference between the two implementations is the batch versus non-batch vertex-centric schedule, while memory layout, SIMD intrinsics, and compiler flags remain identical. The results of this controlled comparison will be reported alongside the existing end-to-end timings. revision: yes
Circularity Check
No circularity; cost comparison rests on external assumption, not self-definition or fitted inputs
full rationale
The paper introduces BVC-PLL via vertex-centric redesign and batch execution, then invokes an external 'reasonable assumption' to claim strictly lower computational and memory-access costs versus PLL. No quoted equations reduce the claimed inequality to a fitted parameter, self-citation chain, or definitional renaming. The assumption is presented as an independent premise whose validity is left for verification outside the derivation; the algorithmic redesign itself contains independent content. This is the common honest case of a self-contained proposal whose central claim is falsifiable on real graphs rather than forced by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The vertex-centric model can be adapted to PLL while exactly preserving the same labeling results as the sequential algorithm.
Reference graph
Works this paper leans on
-
[1]
the Scatter function, which describes how each vertex uses its vertex value and edge value to propagate a message to its neighbors; and 2) Gather function, which describes how each vertex computes a new value based on its original value and all the new messages it received. Each phase can traverse in parallel their corresponding vertex sets: ActiveVertice...
-
[2]
BASIC VERTEXCENTRIC ALGORITHM Recall that the PLL algorithm (Alg. 1) iterates following the vertex rank (order): at the ith iteration, the vertex u with rank π(u) = i will be distributed to all other vertices in the graph using a BFS process. The key condition to add u into the label of v, L(v), is the distance check for the canonical labeling criterion:...
-
[3]
Scatter phase (Lines 3 to 7, also referred to as the push model): all active vertices with new labels perform 4 Algorithm 3 VC-PLL for G = ( V, E) with Order π {Init.: ( L(v): label; δL(v): new label from each iteration)} 1: ActiveV ertices← V ;∀ v∈ V, δL (v)←{ (v, 0)}, L(v)← δL(v) 2: while ActiveV ertices̸=∅ do {Scatter Phase:} 3: for all a∈ ActiveVertic...
-
[4]
Gather phase (Line 8-16): all vertices that receive a new message ( v.messages ̸= ∅) perform a vertex Gather function (Lines 9-15): For a vertex v, it traverses all its re- ceived messages (distance label from its neighbors), and fo r each unique vertex (u, d(u, v)) across the set of messages, it confirms the distance check for the canonical labeling crite...
-
[5]
BATCHED VERTEXCENTRIC ALG. Though VC-PLL can be described in a natural Vertex- Centric computational scheme, it also demonstrates certai n limitations of the original vertex-centric model assumpti ons:
-
[6]
Typically, the vertex value (and message) is fixed in VC, whereas in VC-PLL, each vertex value (and message) is a continuously growing list (or set); 2) In the Gather functio n, the computation needs remote memory access for checking distance conditions (Line 11 in Algorithm 3): in most cases, u is not a neighbor of v, and when we use L(u) for distance che...
-
[7]
can help provide almost O(1) pruning. Since the number of negative checks is the same for PLL and BVC-PLL, we expect their overall cost will be fairly close to each other. Putting It T ogether: Assuming that PLL and BVC-PLL have a similar cost for negative distance checks, theoreti- cally, BVC-PLL may have smaller computational cost than that of PLL (due ...
-
[8]
V ARIANTS AND IMPLEMENTATION 8 5.1 Generalization Directed Graphs: For directed graph, each vertex v is assigned with two labels Lin(v) and Lout(v). VC-PLL and BVC-PLL can be easily extended to handle directed graphs by considering these two labels as separate computations. Specifically, in the Scatter function, the new labels δLin and δLout will be sent o...
-
[9]
deg.” denotes average degree. “dia
EV ALUATION In this section, we perform a detailed evaluation of BVC- PLL, focusing on answering the following questions: 1) How does BVC-PLL algorithm perform against the original PLL in a sequential setting (single thread; no parallelism)? Sp ecif- ically, the theoretical analysis indicates it may run faste r, but we conduct experiments to confirm this. ...
-
[10]
RELATED WORK Online and Parallel Shortest Path Distance Com- putation: The standard (single source) shortest path com- putation method is Dijkstra’s algorithm [26] for weighted graphs and Breadth-First Search (BFS) traversal for un- weighted graphs. There have been quite a list of efforts in designing parallel Dijkstra and BFS algorithms [56, 49, 47]. Part...
-
[11]
We have achieved this by mapping the algorithm to a vertex-centric model
CONCLUSION In this paper, we proposed VC-PLL, which, to the best of our knowledge, is the first scalable parallelization of Prun ed Landmark Labeling (PLL) that is able to produce the same result as the sequential method. We have achieved this by mapping the algorithm to a vertex-centric model. We also introduced a new batched execution mechanism for VC-PL...
-
[12]
Graph minors. ii. algorithmic aspects of tree-width. Journal of Algorithms , 7(3):309 – 322, 1986
work page 1986
-
[13]
I. Abraham, D. Delling, A. V. Goldberg, and R. F. Werneck. A hub-based labeling algorithm for shortest paths in road networks. In SEA’11, 2011
work page 2011
-
[14]
I. Abraham, D. Delling, A. V. Goldberg, and R. F. Werneck. Hierarchical hub labelings for shortest paths. In Proceedings of the 20th Annual European Conference on Algorithms , ESA’12, 2012
work page 2012
-
[15]
I. Abraham, A. Fiat, A. V. Goldberg, and R. F. Werneck. Highway dimension, shortest paths, and provably efficient algorithms. In SODA’10, 2010
work page 2010
-
[16]
A. B. Adcock, B. D. Sullivan, and M. W. Mahoney. Tree decompositions and social graphs. Internet Mathematics, 12(5):315–361, 2016
work page 2016
- [17]
- [18]
-
[19]
M. Babenko, A. V. Goldberg, A. Gupta, and V. Nagarajan. Algorithms for hub label optimization. ACM Trans. Algorithms , 13(1), Nov. 2016
work page 2016
-
[20]
M. Babenko, A. V. Goldberg, H. Kaplan, R. Savchenko, and M. Weller. On the complexity of hub labeling (extended abstract). In Mathematical Foundations of Computer Science 2015 , pages 62–74. Springer Berlin Heidelberg, 2015
work page 2015
-
[21]
H. Bast, D. Delling, A. Goldberg, M. Müller-Hannemann, T. Pajor, P. Sanders, D. Wagner, and R. F. Werneck. Route Planning in Transportation Networks, pages 19–80. Springer International Publishing, Cham, 2016
work page 2016
-
[22]
H. Bast, S. Funke, P. Sanders, and D. Schultes. Fast Routing in Road Networks with Transit Nodes. Science, 316:566–, Apr. 2007
work page 2007
- [23]
-
[24]
M. Besta, M. Podstawski, L. Groner, E. Solomonik, and T. Hoefler. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing , pages 93–104. ACM, 2017
work page 2017
-
[25]
U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology , 25:163–177, 2001
work page 2001
-
[26]
S. Cao, W. Lu, and Q. Xu. Grarep: Learning Graph Representations with Global Structural Information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 891–900. ACM, 2015
work page 2015
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
-
[33]
A. Das Sarma, S. Gollapudi, and R. Najork, M.and Panigrahy. A sketch-based distance oracle for web-scale graphs. In WSDM ’10 , 2010
work page 2010
-
[34]
R. Dathathri, G. Gill, L. Hoang, H.-V. Dang, A. Brooks, N. Dryden, M. Snir, and K. Pingali. Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 752–768. ACM, 2018
work page 2018
-
[35]
D. Delling, A. V. Goldberg, T. Pajor, and R. F. Werneck. Robust exact distance queries on massive networks. Microsoft Research, USA, Tech. Rep , 2, 2014
work page 2014
-
[36]
L. Dhulipala, G. E. Blelloch, and J. Shun. Theoretically efficient parallel graph algorithms can be fast and scalable. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, SPAA ’18, pages 393–404, 2018
work page 2018
-
[37]
E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik , 1(1):269–271, December 1959
work page 1959
- [38]
-
[39]
C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. J. Algorithms , 53(1):85–112, 2004
work page 2004
-
[40]
R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: faster and simpler hierarchical routing in road networks. In Proceedings 13 of the 7th international conference on Experimental algorithms, pages 319–333, 2008
work page 2008
-
[41]
R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: faster and simpler hierarchical routing in road networks. In Proceedings of the 7th international conference on Experimental algorithms, 2008
work page 2008
-
[42]
A. V. Goldberg and C. Harrelson. Computing the shortest path: A search meets graph theory. In SODA ’05, 2005
work page 2005
-
[43]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In OSDI, volume 12, page 2, 2012
work page 2012
-
[44]
A. Gubichev, S. Bedathur, S. Seufert, and G. Weikum. Fast and accurate estimation of shortest paths in large graphs. In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10, pages 499–508, 2010
work page 2010
-
[45]
R. J. Gutman. Reach-based routing: A new approach to shortest path algorithms optimized for road networks. In ALENEX/ANALC, pages 100–111, 2004
work page 2004
-
[46]
W. Han, D. Mawhirter, B. Wu, and M. Buland. Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT),, pages 233–245. IEEE, 2017
work page 2017
- [47]
-
[48]
R. Jin, N. Ruan, Y. Xiang, and V. E. Lee. A highway-centric labeling approach for answering distance queries on large sparse graphs. In SIGMOD, 2012
work page 2012
-
[49]
R. Jin, N. Ruan, B. You, and H. Wang. Hub-accelerator: Fast and exact shortest path computation in large social networks. CoRR, abs/1305.0507, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
- [50]
-
[51]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD’09, 2009
work page 2009
-
[52]
N. Jing, Y. Huang, and E. A. Rundensteiner. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. TKDE, 10(3):409–432, 1998
work page 1998
-
[53]
S. Jung and S. Pramanik. An efficient path computation model for hierarchically structured topographical road maps. TKDE, 14(5):1029–1046, 2002
work page 2002
-
[54]
F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. CuSha: Vertex-centric Graph Processing on GPUs. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing , pages 239–252. ACM, 2014
work page 2014
-
[55]
S. Ko and W.-S. Han. TurboGraph++: A Scalable and Fast Graph Analytics System. In Proceedings of the 2018 International Conference on Management of Data, pages 395–410. ACM, 2018
work page 2018
-
[56]
H. Kriegel, P. Kröger, M. Renz, and T. Schmidt. Hierarchical graph embedding for efficient query processing in very large traffic networks. In SSDBM ’08, 2008
work page 2008
- [57]
-
[58]
C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In Proceedings of the Twenty-second Annual ACM Symposium on Parallelism in Algorithms and Architectures , SPAA ’10, 2010
work page 2010
-
[59]
Y. Li, M. L. Yiu, N. M. Kou, et al. An experimental study on hub labeling based shortest path algorithms. Proceedings of the VLDB Endowment , 11(4):445–457, 2017
work page 2017
- [60]
-
[61]
H. Liu, H. H. Huang, and Y. Hu. iBFS: Concurrent Breadth-First Search on GPUs. In Proceedings of the 2016 International Conference on Management of Data, pages 403–416. ACM, 2016
work page 2016
-
[62]
Y. Low, J. E. Gonzalez, A. Kyrola, D. Bickson, C. E. Guestrin, and J. Hellerstein. Graphlab: A New Framework for Parallel Machine Learning. arXiv preprint arXiv:1408.2041, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2041
-
[63]
C. C. M. Potamias, F. Bonchi and A. Gionis. Fast shortest path distance estimation in large networks. In CIKM ’09 , 2009
work page 2009
- [64]
-
[65]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 international conference on Management of data , SIGMOD ’10, 2010
work page 2010
-
[66]
R. R. McCune, T. Weninger, and G. Madey. Thinking Like a Vertex: a Survey of Vertex-centric Frameworks for Large-scale Distributed Graph Processing. ACM Computing Surveys (CSUR) , 48(2):25, 2015
work page 2015
-
[67]
U. Meyer and P. Sanders. Delta-stepping: A parallel single source shortest path algorithm. In Proceedings of the 6th Annual European Symposium on Algorithms, ESA ’98, pages 393–404, 1998
work page 1998
- [68]
- [69]
- [70]
-
[71]
S. B. K. A. D. Patterson. Direction-Optimizing Breadth-First Search. SC12, November, pages 10–16, 2012
work page 2012
-
[72]
M. Qiao, H. Cheng, L. Chang, and J. X. Yu. Approximate shortest distance computing: A query-dependent local landmark scheme. In ICDE, 2012
work page 2012
-
[73]
K. Qiu, Y. Zhu, J. Yuan, J. Zhao, X. Wang, and T. Wolf. ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs. In Proceedings of the 47th International Conference on Parallel Processing, page 2. ACM, 2018
work page 2018
- [74]
-
[75]
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing using Streaming Partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles , pages 472–488. ACM, 2013
work page 2013
-
[76]
S. Salihoglu and J. Widom. Optimizing graph algorithms on pregel-like systems. Proc. VLDB Endow., 7(7), Mar. 2014
work page 2014
- [77]
-
[78]
P. Sanders and D. Schultes. Highway hierarchies hasten exact shortest path queries. In 17th Eur. Symp. Algorithms (ESA) , 2005
work page 2005
-
[79]
J. Sankaranarayanan, H. Alborzi, and H. Samet. Distance join queries on spatial networks. In Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems, GIS ’06, 2006
work page 2006
-
[80]
J. Sankaranarayanan, H. Samet, and H. Alborzi. Path oracles for spatial networks. PVLDB, 2, August 2009
work page 2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.