Towards Serverless Processing of Spatiotemporal Big Data Queries

David Bermbach; Diana Baumann; Tim C. Rese

arXiv: 2507.06005 · v3 · pith:67EVWESY · submitted 2025-07-08 · 💻 cs.DB · cs.DC


Diana Baumann, Tim C. Rese, David Bermbach

Pith reviewed 2026-05-21 23:48 UTC · model grok-4.3

classification 💻 cs.DB cs.DC

keywords: serverless computing · spatiotemporal data · big data queries · function-as-a-service · query decomposition · parallel processing · scalability


The pith

Spatiotemporal queries on growing big data volumes can be scaled by decomposing them into independent subqueries run in parallel on serverless function platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing systems such as PostGIS inherit the scale-out limits of relational databases and therefore struggle with the continuously increasing volumes of spatiotemporal data produced by many sources. The paper proposes breaking typical queries into small, parallelizable subqueries that exploit the near-instant scaling property of Function-as-a-Service platforms. This decomposition is presented as a practical way to meet the performance needs of application fields that require rapid analysis of such data. The vision is positioned as a partial solution rather than a complete replacement for current database approaches.

Core claim

We propose our vision of a native serverless data processing approach for spatiotemporal data: We break down queries into small subqueries which then leverage the near-instant scaling of Function-as-a-Service platforms to execute them in parallel. With this, we partially solve the scalability needs of big spatiotemporal data processing.

What carries the argument

Decomposition of spatiotemporal queries into independent subqueries executed via Function-as-a-Service platforms for parallel scaling.
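To make the decomposition concrete, here is a minimal Python sketch under stated assumptions: the query is a half-open space-time box, it is split into a fixed grid of disjoint sub-boxes, and a thread pool stands in for FaaS invocations. The paper publishes no code, so all names (`Box`, `decompose`, `run_subquery`, `execute`) are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Point:
    oid: int    # object id
    x: float    # e.g. longitude
    y: float    # e.g. latitude
    t: float    # timestamp


@dataclass(frozen=True)
class Box:
    """Half-open query box [x0, x1) x [y0, y1) x [t0, t1)."""
    x0: float
    x1: float
    y0: float
    y1: float
    t0: float
    t1: float

    def contains(self, p: Point) -> bool:
        return (self.x0 <= p.x < self.x1 and self.y0 <= p.y < self.y1
                and self.t0 <= p.t < self.t1)


def _edges(lo: float, hi: float, n: int) -> list[float]:
    # n + 1 split points with exact endpoints, so the n slices tile [lo, hi).
    return [lo] + [lo + (hi - lo) * i / n for i in range(1, n)] + [hi]


def decompose(q: Box, nx: int, ny: int, nt: int) -> list[Box]:
    """Split q into nx * ny * nt disjoint sub-boxes that together cover q."""
    xs, ys, ts = _edges(q.x0, q.x1, nx), _edges(q.y0, q.y1, ny), _edges(q.t0, q.t1, nt)
    return [Box(xs[i], xs[i + 1], ys[j], ys[j + 1], ts[k], ts[k + 1])
            for i, j, k in product(range(nx), range(ny), range(nt))]


def run_subquery(sub: Box, data: list[Point]) -> list[Point]:
    # Stand-in for one function invocation. A real worker would read only
    # the shard(s) overlapping `sub` instead of scanning all of `data`.
    return [p for p in data if sub.contains(p)]


def execute(q: Box, data: list[Point], nx: int = 4, ny: int = 4, nt: int = 2) -> list[Point]:
    subs = decompose(q, nx, ny, nt)
    # Threads emulate the parallel fan-out; every sub-box is independent.
    with ThreadPoolExecutor(max_workers=len(subs)) as pool:
        partials = list(pool.map(lambda s: run_subquery(s, data), subs))
    # Sub-boxes are disjoint, so plain concatenation yields no duplicates.
    return [p for part in partials for p in part]
```

Because the sub-boxes are disjoint and exactly cover the query, merging needs no deduplication. Point data makes this easy; objects with spatial extent (trajectories, polygons) can span several sub-boxes, which is the boundary-overlap issue raised in the referee report.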

Load-bearing premise

Typical spatiotemporal queries can be split into independent subqueries whose coordination, data access, and result aggregation incur only low overhead on current FaaS platforms.
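The premise can be probed with a back-of-the-envelope latency model. The sketch below is an illustrative toy model, not one from the paper: it assumes perfectly balanced subqueries, a fixed per-invocation startup cost, a platform concurrency limit that forces extra waves once exceeded, and a coordinator whose merge cost grows linearly with the number of partial results. Function and parameter names are hypothetical.

```python
import math


def estimated_latency(work_s: float, n: int, concurrency: int,
                      startup_s: float, merge_per_partial_s: float) -> float:
    """Toy end-to-end latency for splitting `work_s` seconds of scan work into n subqueries.

    At most `concurrency` functions run at once, so the subqueries execute in
    ceil(n / concurrency) sequential waves; each wave pays the startup cost plus
    one subquery's share of the work; the coordinator then merges n partials.
    """
    waves = math.ceil(n / concurrency)
    per_wave = startup_s + work_s / n
    return waves * per_wave + merge_per_partial_s * n


def best_fanout(work_s: float, concurrency: int, startup_s: float,
                merge_per_partial_s: float, max_n: int = 100_000) -> int:
    """Brute-force the fan-out n in [1, max_n] that minimizes the toy latency."""
    return min(range(1, max_n + 1),
               key=lambda n: estimated_latency(work_s, n, concurrency,
                                               startup_s, merge_per_partial_s))
```

Within a single wave the model reduces to s + W/n + a·n (startup s, total work W, merge cost a per partial), which is minimized near n = sqrt(W/a). Past that point, further decomposition makes the query slower, which is where the low-overhead premise would stop holding.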

What would settle it

A benchmark showing that a representative set of decomposed spatiotemporal queries runs with higher total latency or monetary cost on FaaS than on a conventional PostGIS-style system would falsify the core proposal.
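The cost side of this test can be framed as a per-query comparison. The sketch assumes pay-per-use FaaS billing by invocation count and GB-seconds, and an always-on cluster whose hourly cost is amortized over the queries it serves. Any price values passed in are placeholders, not real provider prices, and billing granularity, storage access, and egress charges are ignored. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasPricing:
    # Placeholder rates; substitute a provider's current price list.
    per_request: float       # currency units per invocation
    per_gb_second: float     # currency units per GB-second of execution


def faas_query_cost(n_subqueries: int, seconds_each: float, memory_gb: float,
                    pricing: FaasPricing) -> float:
    """Pay-per-use cost of one decomposed query: request fees plus GB-seconds."""
    compute = n_subqueries * seconds_each * memory_gb * pricing.per_gb_second
    return n_subqueries * pricing.per_request + compute


def cluster_query_cost(queries_per_hour: float, nodes: int, node_hourly: float) -> float:
    """Amortized cost per query on an always-on cluster."""
    return nodes * node_hourly / queries_per_hour


def break_even_queries_per_hour(faas_cost_per_query: float, nodes: int,
                                node_hourly: float) -> float:
    """Query rate above which the provisioned cluster becomes cheaper per query."""
    return nodes * node_hourly / faas_cost_per_query
```

Below the break-even rate the pay-per-use model is cheaper per query, above it the cluster is. A benchmark of the kind described above would need to report this rate alongside latency, since either dimension can decide the comparison.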

Figures

Figures reproduced from arXiv: 2507.06005 by David Bermbach, Diana Baumann, Tim C. Rese.

**Figure 2.** The coordinator manages waves of worker functions for each subquery, with the number of worker functions per wave depending on the parallelizability of the subquery.

Excerpt accompanying the figure: "… Earth during the last two quarters. This way, the data is sharded based on the spatial attribute continents and temporal attribute quarter simultaneously. Further data sharding strategies may consider (i) the query type, e.g., spatial-only, …"
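One possible reading of Figure 2 and the sharding excerpt, sketched in Python: records are sharded by a composite (continent, quarter) key, a subquery's parallelizability is taken to be the number of shards it touches, and the coordinator launches that many workers, split into several waves when a hypothetical `max_wave_size` concurrency cap is exceeded. The record layout, the worker body (a partial sum and count), and all names are assumptions for illustration, not the authors' design.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    continent: str   # spatial sharding attribute
    quarter: str     # temporal sharding attribute, e.g. "2025-Q1"
    value: float


@dataclass(frozen=True)
class Subquery:
    continents: frozenset[str]
    quarters: frozenset[str]


def build_shards(records: list[Record]) -> dict[tuple[str, str], list[Record]]:
    # Composite shard key: spatial and temporal attribute at the same time.
    shards: dict[tuple[str, str], list[Record]] = {}
    for r in records:
        shards.setdefault((r.continent, r.quarter), []).append(r)
    return shards


def worker(shard: list[Record]) -> tuple[float, int]:
    # Body of one hypothetical worker function: a partial sum and count.
    return sum(r.value for r in shard), len(shard)


def coordinator(subqueries, shards, max_wave_size=8):
    results = []
    with ThreadPoolExecutor(max_workers=max_wave_size) as pool:
        for sq in subqueries:
            # Parallelizability of this subquery = number of shards it touches.
            keys = [k for k in shards if k[0] in sq.continents and k[1] in sq.quarters]
            partials, wave_sizes = [], []
            for i in range(0, len(keys), max_wave_size):
                wave = keys[i:i + max_wave_size]
                wave_sizes.append(len(wave))
                # Consuming the results waits for the whole wave before the next starts.
                partials.extend(pool.map(lambda k: worker(shards[k]), wave))
            results.append((sum(s for s, _ in partials),
                            sum(c for _, c in partials), wave_sizes))
    return results
```

A subquery restricted to one continent and two quarters touches two shards and runs as a single wave of two workers; a subquery over every shard is split into as many waves as the cap requires.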

Original abstract

Spatiotemporal data are being produced in continuously growing volumes by a variety of data sources and a variety of application fields rely on rapid analysis of such data. Existing systems such as PostGIS or MobilityDB usually build on relational database systems, thus, inheriting their scale-out characteristics. As a consequence, big spatiotemporal data scenarios still have limited support even though many query types can easily be parallelized. In this paper, we propose our vision of a native serverless data processing approach for spatiotemporal data: We break down queries into small subqueries which then leverage the near-instant scaling of Function-as-a-Service platforms to execute them in parallel. With this, we partially solve the scalability needs of big spatiotemporal data processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a vision for a native serverless data processing approach for spatiotemporal big data. It notes that existing systems like PostGIS inherit limited scale-out from relational databases and proposes decomposing queries into small subqueries that leverage the near-instant scaling of Function-as-a-Service (FaaS) platforms for parallel execution, thereby partially solving scalability needs for big spatiotemporal data.

Significance. If the decomposition and execution model can be implemented with low coordination overhead, the vision could enable elastic, cost-effective processing of growing spatiotemporal datasets in domains such as mobility analysis and environmental monitoring. The manuscript offers no implementation, measurements, or quantitative model, so its significance hinges on whether future work can demonstrate that FaaS-based subquery execution outperforms or complements existing parallel spatiotemporal systems.

major comments (2)

[Abstract] The central claim that breaking queries into subqueries 'partially solve[s] the scalability needs' lacks any supporting derivation, example, or cost model. The text provides no concrete strategy for decomposing typical queries (range, kNN, joins, trajectories) while handling spatial data dependencies such as boundary overlaps or global aggregation.
[Proposed vision] The assumption that subqueries can execute independently with negligible coordination overhead on stateless FaaS is load-bearing but unexamined. No discussion addresses how data access, state, or result aggregation would occur without routing through external storage (introducing latency and egress costs) or how this compares to existing partitioned spatiotemporal indexes.

minor comments (2)

[Abstract and throughout] Add references to prior serverless database systems and spatiotemporal query engines to better contextualize the novelty of the vision.
[General] The manuscript would benefit from a brief section outlining potential challenges (data locality, cold starts, billing implications) even at the vision level.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments on our vision paper. The feedback correctly identifies areas where the high-level proposal would benefit from additional clarification and illustrative detail. We address each major comment below and describe the planned revisions.

Point-by-point responses

Referee: [Abstract] The central claim that breaking queries into subqueries 'partially solve[s] the scalability needs' lacks any supporting derivation, example, or cost model. The text provides no concrete strategy for decomposing typical queries (range, kNN, joins, trajectories) while handling spatial data dependencies such as boundary overlaps or global aggregation.

Authors: We agree that the abstract would be strengthened by concrete examples. As this is a vision paper, we intentionally kept the presentation high-level and did not include a full derivation or cost model. In the revised version we will add brief illustrative decomposition strategies for representative query types (e.g., partitioning a spatial range query across sub-regions with overlap handling via data replication or boundary adjustment) while explicitly stating that a quantitative cost model remains future work. (Revision: partial.)
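The overlap handling via data replication mentioned in this response can be illustrated with the reference-point technique from partitioned spatial indexing. It is swapped in here for illustration and not attributed to the authors: each object's bounding rectangle is replicated into every grid tile it overlaps, and a tile reports an object only if the reference point (the lower-left corner of the intersection of object and query window) falls in that tile. Each match is then reported exactly once without a global deduplication step. `Rect`, `Grid`, and the other names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    oid: int
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(a: Rect, b: Rect) -> bool:
    # Closed rectangles: touching edges count as an intersection.
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


class Grid:
    def __init__(self, cell: float, nx: int, ny: int):
        self.cell, self.nx, self.ny = cell, nx, ny

    def _ix(self, v: float, n: int) -> int:
        # Clamp so coordinates outside the grid map to the border tiles.
        return min(max(int(v // self.cell), 0), n - 1)

    def tile_of(self, x: float, y: float) -> tuple[int, int]:
        return self._ix(x, self.nx), self._ix(y, self.ny)

    def tiles_overlapping(self, r: Rect) -> list[tuple[int, int]]:
        (i0, j0), (i1, j1) = self.tile_of(r.x0, r.y0), self.tile_of(r.x1, r.y1)
        return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def replicate(objs: list[Rect], grid: Grid) -> dict[tuple[int, int], list[Rect]]:
    shards: dict[tuple[int, int], list[Rect]] = defaultdict(list)
    for r in objs:
        for t in grid.tiles_overlapping(r):
            shards[t].append(r)
    return shards


def tile_worker(tile, shard, q: Rect, grid: Grid) -> list[int]:
    # Report a match only in the tile that holds its reference point;
    # replicas of the same object in other tiles are skipped.
    return [r.oid for r in shard if intersects(r, q)
            and grid.tile_of(max(r.x0, q.x0), max(r.y0, q.y0)) == tile]


def range_query(objs: list[Rect], q: Rect, grid: Grid) -> list[int]:
    shards = replicate(objs, grid)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda t: tile_worker(t, shards[t], q, grid), list(shards)))
    return [oid for part in parts for oid in part]
```

The reference point always lies inside both the object and the query window, so it falls in exactly one of the tiles the object was replicated to; the price is storing each object once per overlapped tile.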
Referee: [Proposed vision] The assumption that subqueries can execute independently with negligible coordination overhead on stateless FaaS is load-bearing but unexamined. No discussion addresses how data access, state, or result aggregation would occur without routing through external storage (introducing latency and egress costs) or how this compares to existing partitioned spatiotemporal indexes.

Authors: We acknowledge that coordination, data access, and aggregation mechanisms deserve explicit discussion. The vision assumes that spatiotemporal data reside in scalable object stores directly accessible by FaaS functions, enabling largely independent execution, with a lightweight final aggregation step. In the revision we will expand the proposed-vision section to describe these mechanisms, note the potential latency and cost implications of external storage, and provide a qualitative comparison to partitioned indexes used in existing systems such as PostGIS extensions or GeoMesa. (Revision: yes.)
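A "lightweight final aggregation step" is straightforward for algebraic aggregates such as count, sum, min, max, and mean, which can be computed from small per-worker partials in the spirit of MapReduce combiners. The sketch assumes each worker emits one `Partial`; the class and function names are hypothetical. Holistic aggregates such as an exact median do not decompose this way and would need a different strategy.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    count: int
    total: float
    lo: float
    hi: float

    @staticmethod
    def of(values) -> "Partial":
        """Summarize one worker's values; an empty shard yields the identity."""
        vals = list(values)
        if not vals:
            return Partial(0, 0.0, float("inf"), float("-inf"))
        return Partial(len(vals), sum(vals), min(vals), max(vals))

    def merge(self, other: "Partial") -> "Partial":
        # Associative and commutative, so partials can be combined in any
        # order, e.g. in a tree of aggregator functions.
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def mean(self) -> float | None:
        return self.total / self.count if self.count else None


def aggregate(partials) -> Partial:
    """Coordinator-side merge of all worker partials."""
    return reduce(Partial.merge, partials, Partial.of([]))
```

Each partial has constant size regardless of how many records a worker scanned, so the data shipped to the coordinator grows with the number of workers rather than with the data volume.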

Circularity Check

0 steps flagged

Vision proposal contains no derivation chain or fitted results

full rationale

The manuscript is a forward-looking vision paper that proposes decomposing spatiotemporal queries into subqueries for parallel FaaS execution. No equations, parameters, or formal derivations appear in the provided text or abstract. The central claim is presented as a proposal rather than derived from prior results or self-citations within the paper itself. Consequently, there is no load-bearing step that reduces to its own inputs by construction, and the content remains self-contained as an exploratory suggestion without internal circular reasoning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is a high-level vision paper with no formal mathematical model. The approach depends on domain assumptions about query decomposability and FaaS platform behavior rather than new axioms or entities.

axioms (1)

Domain assumption: spatiotemporal queries can be broken into independent subqueries suitable for parallel execution with low coordination cost.
Invoked in the proposal to enable the serverless scaling benefit.

pith-pipeline@v0.9.0 · 5646 in / 1269 out tokens · 45177 ms · 2026-05-21T23:48:22.784076+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "We break down queries into small subqueries which then leverage the near-instant scaling of Function-as-a-Service platforms to execute them in parallel."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

Paper passage: "following the MapReduce model"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1] Md Mahbub Alam, Luis Torgo, and Albert Bifet. 2022. A Survey on Spatio-temporal Data Analytics Systems. ACM Comput. Surv. (2022).

[2] Mohamed Bakli, Mahmoud Sakr, Esteban Zimányi, Nils Dijk, and Marco Slot. 2025. Distributed MobilityDB: A Scalable Moving Object Database Management System. ACM Trans. Spatial Algorithms Syst. (2025).

[3] David Bermbach, Abhishek Chandra, Chandra Krintz, Aniruddha Gokhale, Aleksander Slominski, Lauritz Thamsen, Everton Cavalcante, Tian Guo, Ivona Brandic, and Rich Wolski. 2021. On the Future of Cloud Engineering. In Proc. of IC2E 2021. IEEE.

[4] Thomas Bodner, Daniel Ritter, Martin Boissier, and Tilmann Rabl. 2025. Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing. http://arxiv.org/abs/2501.08479

[5] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM (2008).

[6] Joseph M Hellerstein, Jose Faleiro, Joseph E Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu. 2019. Serverless computing: One step forward, two steps back. Proc. of CIDR (2019).

[7] Huanghuang Liang, Zheng Zhang, Chuang Hu, Yili Gong, and Dazhao Cheng. 2024. A Survey on Spatio-Temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications. IEEE Transactions on Big Data (2024).

[8] Elyes Lounissi, Suvam Kumar Das, Ronnit Peter, Xiaozheng Zhang, Suprio Ray, and Lianyin Jia. 2025. FunDa: scalable serverless data analytics and in situ query processing. Journal of Big Data (2025).

[9] Tim C. Rese, Alexandra Kapp, and David Bermbach. 2025. Evaluating the Impact Of Spatial Features Of Mobility Data and Index Choice On Database Performance. arXiv:2505.14466 [cs.DB] https://arxiv.org/abs/2505.14466
