LiLIS: A Lightweight Distributed Learned Index Framework for Spatial Decision Analysis

Wanjun Hao; Yikai Dong; Zhongpu Chen

arxiv: 2504.18883 · v4 · pith:K44DDTRDnew · submitted 2025-04-26 · 💻 cs.DB

LiLIS: A Lightweight Distributed Learned Index Framework for Spatial Decision Analysis

Zhongpu Chen , Yikai Dong , Wanjun Hao This is my paper

Pith reviewed 2026-05-22 17:58 UTC · model grok-4.3

classification 💻 cs.DB

keywords learned indexdistributed spatial systemsspatial queriesspatial decision analysisindex construction overheadquery latencypoint queriesrange queries

0 comments

The pith

LiLIS integrates learned indices into distributed spatial systems to reduce query latency and construction overhead with minimal changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LiLIS to address overhead in distributed spatial analysis systems for tasks like facility location and risk assessment. Existing systems incur high costs in local index building and query refinement for read-intensive uses. LiLIS adds machine-learned search strategies paired with spatial-aware partitioning to handle point queries, range queries, nearest neighbor searches, and spatial joins. This setup works without altering the core execution engines. Experiments indicate gains in speed and lower build times on both real and generated datasets.

Core claim

The central discovery is that a lightweight distributed learned index can be built by combining machine-learned search with spatial partitioning, enabling efficient support for multiple spatial query types in existing distributed frameworks while cutting both response times and index setup costs.

What carries the argument

The integration of machine-learned search strategies with spatial-aware partitioning in a distributed framework that requires no modifications to existing execution engines.

If this is right

Supports faster point, range, kNN, and spatial join queries in distributed settings
Lowers the cost of building indexes for spatial data
Enhances performance in read-intensive spatial decision analysis scenarios
Maintains compatibility with current distributed spatial analysis systems

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could extend to other forms of distributed query processing if similar learned models prove accurate.
Benefits may increase with data volume provided the partitioning strategy holds.
Similar lightweight integrations might be tested in non-spatial domains with high query loads.

Load-bearing premise

Learned index techniques can be combined with spatial partitioning and deployed in distributed systems through only lightweight additions without creating extra costs in query refinement or moving data around.

What would settle it

Running the same spatial queries on a large real-world dataset and observing that LiLIS has higher or equal latency and construction time compared to standard distributed indexes.

Figures

Figures reproduced from arXiv: 2504.18883 by Wanjun Hao, Yikai Dong, Zhongpu Chen.

**Figure 2.** Figure 2: The architecture of LiLIS on Apache Spark. It introduces spatial RDD [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: The main idea of spline index is to obtain an estimated position [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: The throughput (jobs per minute) of point and range queries under [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 4.** Figure 4: The overall performance under default settings. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 7.** Figure 7: Skewed and uniform range queries under different selectivities. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: kNN queries in LiLIS when varying k [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 9.** Figure 9: Index building cost. To demonstrate the superior efficiency of LiLIS in index construction, we conduct comparative experiments measuring the build time for all index variants in both LiLIS and Sedona. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

read the original abstract

Spatial query and analysis results are often directly applied to decision-making processes such as facility location, proximity resource discovery, accessibility analysis, and risk assessment. Therefore, the efficiency of underlying spatial data access directly impacts the response speed of spatial decision analysis. Existing distributed spatial analysis systems (e.g., Simba, Sedona) already have relatively mature execution frameworks. However, they incur substantial overhead in local index construction and query refinement, especially in read-intensive scenarios. Recent studies have shown that learned indices exhibit considerable retrieval potential in single-machine settings, yet how to integrate them into distributed spatial analysis systems with low modification costs remains unaddressed. In this article, we present LiLIS, a Lightweight distributed Learned Index prototype for Spatial decision analysis. Without modifying existing execution engines, LiLIS integrates machine-learned search strategies with spatial-aware partitioning in a distributed framework, and efficiently supports common spatial queries such as point queries, range queries, $k$-nearest neighbor ($k$NN) queries, and spatial joins. Extensive experiments on both real-world and synthetic datasets demonstrate that LiLIS achieves lower latency across various query types and reduces index construction overhead compared with baseline approaches. These results indicate its potential for improving the responsiveness of read-intensive spatial decision-support workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LiLIS layers learned indexes onto existing distributed spatial engines with minimal changes and reports lower query latencies plus reduced build costs in its experiments.

read the letter

LiLIS layers learned indexes onto existing distributed spatial engines with minimal changes and reports lower query latencies plus reduced build costs in its experiments. The core move is to combine machine-learned search with spatial-aware partitioning inside frameworks like Simba or Sedona, all without touching the execution engine itself. It targets the usual spatial query mix—point lookups, ranges, kNN, and joins—for read-heavy decision-support workloads such as urban planning or resource allocation. Experiments on real-world and synthetic data are the main evidence offered for the gains over baselines. That lightweight integration angle is the clearest practical contribution here, since most prior learned-index work stayed on single machines and most distributed spatial systems still rely on traditional indexes with their known construction and refinement costs. The paper earns credit for keeping the prototype simple enough that adoption might not require rewriting large codebases. The soft spots sit in the missing specifics. How the learned models are trained on partitioned data, how they interface with query refinement, and whether data movement or model inference adds hidden latency are not laid out in enough detail to judge the integration risk. The experimental claims would land harder with clearer baselines, exact overhead measurements, and some indication of variance across runs. Without those, it is difficult to tell whether the reported improvements hold up or whether the refinement step reintroduces the overhead the authors set out to avoid. This paper is aimed at database researchers and practitioners who maintain or extend distributed spatial systems and want incremental speedups on read-intensive workloads. A reader already working in that niche would get usable ideas from the prototype description and the empirical comparison. I would send it to peer review. The idea is a reasonable extension of existing techniques, the claims are empirical rather than purely theoretical, and the gaps are fixable with more method detail rather than fatal to the central argument.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LiLIS, a lightweight distributed learned index framework for spatial decision analysis. It integrates machine-learned search strategies with spatial-aware partitioning into existing systems such as Simba and Sedona without modifying the execution engines, supporting point queries, range queries, kNN queries, and spatial joins. The central claim is that this approach reduces query latency and index construction overhead relative to baselines, as shown in experiments on real-world and synthetic datasets.

Significance. If the performance gains and lightweight integration hold under rigorous evaluation, LiLIS could offer a practical path to incorporating learned indices into mature distributed spatial engines for read-intensive workloads in applications like facility location and risk assessment. The emphasis on minimal modifications to existing frameworks is a potential strength for adoption.

major comments (2)

[§5] §5 (Experiments): the description of the experimental setup is insufficient to support the latency and overhead claims; no details are provided on hardware, dataset scales, baseline configurations, how query refinement overhead was isolated and measured, or whether error bars and statistical tests were applied to the reported improvements.
[§4] §4 (Architecture/Implementation): the interface between the learned models, spatial-aware partitioning, and query refinement is described at a high level; without concrete mechanisms or pseudocode showing data movement costs, it is unclear whether the integration avoids new bottlenecks as asserted in the weakest assumption.

minor comments (2)

[Abstract] Abstract: the claim of 'extensive experiments' would be strengthened by briefly noting the number of datasets, query types tested, or scale of data volumes.
[§3] Notation in §3: define the training procedure and model parameters for the learned indices more explicitly to allow reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to improve clarity and reproducibility where the comments identify gaps in the current description.

read point-by-point responses

Referee: [§5] §5 (Experiments): the description of the experimental setup is insufficient to support the latency and overhead claims; no details are provided on hardware, dataset scales, baseline configurations, how query refinement overhead was isolated and measured, or whether error bars and statistical tests were applied to the reported improvements.

Authors: We agree that the experimental section requires more detail to support the claims and enable reproducibility. In the revised manuscript we will expand §5 with: (1) hardware specifications of the distributed cluster (node count, CPU cores, memory per node, interconnect); (2) precise dataset scales including cardinalities and spatial distributions for both real-world and synthetic collections; (3) exact parameter settings and versions used for all baselines (Simba, Sedona, and any learned-index variants); (4) a description of how query-refinement overhead was isolated (by instrumenting the execution pipeline to time the refinement phase separately); and (5) error bars together with the number of runs and any statistical tests (e.g., reporting mean ± std and noting significance thresholds). These additions will be placed in the experimental setup subsection and will not change the reported performance trends. revision: yes
Referee: [§4] §4 (Architecture/Implementation): the interface between the learned models, spatial-aware partitioning, and query refinement is described at a high level; without concrete mechanisms or pseudocode showing data movement costs, it is unclear whether the integration avoids new bottlenecks as asserted in the weakest assumption.

Authors: We acknowledge that the current description of the integration remains high-level. In the revision we will augment §4 with: (1) pseudocode for the core data-flow steps (model lookup within a partition, candidate generation, and refinement hand-off); (2) an explicit accounting of data movement, showing that the learned index operates on already-local partition data and therefore adds negligible extra communication beyond what the underlying spatial framework already performs; and (3) a short discussion confirming that the lightweight design reuses existing partition boundaries and does not introduce new cross-node traffic. These concrete mechanisms will directly address the concern about potential bottlenecks. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes LiLIS as a lightweight framework for integrating learned-index search strategies with spatial-aware partitioning into existing distributed spatial execution engines without engine modifications. Its central claims rest on experimental comparisons of query latency and index construction overhead across real-world and synthetic datasets for point, range, kNN, and join queries. No equations, parameter-fitting procedures, self-citations, or uniqueness theorems are invoked in the abstract or high-level description that would reduce any prediction or result to the inputs by construction. The work is self-contained against external benchmarks via direct empirical evaluation rather than a closed mathematical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no mathematical derivations, free parameters, or new entities are visible in the provided text.

pith-pipeline@v0.9.0 · 5753 in / 1131 out tokens · 30377 ms · 2026-05-22T17:58:22.719920+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We leverage an error-bounded spline interpolation to implement LiLIS by learning the two-dimensional distribution... S(key) = ˆp ± ϵ

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

Why-not questions about spatial temporal top-k trajectory similarity search,

C. Luo, T. Dan, Y . Li, X. Meng, and G. Li, “Why-not questions about spatial temporal top-k trajectory similarity search,” Knowledge-Based Systems, vol. 231, p. 107407, 2021

work page 2021
[2]

Nearest and farthest spa- tial skyline queries under multiplicative weighted euclidean distances,

M. Fort, J. A. Sellar `es, and N. Valladares, “Nearest and farthest spa- tial skyline queries under multiplicative weighted euclidean distances,” Knowledge-Based Systems, vol. 192, p. 105299, 2020

work page 2020
[3]

Reachable region query and its applications,

M. Nie, Z. Wang, J. Yin, and B. Yao, “Reachable region query and its applications,” Information Sciences, vol. 476, pp. 95–105, 2019

work page 2019
[4]

Answering why-not questions on top-k augmented spatial keyword queries,

Y . Li, W. Zhang, C. Luo, X. Du, and J. Li, “Answering why-not questions on top-k augmented spatial keyword queries,” Knowledge- Based Systems, vol. 223, p. 107047, 2021

work page 2021
[5]

Data-driven analysis of taxi and ride-hailing services: Case study in chengdu, china,

W. Jiang, “Data-driven analysis of taxi and ride-hailing services: Case study in chengdu, china,” Computer and Decision Making: An Interna- tional Journal, vol. 2, pp. 357–373, 2025

work page 2025
[6]

Apache Spark,

Apache Software Foundation, “Apache Spark,” 2025, accessed: 2025-03-18. [Online]. Available: https://spark.apache.org/

work page 2025
[7]

Apache Flink,

——, “Apache Flink,” 2025, accessed: 2025-03-18. [Online]. Available: https://flink.apache.org/

work page 2025
[8]

Simba: Efficient in- memory spatial analytics,

D. Xie, F. Li, B. Yao, G. Li, L. Zhou, and M. Guo, “Simba: Efficient in- memory spatial analytics,” in ACM SIGMOD International Conference on Management of Data , 2016, pp. 1071–1085

work page 2016
[9]

Spatial data management in apache spark: the geospark perspective and beyond,

J. Yu, Z. Zhang, and M. Sarwat, “Spatial data management in apache spark: the geospark perspective and beyond,” GeoInformatica, vol. 23, pp. 37–78, 2019

work page 2019
[10]

The r*-tree: An efficient and robust access method for points and rectangles,

N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The r*-tree: An efficient and robust access method for points and rectangles,” inACM SIGMOD International Conference on Management of Data , 1990, pp. 322–331

work page 1990
[11]

The quadtree and related hierarchical data structures,

H. Samet, “The quadtree and related hierarchical data structures,” ACM Computing Surveys, vol. 16, no. 2, pp. 187–260, 1984

work page 1984
[12]

Geospatial big data: Survey and challenges,

J. Wu, W. Gan, H.-C. Chao, and P. S. Yu, “Geospatial big data: Survey and challenges,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , 2024

work page 2024
[13]

The case for learned index structures,

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, “The case for learned index structures,” inACM SIGMOD International Conference on Management of Data , 2018, pp. 489–504

work page 2018
[14]

Learned index: A comprehensive experi- mental evaluation,

Z. Sun, X. Zhou, and G. Li, “Learned index: A comprehensive experi- mental evaluation,”Proceedings of the VLDB Endowment, vol. 16, no. 8, pp. 1992–2004, 2023

work page 1992
[15]

Lisa: A learned index structure for spatial data,

P. Li, H. Lu, Q. Zheng, L. Yang, and G. Pan, “Lisa: A learned index structure for spatial data,” in ACM SIGMOD International Conference on Management of Data , 2020, pp. 2119–2133

work page 2020
[16]

Efficiently learning spatial indices,

G. Liu, J. Qi, C. S. Jensen, J. Bailey, and L. Kulik, “Efficiently learning spatial indices,” in IEEE International Conference on Data Engineering. IEEE, 2023, pp. 1572–1584

work page 2023
[17]

A survey of learned indexes for the multi-dimensional space,

A. Al-Mamun, H. Wu, Q. He, J. Wang, and W. G. Aref, “A survey of learned indexes for the multi-dimensional space,” arXiv preprint arXiv:2403.06456, 2024

work page arXiv 2024
[18]

Wisk: A workload-aware learned index for spatial keyword queries,

Y . Sheng, X. Cao, Y . Fang, K. Zhao, J. Qi, G. Cong, and W. Zhang, “Wisk: A workload-aware learned index for spatial keyword queries,” ACM SIGMOD International Conference on Management of Data , vol. 1, no. 2, pp. 1–27, 2023

work page 2023
[19]

A survey of multi-dimensional indexes: past and future trends,

M. Li, H. Wang, H. Dai, M. Li, C. Chai, R. Gu, F. Chen, Z. Chen, S. Li, Q. Liu et al. , “A survey of multi-dimensional indexes: past and future trends,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 8, pp. 3635–3655, 2024

work page 2024
[20]

How good are modern spatial libraries?

V . Pandey, A. van Renen, A. Kipf, and A. Kemper, “How good are modern spatial libraries?” Data Science and Engineering , vol. 6, no. 2, pp. 192–208, 2021

work page 2021
[21]

ckd-tree: A compact kd-tree,

G. Guti ´errez, R. Torres-Avil´es, and M. Caniup ´an, “ckd-tree: A compact kd-tree,” IEEE Access, vol. 12, pp. 28 666–28 676, 2024

work page 2024
[22]

A hierarchical binary quadtree index for spatial queries,

K. Park, “A hierarchical binary quadtree index for spatial queries,” Wireless Networks, vol. 25, pp. 1913–1929, 2019

work page 1913
[23]

Platon: Top-down r-tree packing with learned partition policy,

J. Yang and G. Cong, “Platon: Top-down r-tree packing with learned partition policy,” in Proceedings of the ACM on Management of Data . ACM New York, NY , USA, 2023, pp. 1–26. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

work page 2023
[24]

Gridtuner: Rein- vestigate grid size selection for spatiotemporal prediction models,

J. Jin, P. Cheng, L. Chen, X. Lin, and W. Zhang, “Gridtuner: Rein- vestigate grid size selection for spatiotemporal prediction models,” in IEEE International Conference on Data Engineering . IEEE, 2022, pp. 1193–1205

work page 2022
[25]

Grid adaptive bucketing algorithm based on differential privacy,

X. Li, X. Zhao, H. Zhang, and J. Han, “Grid adaptive bucketing algorithm based on differential privacy,” Mobile Information Systems , vol. 2022, no. 1, p. 6988976, 2022

work page 2022
[26]

Tsunami: A learned multi-dimensional index for correlated data and skewed workloads,

J. Ding, V . Nathan, M. Alizadeh, and T. Kraska, “Tsunami: A learned multi-dimensional index for correlated data and skewed workloads,” arXiv preprint arXiv:2006.13282 , 2020

work page arXiv 2006
[27]

Enhancing in-memory spatial indexing with learned search,

A. Pandey, R. Kumar, M. Li, and W. Zhang, “Enhancing in-memory spatial indexing with learned search,” IEEE Transactions on Knowledge and Data Engineering , vol. 35, no. 4, pp. 1234–1245, 2023

work page 2023
[28]

Efficient learned spatial index with interpolation function based learned model,

S. Zhang, S. Ray, R. Lu, and Y . Zheng, “Efficient learned spatial index with interpolation function based learned model,” IEEE Transactions on Big Data, vol. 9, no. 2, pp. 733–745, 2023

work page 2023
[29]

Why are learned indexes so effective?

P. Ferragina, F. Lillo, and G. Vinciguerra, “Why are learned indexes so effective?” in International Conference on Machine Learning , 2020, pp. 3123–3132

work page 2020
[30]

Deep learn- ing modelling techniques: current progress, applications, advantages, and challenges,

S. F. Ahmed, M. S. B. Alam, M. Hassan, M. R. Rozbu, T. Ishtiak, N. Rafa, M. Mofijur, A. Shawkat Ali, and A. H. Gandomi, “Deep learn- ing modelling techniques: current progress, applications, advantages, and challenges,” Artificial Intelligence Review , vol. 56, no. 11, pp. 13 521– 13 617, 2023

work page 2023
[31]

Why are learned indexes so effective but sometimes ineffective?

Q. Liu, S. Han, Y . Qi, J. Peng, J. Li, L. Lin, and L. Chen, “Why are learned indexes so effective but sometimes ineffective?” arXiv preprint arXiv:2410.00846, 2024

work page arXiv 2024
[32]

The rlr- tree: A reinforcement learning based r-tree for spatial data,

T. Gu, K. Feng, G. Cong, C. Long, Z. Wang, and S. Wang, “The rlr- tree: A reinforcement learning based r-tree for spatial data,” in ACM SIGMOD International Conference on Management of Data , 2023, pp. 1–23

work page 2023
[33]

Alex: An updatable adaptive learned index,

J. Ding, U. F. Minhas, J. Yu, C. Wang, J. Do, Y . Li, H. Zhang, B. Chan- dramouli, J. Gehrke, D. Kossmann, D. Lomet, and T. Kraska, “Alex: An updatable adaptive learned index,” in ACM SIGMOD International Conference on Management of Data , 2020, pp. 969–984

work page 2020
[34]

How good are multi-dimensional learned indexes? an experimental survey,

Q. Liu, M. Li, Y . Zeng, Y . Shen, and L. Chen, “How good are multi-dimensional learned indexes? an experimental survey,” The VLDB Journal, vol. 34, no. 2, pp. 1–29, 2025

work page 2025
[35]

Effectively learning spatial indices,

J. Qi, G. Liu, C. S. Jensen, and L. Kulik, “Effectively learning spatial indices,” Proceedings of the VLDB Endowment , vol. 13, no. 12, pp. 2341–2354, 2020

work page 2020
[36]

Theoretically optimal and empirically efficient r-trees with strong parallelizability,

J. Qi, Y . Tao, Y . Chang, and R. Zhang, “Theoretically optimal and empirically efficient r-trees with strong parallelizability,” Proceedings of the VLDB Endowment , vol. 11, no. 5, pp. 621–634, 2018

work page 2018
[37]

A comparative experimental study of distributed storage engines for big spatial data processing using geospark,

H. Shin, K. Lee, and H.-Y . Kwon, “A comparative experimental study of distributed storage engines for big spatial data processing using geospark,” The Journal of supercomputing , vol. 78, no. 2, pp. 2556– 2579, 2022

work page 2022
[38]

Skia: Scalable and efficient in-memory analytics for big spatial-textual data,

Y . Xu, B. Yao, Z. Wang, X. Gao, J. Xie, and M. Guo, “Skia: Scalable and efficient in-memory analytics for big spatial-textual data,” IEEE Transactions on Knowledge and Data Engineering , vol. 32, no. 12, pp. 2467–2480, 2020

work page 2020
[39]

A survey on spatio-temporal data analytics systems,

M. M. Alam, L. Torgo, and A. Bifet, “A survey on spatio-temporal data analytics systems,” ACM Computing Surveys, vol. 54, no. 10s, pp. 1–38, 2022

work page 2022
[40]

Geomesa: Store, index, query, and transform spatio-temporal data at scale,

LocationTech GeoMesa Project, “Geomesa: Store, index, query, and transform spatio-temporal data at scale,” 2025, accessed: 2025-03-18. [Online]. Available: https://www.geomesa.org/

work page 2025
[41]

Apache Sedona,

Apache Software Foundation, “Apache Sedona,” 2025, accessed: 2025-03-18. [Online]. Available: https://sedona.apache.org

work page 2025
[42]

Itiss: an efficient framework for querying big temporal data,

Z. Chen, B. Yao, Z.-J. Wang, W. Zhang, K. Zheng, P. Kalnis, and F. Tang, “Itiss: an efficient framework for querying big temporal data,” GeoInformatica, vol. 24, pp. 27–59, 2020

work page 2020
[43]

Radixspline: a single-pass learned index,

A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann, “Radixspline: a single-pass learned index,” inInternational Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2020, pp. 1–5

work page 2020
[44]

Smooth interpolating histograms with error guarantees,

T. Neumann and S. Michel, “Smooth interpolating histograms with error guarantees,” in British National Conference on Databases . Springer, 2008, pp. 126–138

work page 2008
[45]

Str: A simple and efficient algorithm for r-tree packing,

S. T. Leutenegger, M. A. Lopez, and J. Edgington, “Str: A simple and efficient algorithm for r-tree packing,” in International Conference on Data Engineering. IEEE, 1997, pp. 497–506

work page 1997
[46]

Efficient k nearest neighbor queries on remote spatial databases using range estimation,

D. Liu, E.-P. Lim, and W.-K. Ng, “Efficient k nearest neighbor queries on remote spatial databases using range estimation,” in International Conference on Scientific and Statistical Database Management . IEEE, 2002, pp. 121–130. Zhongpu Chen is currently an Assistant Professor at Southwestern University of Finance and Economics. He received the PhD degree...

work page 2002

[1] [1]

Why-not questions about spatial temporal top-k trajectory similarity search,

C. Luo, T. Dan, Y . Li, X. Meng, and G. Li, “Why-not questions about spatial temporal top-k trajectory similarity search,” Knowledge-Based Systems, vol. 231, p. 107407, 2021

work page 2021

[2] [2]

Nearest and farthest spa- tial skyline queries under multiplicative weighted euclidean distances,

M. Fort, J. A. Sellar `es, and N. Valladares, “Nearest and farthest spa- tial skyline queries under multiplicative weighted euclidean distances,” Knowledge-Based Systems, vol. 192, p. 105299, 2020

work page 2020

[3] [3]

Reachable region query and its applications,

M. Nie, Z. Wang, J. Yin, and B. Yao, “Reachable region query and its applications,” Information Sciences, vol. 476, pp. 95–105, 2019

work page 2019

[4] [4]

Answering why-not questions on top-k augmented spatial keyword queries,

Y . Li, W. Zhang, C. Luo, X. Du, and J. Li, “Answering why-not questions on top-k augmented spatial keyword queries,” Knowledge- Based Systems, vol. 223, p. 107047, 2021

work page 2021

[5] [5]

Data-driven analysis of taxi and ride-hailing services: Case study in chengdu, china,

W. Jiang, “Data-driven analysis of taxi and ride-hailing services: Case study in chengdu, china,” Computer and Decision Making: An Interna- tional Journal, vol. 2, pp. 357–373, 2025

work page 2025

[6] [6]

Apache Spark,

Apache Software Foundation, “Apache Spark,” 2025, accessed: 2025-03-18. [Online]. Available: https://spark.apache.org/

work page 2025

[7] [7]

Apache Flink,

——, “Apache Flink,” 2025, accessed: 2025-03-18. [Online]. Available: https://flink.apache.org/

work page 2025

[8] [8]

Simba: Efficient in- memory spatial analytics,

D. Xie, F. Li, B. Yao, G. Li, L. Zhou, and M. Guo, “Simba: Efficient in- memory spatial analytics,” in ACM SIGMOD International Conference on Management of Data , 2016, pp. 1071–1085

work page 2016

[9] [9]

Spatial data management in apache spark: the geospark perspective and beyond,

J. Yu, Z. Zhang, and M. Sarwat, “Spatial data management in apache spark: the geospark perspective and beyond,” GeoInformatica, vol. 23, pp. 37–78, 2019

work page 2019

[10] [10]

The r*-tree: An efficient and robust access method for points and rectangles,

N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The r*-tree: An efficient and robust access method for points and rectangles,” inACM SIGMOD International Conference on Management of Data , 1990, pp. 322–331

work page 1990

[11] [11]

The quadtree and related hierarchical data structures,

H. Samet, “The quadtree and related hierarchical data structures,” ACM Computing Surveys, vol. 16, no. 2, pp. 187–260, 1984

work page 1984

[12] [12]

Geospatial big data: Survey and challenges,

J. Wu, W. Gan, H.-C. Chao, and P. S. Yu, “Geospatial big data: Survey and challenges,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , 2024

work page 2024

[13] [13]

The case for learned index structures,

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, “The case for learned index structures,” inACM SIGMOD International Conference on Management of Data , 2018, pp. 489–504

work page 2018

[14] [14]

Learned index: A comprehensive experi- mental evaluation,

Z. Sun, X. Zhou, and G. Li, “Learned index: A comprehensive experi- mental evaluation,”Proceedings of the VLDB Endowment, vol. 16, no. 8, pp. 1992–2004, 2023

work page 1992

[15] [15]

Lisa: A learned index structure for spatial data,

P. Li, H. Lu, Q. Zheng, L. Yang, and G. Pan, “Lisa: A learned index structure for spatial data,” in ACM SIGMOD International Conference on Management of Data , 2020, pp. 2119–2133

work page 2020

[16] [16]

Efficiently learning spatial indices,

G. Liu, J. Qi, C. S. Jensen, J. Bailey, and L. Kulik, “Efficiently learning spatial indices,” in IEEE International Conference on Data Engineering. IEEE, 2023, pp. 1572–1584

work page 2023

[17] [17]

A survey of learned indexes for the multi-dimensional space,

A. Al-Mamun, H. Wu, Q. He, J. Wang, and W. G. Aref, “A survey of learned indexes for the multi-dimensional space,” arXiv preprint arXiv:2403.06456, 2024

work page arXiv 2024

[18] [18]

Wisk: A workload-aware learned index for spatial keyword queries,

Y . Sheng, X. Cao, Y . Fang, K. Zhao, J. Qi, G. Cong, and W. Zhang, “Wisk: A workload-aware learned index for spatial keyword queries,” ACM SIGMOD International Conference on Management of Data , vol. 1, no. 2, pp. 1–27, 2023

work page 2023

[19] [19]

A survey of multi-dimensional indexes: past and future trends,

M. Li, H. Wang, H. Dai, M. Li, C. Chai, R. Gu, F. Chen, Z. Chen, S. Li, Q. Liu et al. , “A survey of multi-dimensional indexes: past and future trends,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 8, pp. 3635–3655, 2024

work page 2024

[20] [20]

How good are modern spatial libraries?

V . Pandey, A. van Renen, A. Kipf, and A. Kemper, “How good are modern spatial libraries?” Data Science and Engineering , vol. 6, no. 2, pp. 192–208, 2021

work page 2021

[21] [21]

ckd-tree: A compact kd-tree,

G. Guti ´errez, R. Torres-Avil´es, and M. Caniup ´an, “ckd-tree: A compact kd-tree,” IEEE Access, vol. 12, pp. 28 666–28 676, 2024

work page 2024

[22] [22]

A hierarchical binary quadtree index for spatial queries,

K. Park, “A hierarchical binary quadtree index for spatial queries,” Wireless Networks, vol. 25, pp. 1913–1929, 2019

work page 1913

[23] [23]

Platon: Top-down r-tree packing with learned partition policy,

J. Yang and G. Cong, “Platon: Top-down r-tree packing with learned partition policy,” in Proceedings of the ACM on Management of Data . ACM New York, NY , USA, 2023, pp. 1–26. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

work page 2023

[24] [24]

Gridtuner: Rein- vestigate grid size selection for spatiotemporal prediction models,

J. Jin, P. Cheng, L. Chen, X. Lin, and W. Zhang, “Gridtuner: Rein- vestigate grid size selection for spatiotemporal prediction models,” in IEEE International Conference on Data Engineering . IEEE, 2022, pp. 1193–1205

work page 2022

[25] [25]

Grid adaptive bucketing algorithm based on differential privacy,

X. Li, X. Zhao, H. Zhang, and J. Han, “Grid adaptive bucketing algorithm based on differential privacy,” Mobile Information Systems , vol. 2022, no. 1, p. 6988976, 2022

work page 2022

[26] [26]

Tsunami: A learned multi-dimensional index for correlated data and skewed workloads,

J. Ding, V . Nathan, M. Alizadeh, and T. Kraska, “Tsunami: A learned multi-dimensional index for correlated data and skewed workloads,” arXiv preprint arXiv:2006.13282 , 2020

work page arXiv 2006

[27] [27]

Enhancing in-memory spatial indexing with learned search,

A. Pandey, R. Kumar, M. Li, and W. Zhang, “Enhancing in-memory spatial indexing with learned search,” IEEE Transactions on Knowledge and Data Engineering , vol. 35, no. 4, pp. 1234–1245, 2023

work page 2023

[28] [28]

Efficient learned spatial index with interpolation function based learned model,

S. Zhang, S. Ray, R. Lu, and Y . Zheng, “Efficient learned spatial index with interpolation function based learned model,” IEEE Transactions on Big Data, vol. 9, no. 2, pp. 733–745, 2023

work page 2023

[29] [29]

Why are learned indexes so effective?

P. Ferragina, F. Lillo, and G. Vinciguerra, “Why are learned indexes so effective?” in International Conference on Machine Learning , 2020, pp. 3123–3132

work page 2020

[30] [30]

Deep learn- ing modelling techniques: current progress, applications, advantages, and challenges,

S. F. Ahmed, M. S. B. Alam, M. Hassan, M. R. Rozbu, T. Ishtiak, N. Rafa, M. Mofijur, A. Shawkat Ali, and A. H. Gandomi, “Deep learn- ing modelling techniques: current progress, applications, advantages, and challenges,” Artificial Intelligence Review , vol. 56, no. 11, pp. 13 521– 13 617, 2023

work page 2023

[31] [31]

Why are learned indexes so effective but sometimes ineffective?

Q. Liu, S. Han, Y . Qi, J. Peng, J. Li, L. Lin, and L. Chen, “Why are learned indexes so effective but sometimes ineffective?” arXiv preprint arXiv:2410.00846, 2024

work page arXiv 2024

[32] [32]

The rlr- tree: A reinforcement learning based r-tree for spatial data,

T. Gu, K. Feng, G. Cong, C. Long, Z. Wang, and S. Wang, “The rlr- tree: A reinforcement learning based r-tree for spatial data,” in ACM SIGMOD International Conference on Management of Data , 2023, pp. 1–23

work page 2023

[33] [33]

Alex: An updatable adaptive learned index,

J. Ding, U. F. Minhas, J. Yu, C. Wang, J. Do, Y . Li, H. Zhang, B. Chan- dramouli, J. Gehrke, D. Kossmann, D. Lomet, and T. Kraska, “Alex: An updatable adaptive learned index,” in ACM SIGMOD International Conference on Management of Data , 2020, pp. 969–984

work page 2020

[34] [34]

How good are multi-dimensional learned indexes? an experimental survey,

Q. Liu, M. Li, Y . Zeng, Y . Shen, and L. Chen, “How good are multi-dimensional learned indexes? an experimental survey,” The VLDB Journal, vol. 34, no. 2, pp. 1–29, 2025

work page 2025

[35] [35]

Effectively learning spatial indices,

J. Qi, G. Liu, C. S. Jensen, and L. Kulik, “Effectively learning spatial indices,” Proceedings of the VLDB Endowment , vol. 13, no. 12, pp. 2341–2354, 2020

work page 2020

[36] [36]

Theoretically optimal and empirically efficient r-trees with strong parallelizability,

J. Qi, Y . Tao, Y . Chang, and R. Zhang, “Theoretically optimal and empirically efficient r-trees with strong parallelizability,” Proceedings of the VLDB Endowment , vol. 11, no. 5, pp. 621–634, 2018

work page 2018

[37] [37]

A comparative experimental study of distributed storage engines for big spatial data processing using geospark,

H. Shin, K. Lee, and H.-Y . Kwon, “A comparative experimental study of distributed storage engines for big spatial data processing using geospark,” The Journal of supercomputing , vol. 78, no. 2, pp. 2556– 2579, 2022

work page 2022

[38] [38]

Skia: Scalable and efficient in-memory analytics for big spatial-textual data,

Y . Xu, B. Yao, Z. Wang, X. Gao, J. Xie, and M. Guo, “Skia: Scalable and efficient in-memory analytics for big spatial-textual data,” IEEE Transactions on Knowledge and Data Engineering , vol. 32, no. 12, pp. 2467–2480, 2020

work page 2020

[39] [39]

A survey on spatio-temporal data analytics systems,

M. M. Alam, L. Torgo, and A. Bifet, “A survey on spatio-temporal data analytics systems,” ACM Computing Surveys, vol. 54, no. 10s, pp. 1–38, 2022

work page 2022

[40] [40]

Geomesa: Store, index, query, and transform spatio-temporal data at scale,

LocationTech GeoMesa Project, “Geomesa: Store, index, query, and transform spatio-temporal data at scale,” 2025, accessed: 2025-03-18. [Online]. Available: https://www.geomesa.org/

work page 2025

[41] [41]

Apache Sedona,

Apache Software Foundation, “Apache Sedona,” 2025, accessed: 2025-03-18. [Online]. Available: https://sedona.apache.org

work page 2025

[42] [42]

Itiss: an efficient framework for querying big temporal data,

Z. Chen, B. Yao, Z.-J. Wang, W. Zhang, K. Zheng, P. Kalnis, and F. Tang, “Itiss: an efficient framework for querying big temporal data,” GeoInformatica, vol. 24, pp. 27–59, 2020

work page 2020

[43] [43]

Radixspline: a single-pass learned index,

A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann, “Radixspline: a single-pass learned index,” inInternational Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2020, pp. 1–5

work page 2020

[44] [44]

Smooth interpolating histograms with error guarantees,

T. Neumann and S. Michel, “Smooth interpolating histograms with error guarantees,” in British National Conference on Databases . Springer, 2008, pp. 126–138

work page 2008

[45] [45]

Str: A simple and efficient algorithm for r-tree packing,

S. T. Leutenegger, M. A. Lopez, and J. Edgington, “Str: A simple and efficient algorithm for r-tree packing,” in International Conference on Data Engineering. IEEE, 1997, pp. 497–506

work page 1997

[46] [46]

Efficient k nearest neighbor queries on remote spatial databases using range estimation,

D. Liu, E.-P. Lim, and W.-K. Ng, “Efficient k nearest neighbor queries on remote spatial databases using range estimation,” in International Conference on Scientific and Statistical Database Management . IEEE, 2002, pp. 121–130. Zhongpu Chen is currently an Assistant Professor at Southwestern University of Finance and Economics. He received the PhD degree...

work page 2002