CROSS-Net: Region-Agnostic Taxi-Demand Prediction Using Feature Disentanglement

Aidana Baimbetova; Hamada Rizk; Haruki Yonekura; Hirozumi Yamaguchi; Ren Ozeki

arxiv: 2310.18215 · v2 · submitted 2023-10-27 · 💻 cs.LG

CROSS-Net: Region-Agnostic Taxi-Demand Prediction Using Feature Disentanglement

Ren Ozeki , Haruki Yonekura , Aidana Baimbetova , Hamada Rizk , Hirozumi Yamaguchi This is my paper

Pith reviewed 2026-05-24 06:22 UTC · model grok-4.3

classification 💻 cs.LG

keywords taxi demand predictionregion-agnostic featuresfeature disentanglementvariational autoencodergraph neural networkscross-region generalizationride-hailing services

0 comments

The pith

A variational autoencoder separates taxi demand features into region-agnostic components that support accurate forecasts in entirely new urban areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CROSS-Net, which pairs multiview graph neural networks with a variational autoencoder to forecast taxi demand. The autoencoder splits input features into those unique to a given region and those that are not. Relying only on the region-agnostic features lets a model trained in one city produce predictions in different cities it has never seen. This removes the restriction that current systems must be rebuilt or retrained for every new service area.

Core claim

The paper claims that a variational autoencoder can disentangle the input features of a multiview graph neural network into region-specific and region-agnostic parts, and that the region-agnostic parts alone suffice for accurate taxi demand prediction when the model is applied to previously unobserved regions.

What carries the argument

Variational autoencoder that disentangles region-specific from region-agnostic features in multiview graph neural network inputs for cross-region use.

Load-bearing premise

That the variational autoencoder can reliably separate input features into region-specific and region-agnostic components such that the region-agnostic part alone supports accurate prediction on entirely unseen regions without further adaptation or data.

What would settle it

Train the model on data from one set of cities, apply it unchanged to a city with markedly different street network and density, and measure whether prediction error stays comparable to a model trained directly on the target city.

Figures

Figures reproduced from arXiv: 2310.18215 by Aidana Baimbetova, Hamada Rizk, Haruki Yonekura, Hirozumi Yamaguchi, Ren Ozeki.

**Figure 1.** Figure 1: Proposed framework structure 𝐿𝑒𝑙𝑏𝑜 = 𝐸𝑟,𝑞𝑟 (𝑧𝑟 |𝑥 𝑟 ;Φ𝑟 ),𝑞(𝑧 |𝑥 𝑟 ;Φ) [log 𝑝(𝑥 𝑟 |𝑧𝑟, 𝑧;𝜙)] − 𝐾𝐿(𝑞𝑟 (𝑧𝑟 |𝑥𝑟; Φ𝑟)||𝑝(𝑧𝑟)) − 𝐾𝐿(𝑞(𝑧|𝑥𝑟; Φ)||𝑝(𝑧)) (2) where the first term is the reconstruction error, which measures the deviation between the original features 𝑥 𝑟 and the reconstructed features 𝑝(𝑥 𝑟 |𝑧𝑟, 𝑧;𝜙). The last two terms calculate Kullback-Leibler (KL) divergence between the sampled latent features a… view at source ↗

**Figure 2.** Figure 2: Demand prediction accuracy of each method [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

read the original abstract

The growing demand for ride-hailing services has led to an increasing need for accurate taxi demand prediction. Existing systems are limited to specific regions, lacking generality to unseen areas. This paper presents a novel taxi demand prediction system, harnessing the strengths of multiview graph neural networks to capture spatial-temporal dependencies and patterns in urban environments. Additionally, the proposed system CROSS-Net employs a spatially transferable approach, enabling it to train a model that can be deployed to previously unseen regions. To achieve this, the framework incorporates the power of a Variational Autoencoder to disentangle the input features into region-specific and region-agnostic components. The region-agnostic features facilitate cross-region taxi demand predictions, allowing the model to generalize well across different urban areas. Experimental results demonstrate the effectiveness of CROSS-Net in accurately forecasting taxi demand, even in previously unobserved regions, thus showcasing its potential for optimizing taxi services and improving transportation efficiency on a broader scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CROSS-Net applies multiview GNNs plus VAE disentanglement to taxi demand but offers no clear enforcement that the agnostic latents are actually region-free.

read the letter

The central claim is that a VAE can split input features so the region-agnostic part alone supports accurate demand prediction on entirely new cities. That is the point a reader should check first. The paper combines multiview graph neural networks for spatial-temporal patterns with this disentanglement step and reports that the resulting model works on unseen regions. The specific packaging for cross-city taxi demand is new relative to earlier region-specific models. It does a reasonable job of stating a practical deployment issue that matters for ride-hailing operators. The architecture is described clearly enough that someone could re-implement the high-level structure. The main weakness is the one flagged in the stress test. Standard VAE training supplies no mechanism that forces the agnostic branch to exclude region-specific information. Without an adversarial term, mutual-information penalty, or even simple post-training diagnostics such as latent-region correlation or branch ablations, any observed transfer performance could result from partial leakage rather than clean separation. The abstract gives no indication that such checks were performed. Experiments are mentioned but supply no numbers, baselines, or held-out region details in the provided text, so the strength of the evidence cannot be assessed. This work is aimed at applied researchers who build demand models for logistics or urban services. A reader already working on graph transfer or transportation forecasting might pick up the architecture and test the disentanglement claim themselves. It is coherent on its own terms and engages the relevant literature, so it clears the bar for peer review. An editor should send it out, but should ask referees to focus on whether the region-agnostic features are verifiably free of city-specific signal.

Referee Report

3 major / 2 minor

Summary. The paper proposes CROSS-Net, a multiview graph neural network architecture augmented with a variational autoencoder (VAE) that disentangles input features into region-specific and region-agnostic components. The region-agnostic latents are then used to train a predictor that generalizes to entirely unseen urban regions for taxi demand forecasting, addressing the limitation of region-specific models.

Significance. If the disentanglement is verifiably achieved and the cross-region generalization holds, the result would be significant for practical deployment of demand-prediction systems across cities without retraining, a common pain point in spatial-temporal forecasting. The combination of multiview GNNs with explicit feature separation is a reasonable direction, though the manuscript provides no machine-checked proofs, reproducible code artifacts, or parameter-free derivations.

major comments (3)

[§3] §3 (Method), VAE objective: the training loss is the standard VAE ELBO (reconstruction + KL); no auxiliary term (adversarial region classifier on the agnostic branch, mutual-information penalty, or explicit orthogonality constraint) is introduced to enforce that the region-agnostic latent contains no residual region-specific signal. This directly undermines the central claim that the agnostic features alone suffice for prediction on unseen regions.
[§4] §4 (Experiments), cross-region evaluation: no post-training diagnostic is reported that quantifies leakage (e.g., accuracy of a region classifier trained on the agnostic latents, or correlation between agnostic codes and city identity). Without such verification, observed cross-city performance could be explained by partial leakage rather than true agnostic features.
[§4.2] §4.2 (Ablation study): the ablation removes the entire VAE but does not isolate the effect of the disentanglement mechanism itself (e.g., comparing against a non-disentangled shared encoder with region embeddings). This leaves open whether the reported gains are due to the claimed separation or simply to the multiview GNN backbone.

minor comments (2)

[Abstract] Abstract: the phrase 'spatially transferable approach' is used without a precise definition; clarify whether it refers only to the VAE or to the full pipeline.
[§3.1] Notation: the symbols for the region-specific and region-agnostic latent variables are introduced without an explicit equation linking them to the encoder outputs; add a short equation block in §3.1.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

read point-by-point responses

Referee: [§3] §3 (Method), VAE objective: the training loss is the standard VAE ELBO (reconstruction + KL); no auxiliary term (adversarial region classifier on the agnostic branch, mutual-information penalty, or explicit orthogonality constraint) is introduced to enforce that the region-agnostic latent contains no residual region-specific signal. This directly undermines the central claim that the agnostic features alone suffice for prediction on unseen regions.

Authors: We acknowledge that the VAE objective follows the standard ELBO without auxiliary terms such as mutual-information penalties. The manuscript induces separation via the dual-branch architecture (region-specific vs. agnostic) and the downstream use of agnostic latents for cross-region prediction. We agree an explicit constraint would strengthen the claim and will add a mutual-information minimization term in the revised method section. revision: yes
Referee: [§4] §4 (Experiments), cross-region evaluation: no post-training diagnostic is reported that quantifies leakage (e.g., accuracy of a region classifier trained on the agnostic latents, or correlation between agnostic codes and city identity). Without such verification, observed cross-city performance could be explained by partial leakage rather than true agnostic features.

Authors: We agree that a direct leakage diagnostic would provide stronger evidence. In the revision we will train a region classifier on the agnostic latents after training and report its accuracy (and correlation with city identity) as a quantitative check. revision: yes
Referee: [§4.2] §4.2 (Ablation study): the ablation removes the entire VAE but does not isolate the effect of the disentanglement mechanism itself (e.g., comparing against a non-disentangled shared encoder with region embeddings). This leaves open whether the reported gains are due to the claimed separation or simply to the multiview GNN backbone.

Authors: The existing ablation removes the full VAE to show its overall contribution. To isolate the disentanglement effect we will add, in the revised ablation, a baseline that uses a shared encoder without VAE-based separation but with explicit region embeddings. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on empirical VAE training and cross-region evaluation

full rationale

The paper's central mechanism is a VAE trained to produce region-agnostic latents that are then fed to a multiview GNN predictor. No equations, fitted parameters, or self-citations are shown that would make the cross-region prediction equivalent to the training inputs by construction. The claim rests on the (unverified in the provided text) success of the disentanglement plus held-out region testing, which is an independent empirical test rather than a definitional or self-referential reduction. No load-bearing self-citation chain or ansatz smuggling is present in the abstract or described architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract-only input supplies no explicit free parameters, axioms, or invented entities with supporting evidence; the central claim depends on the unstated success of the VAE disentanglement step.

invented entities (1)

region-agnostic features no independent evidence
purpose: to enable cross-region generalization of taxi demand predictions
Introduced via the VAE but no independent evidence or falsifiable test is described in the abstract.

pith-pipeline@v0.9.0 · 5707 in / 1191 out tokens · 19653 ms · 2026-05-24T06:22:57.594697+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 2 internal anchors

[1]

Yumeki Goto, Tomoya Matsumoto, Hamada Rizk, Naoto Yanai, and Hirozumi Yamaguchi. 2023. Privacy-Preserving Taxi-Demand Prediction Using Federated Learning. In International Conference on Smart Computing . IEEE

work page 2023
[2]

Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 855–864

work page 2016
[3]

Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[4]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Fei Miao, Shuo Han, Shan Lin, and George J Pappas. 2015. Robust taxi dispatch under model uncertainties. In 2015 54th IEEE Conference on Decision and Control (CDC). IEEE, 2816–2821

work page 2015
[6]

Fei Miao, Shuo Han, Shan Lin, Qian Wang, John A Stankovic, Abdeltawab Hen- dawi, Desheng Zhang, Tian He, and George J Pappas. 2017. Data-driven robust taxi dispatch under demand uncertainties. IEEE Transactions on Control Systems Technology 27, 1 (2017), 175–191

work page 2017
[7]

Mohsen, H

M. Mohsen, H. Rizk, and M. Youssef. 2023. Privacy-Preserving by Design: Indoor Positioning System Using Wi-Fi Passive TDOA. In 2023 24th IEEE International Conference on Mobile Data Management (MDM) . IEEE Computer Society, Los Alamitos, CA, USA, 221–230. https://doi.org/10.1109/MDM58254.2023.00045

work page doi:10.1109/mdm58254.2023.00045 2023
[8]

Masakazu Ohno, Riki Ukyo, Tatsuya Amano, Hamada Rizk, and Hirozumi Ya- maguchi. 2023. Privacy-preserving Pedestrian Tracking using Distributed 3D LiDARs. In 2023 IEEE International Conference on Pervasive Computing and Com- munications (PerCom). IEEE, 43–52

work page 2023
[9]

Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2022. Sharing without caring: privacy protection of users’ spatio-temporal data without compromise on utility. In Proceedings of the 30th International Conference on Advances in Geographic Information Systems . 1–2

work page 2022
[10]

Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2023. Bal- ancing Privacy and Utility of Spatio-Temporal Data for Taxi-Demand Prediction. In The 24th IEEE International Conference on Mobile Data Management . IEEE

work page 2023
[11]

Michal Piorkowski, Natasa Sarafijanovic-Djukic, and Matthias Grossglauser. 2022. CRAWDAD epfl/mobility

work page 2022
[12]

Abolfazl Safikhani, Camille Kamga, Sandeep Mudigonda, Sabiheh Sadat Faghih, and Bahman Moghimi. 2020. Spatio-temporal modeling of yellow taxi demands in New York City using generalized STAR models. International Journal of Forecasting 36, 3 (2020), 1138–1148

work page 2020
[13]

Jun Xu, Rouhollah Rahmatizadeh, Ladislau Bölöni, and Damla Turgut. 2017. Real-time prediction of taxi demand using recurrent neural networks. IEEE Transactions on Intelligent Transportation Systems 19, 8 (2017), 2572–2581

work page 2017
[14]

Xiang Yan, Xinyu Liu, and Xilei Zhao. 2020. Using machine learning for direct demand modeling of ridesourcing services in Chicago. Journal of Transport Geography 83 (2020), 102661

work page 2020
[15]

Haruki Yonekura, Ren Ozeki, Hamada Rizk, and Hirozumi Yamaguchi. 2023. STM-A Privacy-Enhanced Solution for Spatio-Temporal Trajectory Management. In 2023 24th IEEE International Conference on Mobile Data Management (MDM) . IEEE, 168–171

work page 2023
[16]

Daqing Zhang, Lin Sun, Bin Li, Chao Chen, Gang Pan, Shijian Li, and Zhaohui Wu. 2015. Understanding Taxi Service Strategies From Taxi GPS Traces. IEEE Transactions on Intelligent Transportation Systems 16, 1 (2015), 123–135

work page 2015
[17]

Kanglei Zhou, Zhiyuan Cheng, Hubert P. H. Shum, Frederick W. B. Li, and Xiaohui Liang. 2021. STGAE: Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 41–49

work page 2021

[1] [1]

Yumeki Goto, Tomoya Matsumoto, Hamada Rizk, Naoto Yanai, and Hirozumi Yamaguchi. 2023. Privacy-Preserving Taxi-Demand Prediction Using Federated Learning. In International Conference on Smart Computing . IEEE

work page 2023

[2] [2]

Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 855–864

work page 2016

[3] [3]

Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[4] [4]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

Fei Miao, Shuo Han, Shan Lin, and George J Pappas. 2015. Robust taxi dispatch under model uncertainties. In 2015 54th IEEE Conference on Decision and Control (CDC). IEEE, 2816–2821

work page 2015

[6] [6]

Fei Miao, Shuo Han, Shan Lin, Qian Wang, John A Stankovic, Abdeltawab Hen- dawi, Desheng Zhang, Tian He, and George J Pappas. 2017. Data-driven robust taxi dispatch under demand uncertainties. IEEE Transactions on Control Systems Technology 27, 1 (2017), 175–191

work page 2017

[7] [7]

Mohsen, H

M. Mohsen, H. Rizk, and M. Youssef. 2023. Privacy-Preserving by Design: Indoor Positioning System Using Wi-Fi Passive TDOA. In 2023 24th IEEE International Conference on Mobile Data Management (MDM) . IEEE Computer Society, Los Alamitos, CA, USA, 221–230. https://doi.org/10.1109/MDM58254.2023.00045

work page doi:10.1109/mdm58254.2023.00045 2023

[8] [8]

Masakazu Ohno, Riki Ukyo, Tatsuya Amano, Hamada Rizk, and Hirozumi Ya- maguchi. 2023. Privacy-preserving Pedestrian Tracking using Distributed 3D LiDARs. In 2023 IEEE International Conference on Pervasive Computing and Com- munications (PerCom). IEEE, 43–52

work page 2023

[9] [9]

Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2022. Sharing without caring: privacy protection of users’ spatio-temporal data without compromise on utility. In Proceedings of the 30th International Conference on Advances in Geographic Information Systems . 1–2

work page 2022

[10] [10]

Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2023. Bal- ancing Privacy and Utility of Spatio-Temporal Data for Taxi-Demand Prediction. In The 24th IEEE International Conference on Mobile Data Management . IEEE

work page 2023

[11] [11]

Michal Piorkowski, Natasa Sarafijanovic-Djukic, and Matthias Grossglauser. 2022. CRAWDAD epfl/mobility

work page 2022

[12] [12]

Abolfazl Safikhani, Camille Kamga, Sandeep Mudigonda, Sabiheh Sadat Faghih, and Bahman Moghimi. 2020. Spatio-temporal modeling of yellow taxi demands in New York City using generalized STAR models. International Journal of Forecasting 36, 3 (2020), 1138–1148

work page 2020

[13] [13]

Jun Xu, Rouhollah Rahmatizadeh, Ladislau Bölöni, and Damla Turgut. 2017. Real-time prediction of taxi demand using recurrent neural networks. IEEE Transactions on Intelligent Transportation Systems 19, 8 (2017), 2572–2581

work page 2017

[14] [14]

Xiang Yan, Xinyu Liu, and Xilei Zhao. 2020. Using machine learning for direct demand modeling of ridesourcing services in Chicago. Journal of Transport Geography 83 (2020), 102661

work page 2020

[15] [15]

Haruki Yonekura, Ren Ozeki, Hamada Rizk, and Hirozumi Yamaguchi. 2023. STM-A Privacy-Enhanced Solution for Spatio-Temporal Trajectory Management. In 2023 24th IEEE International Conference on Mobile Data Management (MDM) . IEEE, 168–171

work page 2023

[16] [16]

Daqing Zhang, Lin Sun, Bin Li, Chao Chen, Gang Pan, Shijian Li, and Zhaohui Wu. 2015. Understanding Taxi Service Strategies From Taxi GPS Traces. IEEE Transactions on Intelligent Transportation Systems 16, 1 (2015), 123–135

work page 2015

[17] [17]

Kanglei Zhou, Zhiyuan Cheng, Hubert P. H. Shum, Frederick W. B. Li, and Xiaohui Liang. 2021. STGAE: Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 41–49

work page 2021