MobText-SISA: Efficient Machine Unlearning for Mobility Logs with Spatio-Temporal and Natural-Language Data

Hamada Rizk; Haruki Yonekura; Hirozumi Yamaguchi; Ren Ozeki; Tatsuya Amano

arxiv: 2508.19554 · v1 · submitted 2025-08-27 · 💻 cs.LG

Pith reviewed 2026-05-18 21:27 UTC · model grok-4.3

keywords machine unlearning · mobility data · SISA training · spatio-temporal data · text embeddings · similarity clustering · privacy compliance · incremental learning

The pith

MobText-SISA enables exact unlearning on mobility logs by retraining only the shard that contains a deleted record after similarity-aware clustering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MobText-SISA to handle on-demand removal of individual contributions from large multimodal mobility datasets that combine GPS trajectories, timestamps, and free-form text. It maps each trip's numerical and linguistic features into a shared latent space and then applies similarity-aware clustering to place similar records into the same shard. This design means any future deletion request touches only one constituent model, which can be retrained from its most recent checkpoint while the remaining shards stay frozen. A reader would care because privacy rules require such removals yet repeated full retraining is too expensive for urban-scale data. Experiments on a ten-month real-world log show the method preserves baseline accuracy and reaches lower error faster than random sharding.

Core claim

MobText-SISA embeds each trip's numerical and linguistic features into a shared latent space, then uses similarity-aware clustering to distribute samples across shards so that future deletions affect only a single constituent model. Each shard trains incrementally; constituent predictions are aggregated at inference time. A deletion request triggers retraining solely of the affected shard from its last valid checkpoint, which guarantees exact unlearning.
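
The mechanism in the core claim can be made concrete with a toy sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: each constituent model is a hypothetical one-parameter linear predictor trained by deterministic SGD, the names `Shard`, `predict`, and `unlearn` are invented here, and the shard assignment is hard-coded rather than produced by similarity-aware clustering. It shows a checkpoint saved before each slice, averaging at inference time, and retraining of only the affected shard from the checkpoint that precedes the deleted record.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One constituent model: a 1-D linear predictor y ~ w * x, trained slice by slice."""
    slices: list                                      # each slice: list of (record_id, x, y)
    checkpoints: list = field(default_factory=list)   # weight saved before each slice
    w: float = 0.0

    def train(self, start_slice=0, lr=0.05):
        # Resume from the checkpoint taken just before `start_slice`.
        self.w = self.checkpoints[start_slice] if start_slice < len(self.checkpoints) else 0.0
        del self.checkpoints[start_slice:]
        for sl in self.slices[start_slice:]:
            self.checkpoints.append(self.w)
            for _, x, y in sl:
                self.w -= lr * 2 * (self.w * x - y) * x  # one SGD step on squared error

    def forget(self, record_id):
        for i, sl in enumerate(self.slices):
            if any(rid == record_id for rid, _, _ in sl):
                self.slices[i] = [r for r in sl if r[0] != record_id]
                self.train(start_slice=i)  # replay only from the affected slice
                return True
        return False


def predict(shards, x):
    # Aggregate constituent predictions by simple averaging (an assumed rule).
    return sum(s.w * x for s in shards) / len(shards)


def unlearn(shards, record_id):
    # Only the shard holding the record is retrained; the others stay frozen.
    return [i for i, s in enumerate(shards) if s.forget(record_id)]


# Hypothetical toy data: two shards, two slices each.
shards = [
    Shard(slices=[[(0, 1.0, 2.1), (1, 0.5, 0.9)], [(2, 1.5, 3.2), (3, 2.0, 3.9)]]),
    Shard(slices=[[(4, 1.0, 1.8), (5, 0.8, 1.7)], [(6, 1.2, 2.5), (7, 0.3, 0.5)]]),
]
for s in shards:
    s.train()
w_b_before = shards[1].w
touched = unlearn(shards, record_id=2)
print("retrained shards:", touched, "prediction at x=1:", predict(shards, 1.0))
```

Because the updates here are deterministic, shard 0 ends in the same state a fresh `Shard` reaches when trained on the same slices without record 2, and shard 1 is never touched. With randomized training, the comparison would have to be made in distribution or with replayed seeds.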

What carries the argument

Similarity-aware clustering of embedded spatio-temporal and textual features that localizes deletion impact to one shard while preserving inter-shard diversity for accurate aggregation.

If this is right

Deletion requests are handled by retraining only the affected shard from its last checkpoint, leaving all other shards untouched.
Exact unlearning is guaranteed because the retrained shard matches the state it would have reached without the deleted sample.
Overall predictive accuracy on mobility tasks stays at the level achieved by a baseline model trained on the full dataset.
Training and unlearning both converge faster than when shards are formed by random assignment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding-plus-clustering step could be applied to other multimodal streams that mix location traces with text, such as ride-hailing notes or social-media check-ins.
If the latent-space similarity metric is refined, the trade-off between shard locality and prediction diversity could be tuned further to reduce error after unlearning.
The checkpoint-based retraining pattern suggests a path toward lifelong learning systems that forget old records without periodic full resets.

Load-bearing premise

Similarity-aware clustering can partition the data so any single deletion request affects only one shard while the shards retain enough diversity for the aggregated model to remain accurate.

What would settle it

A concrete test would be to delete a record, retrain only its shard from the checkpoint, and then compare the aggregated model's predictions on a held-out set against the predictions of a model retrained from scratch on the remaining data; any systematic difference would show the unlearning is not exact.
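
The test described above can be sketched in a few lines. This is a minimal illustration, assuming a deterministic closed-form least-squares fit per shard, a fixed shard assignment, and averaging as the aggregation rule; `fit_shard`, `ensemble`, and `max_gap` are hypothetical helper names, and the data are made up.

```python
def fit_shard(records):
    # Closed-form least squares for y ~ w * x on one shard (deterministic).
    num = sum(x * y for _, x, y in records)
    den = sum(x * x for _, x, _ in records)
    return num / den if den else 0.0


def ensemble(weights):
    # Aggregated model: average of the constituent predictions.
    return lambda x: sum(w * x for w in weights) / len(weights)


def max_gap(model_a, model_b, held_out):
    # Largest absolute disagreement between two models on held-out inputs.
    return max(abs(model_a(x) - model_b(x)) for x in held_out)


shards = [
    [(0, 1.0, 2.1), (1, 0.5, 0.9), (2, 1.5, 3.2)],
    [(3, 2.0, 3.9), (4, 1.0, 1.8), (5, 0.8, 1.7)],
]
weights = [fit_shard(s) for s in shards]
deleted = 4

# Path 1: retrain only the shard that holds the deleted record.
unlearned = list(weights)
for i, s in enumerate(shards):
    if any(rid == deleted for rid, _, _ in s):
        unlearned[i] = fit_shard([r for r in s if r[0] != deleted])

# Path 2: retrain every shard from scratch on the remaining data.
scratch = [fit_shard([r for r in s if r[0] != deleted]) for s in shards]

held_out = [0.2, 0.7, 1.3, 1.9]
gap = max_gap(ensemble(unlearned), ensemble(scratch), held_out)
print("max prediction gap:", gap)
```

In this deterministic setting the gap is zero by construction. On a real pipeline the same comparison would need repeated runs, since a nonzero gap could come from training randomness rather than from inexact unlearning.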

Figures

Figures reproduced from arXiv: 2508.19554 by Hamada Rizk, Haruki Yonekura, Hirozumi Yamaguchi, Ren Ozeki, Tatsuya Amano.

**Figure 1.** System overview. Adjacent source text (truncated): "… whose slice contains that sample is retrained from its last valid checkpoint, drastically reducing computational overhead compared with full-model retraining. Despite these efficiency gains, recent studies show that SISA’s effectiveness hinges on the statistical similarity of the shards: when the data distributions differ markedly, constituent models learn shard-specific boundaries that f…"

**Figure 2.** UMAP visualization [1] and Gaussian mixture model-based clustering. Adjacent source text (truncated): "… retrieved from OpenStreetMap’s routing API by the applicable speed limit. 2.2 Sharding Mechanism. The dataset is divided into 𝑘 disjoint shards that can later be retrained independently when a right-to-be-forgotten request arrives. The partitioning seeks two outcomes: first, any single deletion request should affect as few shards as possib…"

**Figure 4.** Training epochs vs. the number of shards.

Original abstract

Modern mobility platforms have stored vast streams of GPS trajectories, temporal metadata, free-form textual notes, and other unstructured data. Privacy statutes such as the GDPR require that any individual's contribution be unlearned on demand, yet retraining deep models from scratch for every request is untenable. We introduce MobText-SISA, a scalable machine-unlearning framework that extends Sharded, Isolated, Sliced, and Aggregated (SISA) training to heterogeneous spatio-temporal data. MobText-SISA first embeds each trip's numerical and linguistic features into a shared latent space, then employs similarity-aware clustering to distribute samples across shards so that future deletions touch only a single constituent model while preserving inter-shard diversity. Each shard is trained incrementally; at inference time, constituent predictions are aggregated to yield the output. Deletion requests trigger retraining solely of the affected shard from its last valid checkpoint, guaranteeing exact unlearning. Experiments on a ten-month real-world mobility log demonstrate that MobText-SISA (i) sustains baseline predictive accuracy, and (ii) consistently outperforms random sharding in both error and convergence speed. These results establish MobText-SISA as a practical foundation for privacy-compliant analytics on multimodal mobility data at urban scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MobText-SISA adapts SISA unlearning to mixed GPS and text mobility logs via joint embeddings and similarity clustering, but the single-shard deletion claim may not hold if user trips get split.

MobText-SISA takes SISA and adds a shared latent space for numerical and linguistic trip features, then clusters on those embeddings to assign shards. Deletions then trigger retraining only the affected shard from its checkpoint. On a ten-month real mobility log the method keeps baseline accuracy and beats random sharding on error and convergence speed. That is the concrete advance: a workable pipeline for exact unlearning on heterogeneous urban data where full retraining is too slow for regulatory requests. The engineering is direct and the choice of real logs over synthetic data is the right one for this domain.

The central risk is in the clustering step. The description embeds and clusters individual trips without any mention of forcing all trips from one user into the same shard. Users with varied routes or notes can easily end up split, so a single deletion request would hit multiple models. That undercuts both the efficiency gain and the guarantee that the full user contribution is removed from one constituent model. The reported gains over random sharding are plausible but rest on comparisons whose details, metrics, and variance are not shown in the abstract, so the strength of the empirical support is still open.

This paper is for people building privacy tools for location services or multimodal unlearning systems. A reader who needs a practical SISA extension for mobility data will find usable ideas here. It is not a general theory paper but an applied one that deserves referee time to check the user-level sharding constraints and the experimental controls.
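
The user-splitting risk raised in the letter is easy to measure once a shard assignment exists. The sketch below is illustrative only: the `(user_id, trip_id, shard_id)` tuple layout and the function name `shards_per_user` are assumptions, and the assignments are made-up values rather than anything taken from the paper.

```python
from collections import defaultdict


def shards_per_user(assignments):
    """Count how many distinct shards hold at least one trip of each user.

    assignments: iterable of (user_id, trip_id, shard_id) tuples.
    """
    touched = defaultdict(set)
    for user_id, _trip_id, shard_id in assignments:
        touched[user_id].add(shard_id)
    return {user: len(shard_ids) for user, shard_ids in touched.items()}


# Hypothetical trip-level assignment produced by some clustering step.
assignments = [
    ("u1", "t1", 0), ("u1", "t2", 0),
    ("u2", "t3", 1), ("u2", "t4", 2),
    ("u3", "t5", 2),
]
counts = shards_per_user(assignments)
split_users = sorted(u for u, n in counts.items() if n > 1)
print("shards touched per user:", counts)
print("users whose deletion would retrain several shards:", split_users)
```

Any user with a count above one would force more than one constituent model to be retrained on a user-level deletion request.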

Referee Report

2 major / 1 minor

Summary. The paper introduces MobText-SISA, an extension of SISA training for exact machine unlearning on heterogeneous mobility logs containing GPS trajectories, temporal metadata, and free-form textual notes. It embeds individual trips into a shared latent space and applies similarity-aware clustering to distribute samples across shards such that any deletion request affects only one constituent model while preserving inter-shard diversity. Shards are trained incrementally, predictions are aggregated at inference, and deletions trigger retraining of only the affected shard from its last checkpoint. Experiments on a ten-month real-world mobility dataset are reported to show that the method sustains baseline predictive accuracy and outperforms random sharding in both error and convergence speed.

Significance. If the empirical claims and the clustering guarantee hold, the work provides a practical, scalable framework for privacy-compliant analytics on multimodal mobility data at urban scale, directly addressing GDPR-style deletion requests without full retraining. The integration of embedding-based similarity clustering with SISA-style sharding for spatio-temporal and natural-language data represents a targeted advance over generic unlearning methods.

major comments (2)

[Abstract] Abstract: The central efficiency and exact-unlearning claims rest on the assertion that similarity-aware clustering ensures 'future deletions touch only a single constituent model.' However, the description indicates clustering is performed on per-trip embeddings without explicit mention of user-ID aggregation or constraints. This leaves open the possibility that heterogeneous trips from the same individual (differing in time, location, or textual notes) are placed in separate shards, requiring retraining of multiple models upon deletion and violating both the single-shard efficiency and the exact-unlearning guarantee.
[Abstract] Experiments (as summarized in Abstract): The claims of sustained baseline accuracy and consistent outperformance versus random sharding in error and convergence speed are presented without reference to concrete baselines, evaluation metrics, error bars, statistical tests, or ablation studies isolating the contribution of similarity-aware clustering versus random sharding. This absence weakens the empirical support for the central performance claims.

minor comments (1)

[Abstract] The abstract would be strengthened by explicitly naming the downstream predictive task (e.g., next-location prediction or trajectory forecasting) used to measure accuracy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our work. We address each of the major comments below and indicate the revisions we plan to make to strengthen the manuscript.

Referee: [Abstract] Abstract: The central efficiency and exact-unlearning claims rest on the assertion that similarity-aware clustering ensures 'future deletions touch only a single constituent model.' However, the description indicates clustering is performed on per-trip embeddings without explicit mention of user-ID aggregation or constraints. This leaves open the possibility that heterogeneous trips from the same individual (differing in time, location, or textual notes) are placed in separate shards, requiring retraining of multiple models upon deletion and violating both the single-shard efficiency and the exact-unlearning guarantee.

Authors: We thank the referee for pointing out this potential ambiguity. The current description focuses on trip-level embeddings, but to rigorously guarantee that deletions for an individual affect only one shard, we will revise the method to first aggregate all trips belonging to the same user (via user-ID) and compute a single embedding per user before applying similarity-aware clustering. This will be clearly stated in the revised abstract and Section 3, ensuring the exact-unlearning property holds at the user level as intended. revision: yes
Referee: [Abstract] Experiments (as summarized in Abstract): The claims of sustained baseline accuracy and consistent outperformance versus random sharding in error and convergence speed are presented without reference to concrete baselines, evaluation metrics, error bars, statistical tests, or ablation studies isolating the contribution of similarity-aware clustering versus random sharding. This absence weakens the empirical support for the central performance claims.

Authors: The abstract is intended as a concise summary, while the detailed experimental setup, including specific metrics (e.g., prediction error for mobility trajectories and classification accuracy for textual notes), baselines (original model and random sharding), error bars from repeated trials, statistical significance testing, and ablations on the clustering component, are fully reported in the Experiments section of the manuscript. To address the concern, we will add a brief clause in the abstract referencing the quantitative improvements and directing readers to the full evaluation for details. revision: partial
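
The revision promised in the first response (one embedding per user, then clustering) could look roughly like the sketch below. It is a guess at the shape of that step, not the authors' method: mean pooling of trip embeddings is an assumption because the rebuttal does not say how the per-user embedding is computed, nearest-centroid assignment stands in for the similarity-aware clustering, and `user_embeddings`, `assign_users`, and the centroid values are hypothetical.

```python
import math
from collections import defaultdict


def user_embeddings(trips):
    """Mean-pool trip embeddings per user. trips: list of (user_id, embedding)."""
    sums = {}
    counts = defaultdict(int)
    for user, emb in trips:
        acc = sums.setdefault(user, [0.0] * len(emb))
        for j, v in enumerate(emb):
            acc[j] += v
        counts[user] += 1
    return {u: [v / counts[u] for v in acc] for u, acc in sums.items()}


def assign_users(user_emb, centroids):
    # Stand-in for the clustering step: nearest centroid by Euclidean distance.
    return {
        u: min(range(len(centroids)), key=lambda k: math.dist(e, centroids[k]))
        for u, e in user_emb.items()
    }


# Hypothetical 2-D trip embeddings; u1 has one trip far from the others.
trips = [
    ("u1", [0.0, 0.0]), ("u1", [4.0, 4.0]), ("u1", [0.2, 0.2]),
    ("u2", [4.1, 4.0]),
    ("u3", [0.2, 0.0]),
]
centroids = [[0.0, 0.0], [4.0, 4.0]]

user_shard = assign_users(user_embeddings(trips), centroids)
trip_shard = [user_shard[u] for u, _ in trips]  # every trip inherits its user's shard
print(user_shard, trip_shard)
```

Here the second trip of `u1` would fall into shard 1 under trip-level assignment but follows its user into shard 0. The price of this locality is that pooling can blur users with heterogeneous trips, which is one reason the choice of aggregation would need to be evaluated.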

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental validation of a design choice

The paper describes a practical extension of SISA training via trip embeddings and similarity-aware clustering to enable shard-isolated unlearning on multimodal mobility data. The core guarantee (deletions affect only one shard) is presented as a consequence of the clustering design rather than a mathematical derivation that reduces to fitted parameters or self-referential equations. Central claims are supported by direct comparisons on a ten-month real-world dataset against baselines and random sharding, with no load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results as novel predictions. The method is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard machine-learning assumptions about embeddings and clustering quality rather than introducing new free parameters or invented entities in the abstract description.

axioms (1)

Domain assumption: Joint embedding of numerical spatio-temporal features and natural-language notes into a shared latent space preserves information sufficient for both accurate prediction and effective similarity clustering.
Invoked when the paper states that each trip's features are embedded before clustering.
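
To make the assumption tangible, here is a deliberately crude sketch of a joint trip embedding. Everything in it is hypothetical: the numeric features (hour of day, distance in km, duration in minutes), their ranges, and the hashed bag-of-words text vector, which is only a stand-in for a learned text encoder. The paper's actual encoder and fusion step are not described in the material reviewed here.

```python
import hashlib
import math


def text_vector(note, dim=8):
    # Hashed bag-of-words, L2-normalized; a stand-in for a learned text encoder.
    vec = [0.0] * dim
    for tok in note.lower().split():
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def numeric_vector(values, lo, hi):
    # Min-max scale each numeric feature into [0, 1].
    return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(values, lo, hi)]


def embed_trip(numeric, note, lo, hi, dim=8):
    # Joint embedding by concatenation of the two modality vectors.
    return numeric_vector(numeric, lo, hi) + text_vector(note, dim)


# Hypothetical feature ranges: hour of day, distance (km), duration (min).
lo, hi = [0.0, 0.0, 0.0], [24.0, 50.0, 120.0]
e = embed_trip([12.0, 10.0, 30.0], "Wheelchair pickup at clinic", lo, hi)
print(len(e), e[:3])
```

Whether such a concatenated space keeps enough information for both prediction and clustering is exactly what the axiom takes for granted; a sketch like this can only show where that assumption enters.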

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "embeds each trip’s numerical and linguistic features into a shared latent space, then employs similarity-aware clustering to distribute samples across shards so that future deletions touch only a single constituent model while preserving inter-shard diversity"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear
Paper passage: "Gaussian mixture model defined on this space provides a soft similarity metric among trips. Clusters are traversed in a round-robin manner"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] Etienne Becht et al. 2019. Dimensionality reduction for visualizing single-cell data using UMAP. Nature biotechnology 37, 1 (2019), 38–44.

[2] Lucas Bourtoule et al. 2021. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP). IEEE, 141–159.

[3] Shushman Choudhury et al. 2024. Towards a Trajectory-powered Foundation Model of Mobility. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Spatial Big Data and AI for Industrial Applications (GeoIndustry ’24). Association for Computing Machinery, New York, NY, USA, 1–4.

[4] Gelei Deng et al. 2024. MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots. In NDSS.

[5] Jacob Devlin et al. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 4171–4186.

[6] Yaron Kanza, Balachander Krishnamurthy, and Divesh Srivastava. 2024. A Geospatial Perspective on Data Ownership, the Right to be Forgotten, Copyrights, and Plagiarism in Generative AI. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL ’24). Association for Computing Machinery, New York, N...

[7] Korbinian Koch and Marcus Soll. 2023. No matter how you slice it: Machine unlearning with SISA comes at the expense of minority classes. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 622–637.

[8] Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2023. Balancing Privacy and Utility of Spatio-Temporal Data for Taxi-Demand Prediction. In 2023 24th IEEE International Conference on Mobile Data Management (MDM). 215–220. https://doi.org/10.1109/MDM58254.2023.00044

[9] Ren Ozeki, Haruki Yonekura, Hamada Rizk, and Hirozumi Yamaguchi. 2024. Privacy Preserved Taxi Demand Prediction System for Distributed Data. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems. 123–134.

[10] Mohammad Mehdi Rastikerdar et al. 2024. CACTUS: Dynamically Switchable Context-aware micro-Classifiers for Efficient IoT Inference. In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (MOBISYS ’24). ACM, New York, NY, USA, 505–518.

[11] Douglas Reynolds. 2009. Gaussian mixture models. In Encyclopedia of biometrics. Springer, 659–663.

[12] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 3–18.

[13] Yuya Takeuchi, Haruki Yonekura, Kyosuke Yamashita, and Hirozumi Yamaguchi

[14] Evaluating Attribute Inference Risks in Urban Care Taxi Arrival Time Prediction Models Using Geospatial Data. In Proceedings of the 14th International Workshop on Urban Computing (UrbComp) (in conjunction with the 31st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD))

[15] World Health Organization. 2022. Long-term care. https://www.who.int/europe/news-room/questions-and-answers/item/long-term-care. Accessed: May 15, 2025.

[16] World Health Organization. 2025. Health workforce. https://www.who.int/health-topics/health-workforce. Accessed: May 15, 2025.

[17] Benjamin Zi Hao Zhao et al. 2021. On the (In)Feasibility of Attribute Inference Attacks on Machine Learning Models. In 2021 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE Computer Society, Los Alamitos, CA, USA, 232–251.
