Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift
Pith reviewed 2026-05-25 05:05 UTC · model grok-4.3
The pith
Gen-ROTDA adapts bike-sharing demand models across years by anchoring on few target labels, generating residuals, and trimming costly optimal transport matches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that three steps together produce a residual predictor that adapts source models to later years more accurately and stably than standard OT or non-robust variants: fitting a target-domain station-time anchor from limited labels, transferring residuals rather than raw demand via a deterministic label-preserving feature generator, and trimming high-cost transport matches.
What carries the argument
The Gen-ROTDA framework anchors adaptation on a small labeled target subset, transfers residuals through a deterministic generator, and trims high-cost OT matches before training the final residual predictor.
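The trimming step can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes uniform marginals, a squared-distance cost on one-dimensional features, plain Sinkhorn iterations, and a hypothetical trimming rule that ranks target points by their plan-weighted mean transport cost. The names `sinkhorn_plan`, `trim_targets`, and `keep_fraction` are invented for illustration.

```python
import math

def sinkhorn_plan(cost, reg=1.0, n_iter=200):
    """Entropic OT plan between uniform marginals via plain Sinkhorn iterations."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def trim_targets(cost, plan, keep_fraction=0.75):
    """Keep the target columns whose plan-weighted mean transport cost is lowest.

    Assumed rule for illustration only; the paper's exact criterion is not given here.
    """
    n, m = len(cost), len(cost[0])
    col_cost = []
    for j in range(m):
        mass = sum(plan[i][j] for i in range(n))
        col_cost.append(sum(plan[i][j] * cost[i][j] for i in range(n)) / mass)
    n_keep = max(1, round(keep_fraction * m))
    ranked = sorted(range(m), key=lambda j: col_cost[j])
    return sorted(ranked[:n_keep]), col_cost

# Toy features: the last target record is abnormal and far from every source point.
source = [0.0, 1.0, 2.0, 3.0]
target = [0.1, 1.1, 2.1, 9.0]
cost = [[(s - t) ** 2 for t in target] for s in source]

plan = sinkhorn_plan(cost)
kept, col_cost = trim_targets(cost, plan)
print(kept)  # [0, 1, 2]: the abnormal target (index 3) is discarded
```

The abnormal record is dropped because every match to it costs at least 36, while no match to the other targets can cost more than 8.41, so its weighted cost is the largest whatever the plan looks like.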
If this is right
- Source models from earlier years can be updated for later years using only a modest number of new labeled records.
- Residual transfer combined with trimming improves stability when target data contains noise or anomalies.
- OT-based adaptation can be made more reliable for temporal shifts by focusing on residuals and discarding costly matches.
- The approach outperforms other OT-family methods on average across multi-year bike demand tasks.
Where Pith is reading between the lines
- The same residual-plus-trimming pattern could be tested on other sequential prediction problems that face yearly distribution drift, such as traffic or energy load forecasting.
- If the anchor step works with very few labels, the method could reduce the cost of collecting fresh ground truth each year.
- The observed stability under abnormal records points to possible use in settings where sensor outages or special events corrupt portions of the target data.
Load-bearing premise
The premise is twofold: that a small labeled target subset is sufficient to build an effective station-time anchor, and that discarding high-cost transport pairs, together with the residual generator, keeps the essential adaptation information without adding bias.
What would settle it
If ablating the high-cost trimming step makes Gen-ROTDA lose its stability advantage over non-robust OT methods on data with abnormal records, the benefit of the robust component would be refuted.
Original abstract
Bike-sharing models trained on historical station-hour data may degrade when deployed in later years because travel patterns change over time. This paper studies March Citi Bike demand prediction from 2021 to 2026 as a temporal domain adaptation problem and proposes Gen-ROTDA, a robust optimal transport-guided residual domain adaptation framework. The method fits a target-domain station-time anchor with a small labeled target subset, transfers residual rather than raw demand, applies a deterministic label-preserving residual feature generator, and trims high-cost transport matches before training the final residual predictor. Experiments compare Gen-ROTDA with anchor-only, source-only, target-only, fine-tuning, MMD adaptation, Sinkhorn OTDA, ROTDA, and Gen-OTDA. Gen-ROTDA achieves the lowest MAE on the main 2025 to 2026 task and is the best OT-family method on average across multi-year tasks, although fine-tuning and MMD adaptation remain strong overall baselines. Under abnormal target-unlabeled records, Gen-ROTDA is much more stable than non-robust OT variants, suggesting that robust transport is useful for noisy temporal transfer in bike-sharing demand prediction.
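To make the anchor-plus-residual idea concrete, here is a minimal sketch. It assumes the anchor is simply the mean labeled demand per (station, hour) cell, with a global-mean fallback for cells that have no labels; the abstract does not specify the anchor model, so `fit_anchor`, `to_residual`, and `from_residual` are hypothetical names and the demand values are invented.

```python
from collections import defaultdict
from statistics import mean

def fit_anchor(labeled_target):
    """Return a lookup: mean demand per (station, hour), global mean as fallback."""
    buckets = defaultdict(list)
    for station, hour, demand in labeled_target:
        buckets[(station, hour)].append(demand)
    global_mean = mean(d for _, _, d in labeled_target)
    table = {key: mean(vals) for key, vals in buckets.items()}
    return lambda station, hour: table.get((station, hour), global_mean)

# Small labeled target subset: (station, hour, demand). Values are made up.
labeled_target = [("A", 8, 10.0), ("A", 8, 14.0), ("B", 8, 4.0)]
anchor = fit_anchor(labeled_target)

def to_residual(station, hour, demand):
    """Residual label: observed demand minus the anchor's estimate."""
    return demand - anchor(station, hour)

def from_residual(station, hour, predicted_residual):
    """Final prediction: anchor estimate plus the predicted residual."""
    return anchor(station, hour) + predicted_residual

print(anchor("A", 8))             # 12.0
print(to_residual("A", 8, 15.0))  # 3.0
```

Under this reading, the residual predictor only has to learn deviations from the anchor, so the part of the year-to-year level shift that the anchor captures does not need to be transported.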
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Gen-ROTDA, a robust optimal transport-guided residual domain adaptation framework for temporal domain shift in bike-sharing demand prediction. Using Citi Bike station-hour data from 2021–2026, the method fits a target-domain station-time anchor from a small labeled target subset, transfers residuals rather than raw demand via a deterministic label-preserving feature generator, and trims high-cost transport matches before training the final predictor. Experiments compare it against anchor-only, source-only, target-only, fine-tuning, MMD, Sinkhorn OTDA, ROTDA, and Gen-OTDA baselines; the central empirical claim is that Gen-ROTDA attains the lowest MAE on the 2025–2026 task, is the strongest OT-family method on average across multi-year tasks, and exhibits markedly higher stability than non-robust OT variants when target-unlabeled records contain abnormalities.
Significance. If the reported MAE ordering and stability results hold under rigorous controls, the work supplies a practical, OT-based recipe for handling gradual temporal drift in demand forecasting that is more resilient to noisy target data than standard OTDA. The residual-transfer plus trimming design is internally consistent with the goal of preserving label information while discarding outlier matches, and the explicit multi-year and noise-injection experiments provide a useful template for other transportation time-series adaptation tasks.
major comments (2)
- [Abstract] Abstract and experimental claims: the statement that Gen-ROTDA achieves the lowest MAE on the 2025–2026 task and is the best OT-family method on average is presented without error bars, number of random seeds, statistical significance tests, or explicit data-split protocol; these omissions are load-bearing because the central claim is an empirical ranking among baselines.
- [Method] Method (anchor fitting and trimming): the paper relies on a small labeled target subset to fit the station-time anchor and on trimming high-cost matches for robustness, yet provides no ablation on subset size, no sensitivity curve for the trimming threshold, and no quantification of how many matches are discarded; these choices directly affect whether the reported stability advantage is reproducible or artifactual.
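The quantities requested in the second major comment are cheap to report. The sketch below sweeps a keep-quantile over a list of per-match transport costs and records the resulting threshold and the fraction of matches discarded. The cost values and the function name `trim_report` are invented, and the quantile-based rule is an assumption about how trimming is parameterized.

```python
def trim_report(costs, keep_quantiles):
    """For each keep-quantile, return (quantile, cost threshold, fraction discarded)."""
    ordered = sorted(costs)
    rows = []
    for q in keep_quantiles:
        n_keep = max(1, round(q * len(ordered)))
        threshold = ordered[n_keep - 1]
        discarded = sum(c > threshold for c in costs) / len(costs)
        rows.append((q, threshold, discarded))
    return rows

# Hypothetical per-match costs; the two largest mimic abnormal records.
costs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0, 9.0]

for q, threshold, discarded in trim_report(costs, [0.5, 0.8, 1.0]):
    print(f"keep quantile {q:.1f}: threshold {threshold:.1f}, discarded {discarded:.0%}")
```

Reporting downstream MAE alongside each row would give the sensitivity curve the referee asks for.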
minor comments (2)
- [Abstract] The abstract would be clearer if it reported the actual MAE numbers (or relative deltas) rather than only qualitative rankings.
- Notation for the residual feature generator and the cost-trimming rule should be introduced with explicit equations to avoid ambiguity when readers attempt to re-implement the deterministic label-preserving step.
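One way the requested notation could look is sketched below. All symbols are assumptions introduced here, not taken from the paper: $a_T$ is the target-domain station-time anchor, $g$ the deterministic feature generator, $C$ the transport cost, $\gamma^\star$ the entropic OT plan, $q_\tau$ the $\tau$-quantile of the costs, and $h_\theta$ the final residual predictor.

```latex
\begin{align}
r_i &= y_i - a_T(s_i, t_i)
  && \text{(residual label)} \\
C_{ij} &= \bigl\lVert g(x_i^{S}) - x_j^{T} \bigr\rVert^2
  && \text{(cost after generation; } g \text{ leaves } r_i \text{ unchanged)} \\
\gamma^\star &= \arg\min_{\gamma \in \Pi(\mu_S, \mu_T)}
  \sum_{i,j} \gamma_{ij} C_{ij}
  + \varepsilon \sum_{i,j} \gamma_{ij} \bigl(\log \gamma_{ij} - 1\bigr)
  && \text{(Sinkhorn plan)} \\
\mathcal{K}_\tau &= \bigl\{ (i,j) : C_{ij} \le q_\tau \bigr\}
  && \text{(matches kept after trimming)} \\
\hat{y}(x, s, t) &= a_T(s, t) + h_\theta(x)
  && \text{(final prediction)}
\end{align}
```

Here "label-preserving" is read as: $g$ modifies features only, so each generated sample keeps the residual $r_i$ of the source record it came from.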
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address the two major comments below and will incorporate the suggested additions to strengthen the empirical reporting and method analysis.
- Referee: [Abstract] Abstract and experimental claims: the statement that Gen-ROTDA achieves the lowest MAE on the 2025–2026 task and is the best OT-family method on average is presented without error bars, number of random seeds, statistical significance tests, or explicit data-split protocol; these omissions are load-bearing because the central claim is an empirical ranking among baselines.
  Authors: We agree that the central empirical claims require additional statistical support to be fully reproducible. In the revised manuscript we will (i) report all MAE results as mean ± standard deviation over 5 independent random seeds, (ii) state the exact data-split protocol (chronological 80/10/10 per year with no future leakage), and (iii) add paired t-test p-values comparing Gen-ROTDA against the strongest baselines. These details will appear both in the abstract and in a new “Experimental Protocol” subsection.
  Revision: yes
- Referee: [Method] Method (anchor fitting and trimming): the paper relies on a small labeled target subset to fit the station-time anchor and on trimming high-cost matches for robustness, yet provides no ablation on subset size, no sensitivity curve for the trimming threshold, and no quantification of how many matches are discarded; these choices directly affect whether the reported stability advantage is reproducible or artifactual.
  Authors: We acknowledge that the current manuscript lacks these controls. We will add: (a) an ablation table varying the labeled target subset from 5% to 25% of stations, (b) a sensitivity plot for the trimming threshold (0.5–2.0 quantiles), and (c) the average fraction of matches discarded per experiment. These results will be placed in a new “Hyper-parameter Sensitivity” subsection and will confirm that the stability gains remain consistent across reasonable choices.
  Revision: yes
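The reporting protocol promised in the first response (chronological 80/10/10 split, mean ± standard deviation over seeds, paired comparison) can be sketched with the standard library alone. The MAE values below are placeholders, not results from the paper, and the helper names are hypothetical. Only the paired t statistic is computed; turning it into a p-value needs a t-distribution CDF with n − 1 degrees of freedom, which a function such as `scipy.stats.ttest_rel` provides.

```python
from math import sqrt
from statistics import mean, stdev

def chronological_split(records, train_pct=80, val_pct=10):
    """Sort by timestamp, then cut so validation and test come strictly after training."""
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return ordered[:i], ordered[i:j], ordered[j:]

def paired_t_statistic(a, b):
    """t statistic for paired samples (same seeds, two methods)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Toy records: (timestamp, demand), deliberately out of order.
records = [(t, float(t)) for t in (3, 1, 4, 0, 9, 2, 6, 5, 8, 7)]
train, val, test = chronological_split(records)

# Placeholder MAE per seed for two methods (5 seeds each, invented numbers).
mae_method = [2.0, 2.1, 1.9, 2.0, 2.0]
mae_baseline = [2.3, 2.2, 2.1, 2.4, 2.2]

print(f"method:   {mean(mae_method):.2f} ± {stdev(mae_method):.2f}")
print(f"baseline: {mean(mae_baseline):.2f} ± {stdev(mae_baseline):.2f}")
t_stat = paired_t_statistic(mae_method, mae_baseline)
print(f"paired t statistic: {t_stat:.2f} (df = {len(mae_method) - 1})")
```

Pairing by seed matters here: both methods see the same splits and initializations, so the test is applied to the per-seed differences rather than to two independent samples.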
Circularity Check
No significant circularity; empirical comparisons stand alone
The paper frames Gen-ROTDA as an empirical framework whose performance claims rest on direct MAE comparisons against external baselines (anchor-only, source-only, fine-tuning, MMD, Sinkhorn OTDA, ROTDA, Gen-OTDA) on held-out 2025-2026 and multi-year tasks. No derivation chain is presented that reduces a claimed prediction to a fitted parameter by construction, nor does any load-bearing step invoke a self-citation whose content is itself unverified. The listed design choices (anchor fitting from small labeled target data, residual transfer, deterministic label-preserving generator, high-cost match trimming) are presented as engineering decisions whose justification is the observed stability under noisy records, not a mathematical identity. The derivation is therefore self-contained against external benchmarks.