MoveOD: Synthesizing Origin-Destination Commute Distribution from U.S. Census Data

Abhishek Dubey; Aron Laszka; Ayan Mukhopadhyay; Jose Paolo Talusan; Rishav Sen; Samitha Samaranayake

arXiv: 2510.18858 · v2 · submitted 2025-10-21 · 💻 cs.CY · cs.SI

Rishav Sen, Jose Paolo Talusan, Abhishek Dubey, Ayan Mukhopadhyay, Samitha Samaranayake, Aron Laszka

Pith reviewed 2026-05-18 04:18 UTC · model grok-4.3

keywords: origin-destination flows · commute synthesis · American Community Survey · LODES · open data · transportation planning · synthetic data · vehicle routing

The pith

MOVEOD fuses public census and mapping data into fine-grained origin-destination commute tables for any U.S. county.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a pipeline that builds detailed tables of where people live and work, plus when they depart and how long their trips take. These tables are missing in most places because private location data is expensive or unavailable. The method pulls together American Community Survey departure times, LODES employment flows, county maps, OpenStreetMap roads, and building footprints. It then applies constrained sampling and integer programming to force the synthetic trips to match the reported totals for commuters, workplaces, and durations. If the resulting tables are close enough to real patterns, transportation planners and routing systems gain usable input data for any county without new surveys.

Core claim

MOVEOD is an end-to-end automated system that combines ACS departure-time and travel-time distributions, LODES residence-to-workplace flows, county geometries, OSM road networks, and building footprints into a single OD dataset. Constrained sampling and integer programming reconcile the data by matching commuter totals per origin zone, aligning workplace destinations with employment distributions, and calibrating travel durations to ACS-reported commute times, yielding roughly 150,000 synthetic trips for a test county in minutes.
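The first two reconciliation targets (origin totals and destination shares) can be illustrated with a small apportionment routine. This is a minimal sketch, not the paper's integer program: `allocate_trips` and its arguments are hypothetical names, and largest-remainder rounding stands in for the solver.

```python
def allocate_trips(total_commuters, dest_flows):
    """Split one origin's commuter total across destinations.

    dest_flows maps destination id -> observed integer flow (e.g. job counts).
    Returns integer trip counts that are proportional to the flows and sum
    exactly to total_commuters (largest-remainder apportionment).
    """
    if total_commuters == 0:
        return {dest: 0 for dest in dest_flows}
    flow_sum = sum(dest_flows.values())
    if flow_sum <= 0:
        raise ValueError("origin has commuters but no observed destination flows")

    counts, remainders = {}, {}
    for dest, flow in dest_flows.items():
        # Integer arithmetic keeps the quotas exact (no floating-point drift).
        counts[dest], remainders[dest] = divmod(total_commuters * flow, flow_sum)

    # Hand the trips lost to rounding down to the largest remainders.
    leftover = total_commuters - sum(counts.values())
    ranked = sorted(dest_flows, key=lambda d: remainders[d], reverse=True)
    for dest in ranked[:leftover]:
        counts[dest] += 1
    return counts
```

For an origin with 10 commuters and equal flows to three workplaces, one destination receives 4 trips and the other two receive 3 each, so the counts stay integer and still sum to the origin total.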

What carries the argument

Constrained sampling and integer-programming reconciliation that forces synthetic OD flows to match the aggregate commuter counts, workplace destinations, and travel durations reported in ACS and LODES.

If this is right

Traffic simulation and signal optimization models can be run at county scale using the generated tables.
Classical and learning-based vehicle-routing algorithms receive realistic inputs for benchmarking.
Any user can produce data for a chosen county and year by supplying only that information to the open pipeline.
The same reconciliation steps support studies of congestion pricing or routing at local resolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The pipeline logic could be adapted to other countries that release comparable census and employment statistics.
Generated trips could be checked against mobile-phone or app-based mobility traces now becoming available.
Adding non-work trips would let the method support full-day activity-based models rather than commute-only flows.

Load-bearing premise

Matching the reported aggregate totals for commuters, workplaces, and trip lengths produces individual trip patterns that are close enough to actual unobserved commuting behavior.
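A two-zone toy case shows why this premise carries weight: two OD tables can share every row and column total and still describe different commuting patterns. The tables below are invented for illustration, and `marginals` is a hypothetical helper.

```python
def marginals(table):
    """Return (row sums, column sums) of a rectangular OD table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return row_sums, col_sums

# Rows are origin zones, columns are destination zones.
stay_local = [[50, 0], [0, 50]]     # everyone works in their home zone
fully_mixed = [[25, 25], [25, 25]]  # commuters split evenly across zones

# Both tables agree on every aggregate a marginal check would inspect,
# although the individual trips differ.
same_aggregates = marginals(stay_local) == marginals(fully_mixed)
```

Here `same_aggregates` is true, so a check on totals alone cannot tell the two patterns apart.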

What would settle it

Comparison of the generated trip origins, destinations, departure times, and durations against a held-out household travel survey or GPS trace dataset collected inside the same county.
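One way such a comparison could be scored, assuming both datasets are reduced to the same categorical bins (zone pairs, departure-time bins, or duration bins), is total variation distance. The function name and inputs are illustrative; the paper does not prescribe a metric.

```python
from collections import Counter

def total_variation(synthetic_bins, observed_bins):
    """Total variation distance between two samples of categorical labels.

    Returns 0.0 when the binned shares are identical and 1.0 when the two
    samples share no bin at all.
    """
    syn, obs = Counter(synthetic_bins), Counter(observed_bins)
    n_syn, n_obs = sum(syn.values()), sum(obs.values())
    if n_syn == 0 or n_obs == 0:
        raise ValueError("both samples must be non-empty")
    bins = set(syn) | set(obs)
    # Counter returns 0 for bins that a sample never visits.
    return 0.5 * sum(abs(syn[b] / n_syn - obs[b] / n_obs) for b in bins)
```

A low score on held-out survey bins would support the synthetic trips; a high score would locate where the pairing of origins, destinations, and times departs from observed behavior.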

Figures

Figures reproduced from arXiv: 2510.18858 by Abhishek Dubey, Aron Laszka, Ayan Mukhopadhyay, Jose Paolo Talusan, Rishav Sen, Samitha Samaranayake.

**Figure 1.** MOVEOD preserves the conditional destination distribution for each origin census unit, ensuring that the synthetic workplace assignments match the empirical residence to work flow proportions reported in LODES. [Plot axis: Departure Time Bin, 14 bins from 00:00–04:59 to 16:00–23:59]

**Figure 2.** MOVEOD calibrates the departure times for all origin census units to align with ACS departure times (B08302). Panel (b): Validating the marginals.
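The departure-time calibration can be pictured as drawing clock times inside each ACS bin. The sketch below rests on stated assumptions: the bin counts are invented, times are drawn uniformly within a bin, and `departures_matching_bins` is a hypothetical helper, not part of the MOVEOD code.

```python
import random

def departures_matching_bins(bin_counts, seed=0):
    """Draw one departure minute per commuter so each bin hits its target.

    bin_counts maps (start_minute, end_minute) ranges, end exclusive and
    measured from midnight, to the number of commuters in that range.
    """
    rng = random.Random(seed)
    minutes = []
    for (start, end), count in bin_counts.items():
        # Assumption: departures are uniform within a bin.
        minutes.extend(rng.randrange(start, end) for _ in range(count))
    rng.shuffle(minutes)  # avoid returning trips grouped by bin
    return minutes

# Invented counts for three early-morning bins: 00:00-04:59, 05:00-05:29, 05:30-05:59.
targets = {(0, 300): 40, (300, 330): 25, (330, 360): 35}
times = departures_matching_bins(targets)
```

Because each bin contributes exactly its target count, re-binning `times` reproduces the input histogram; only the placement within a bin is random.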

**Figure 4.** MOVEOD commute data for Tennessee. (a) Residential building locations and (b) workplace locations in Hamilton County. (c) Sample of origin–destination (OD) commuter trips within Hamilton County, where each arc is shaded green at the origin and red at the destination. (d) Sample of OD trips for Davidson County, which has more than twice the number of commuters as Hamilton County.

Original abstract

High-resolution origin-destination (OD) tables are essential for a wide spectrum of transportation applications, from modeling traffic and signal timing optimization to congestion pricing and vehicle routing. However, outside a handful of data rich cities, such data is rarely available. We introduce MOVEOD, an open-source pipeline that synthesizes public data into commuter OD flows with fine-grained spatial and temporal departure times for any county in the United States. MOVEOD combines five open data sources: American Community Survey (ACS) departure time and travel time distributions, Longitudinal Employer-Household Dynamics (LODES) residence-to-workplace flows, county geometries, road network information from OpenStreetMap (OSM), and building footprints from OSM and Microsoft, into a single OD dataset. We use a constrained sampling and integer-programming method to reconcile the OD dataset with data from ACS and LODES. Our approach involves: (1) matching commuter totals per origin zone, (2) aligning workplace destinations with employment distributions, and (3) calibrating travel durations to ACS-reported commute times. This ensures the OD data accurately reflects commuting patterns. We demonstrate the framework on Hamilton County, Tennessee, where we generate roughly 150,000 synthetic trips in minutes, which we feed into a benchmark suite of classical and learning-based vehicle-routing algorithms. The MOVEOD pipeline is an end-to-end automated system, enabling users to easily apply it across the United States by giving only a county and a year; and it can be adapted to other countries with comparable census datasets. The source code and a lightweight browser interface are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MoveOD gives a practical open pipeline for generating synthetic fine-grained OD commute tables from public US data sources, but the validation stays at the aggregate level.

The main point is that this paper describes an automated pipeline to create origin-destination commute tables with spatial and temporal detail for any US county, using only ACS departure and travel time distributions, LODES residence-to-work flows, OSM networks, and building footprints. It reconciles these via constrained sampling and integer programming so that origin totals, workplace destinations, and duration distributions line up with the input marginals.

The Hamilton County run produces about 150,000 trips quickly and drops them into a vehicle-routing benchmark, which shows the output is immediately usable for downstream modeling work. The code and a simple browser interface are public, so the whole thing is reproducible from a county name and year alone. That combination is new enough in the cited literature and fills a real gap for counties without local survey data. The approach is straightforward and grounded in standard public sources, which is a strength for anyone who needs county-scale OD tables without starting from scratch.

The soft spot is the validation. The method enforces the known aggregates, but the paper does not report held-out comparisons or checks on whether the specific origin-destination pairings and departure-time assignments reflect real joint patterns such as income-time correlations or spatial clustering of short versus long trips. The demonstration focuses on generation speed and routing use rather than micro-level fidelity. This leaves the realism of the disaggregated results as an assumption rather than a tested claim. The stress-test note on that point holds up.

This work is aimed at transportation modelers, planners, and researchers who need synthetic OD data for traffic simulation or routing in data-scarce areas. A reader looking for a ready-to-run tool would find it useful even with the current validation limits.

It deserves a serious referee because the pipeline is reproducible, addresses a documented need, and rests on public data rather than fitted parameters. I would send it to review and ask for additional validation experiments on the joint distributions.

Referee Report

2 major / 3 minor

Summary. The manuscript presents MOVEOD, an open-source pipeline that synthesizes high-resolution origin-destination (OD) commute distributions for any U.S. county by integrating American Community Survey (ACS) departure-time and travel-time distributions, Longitudinal Employer-Household Dynamics (LODES) residence-to-workplace flows, county geometries, OpenStreetMap (OSM) road networks, and building footprints. It applies constrained sampling followed by integer programming to reconcile origin-zone commuter totals, workplace destinations, and travel durations with the input marginals. The pipeline is demonstrated on Hamilton County, Tennessee, generating roughly 150,000 synthetic trips that are subsequently used to benchmark classical and learning-based vehicle-routing algorithms. The system is designed to require only a county and year as input and is claimed to be adaptable to other countries with comparable census data.

Significance. If the synthesized OD tables faithfully reproduce real commuting patterns beyond the enforced marginals, the work would provide a practical, reproducible resource for transportation modeling, traffic simulation, congestion pricing, and optimization in data-scarce regions. The end-to-end automation, public code release, and minimal user requirements constitute clear strengths that could enable widespread use. The approach builds on standard public datasets and constrained optimization, which is methodologically sound in principle. Significance is limited, however, by the absence of evidence that the disaggregated origin-destination-departure assignments capture higher-order structure present in actual commute behavior.

major comments (2)

[§3] §3 (reconciliation pipeline): The constrained sampling plus integer-programming procedure is described as matching origin commuter totals, aligning LODES workplace flows, and calibrating durations to ACS travel-time distributions, yet the manuscript provides no explicit integer-program formulation, feasibility proof, or post-solution verification that the joint distribution of building-level origins, LODES destinations, and sampled departure times remains consistent with all marginals simultaneously without iterative adjustments.
[§4] §4 (Hamilton County demonstration): Only generation runtime and downstream VRP benchmark performance are reported; no held-out validation, comparison against withheld ACS or LODES micro-data, or metrics for higher-order statistics (e.g., correlation between origin income proxy and departure time or spatial clustering of short versus long commutes) are presented, leaving the realism of the disaggregated trip patterns unverified.

minor comments (3)

[Abstract] Abstract: The claim that the pipeline 'ensures the OD data accurately reflects commuting patterns' should be qualified to indicate that only aggregate marginals are enforced; the phrasing overstates the current empirical support.
[Data sources] Data sources paragraph: Clarify whether 'county geometries' constitute a distinct fifth source or are derived from the ACS/LODES extracts, and specify the exact preprocessing steps applied to building footprints before sampling.
[Availability] Code and interface: While the source code and browser interface are stated to be publicly available, the manuscript should include a permanent DOI or GitHub release tag to facilitate reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation and analysis.

Referee: [§3] §3 (reconciliation pipeline): The constrained sampling plus integer-programming procedure is described as matching origin commuter totals, aligning LODES workplace flows, and calibrating durations to ACS travel-time distributions, yet the manuscript provides no explicit integer-program formulation, feasibility proof, or post-solution verification that the joint distribution of building-level origins, LODES destinations, and sampled departure times remains consistent with all marginals simultaneously without iterative adjustments.

Authors: We agree that the current description of the reconciliation procedure is high-level and lacks the explicit mathematical details requested. In the revised manuscript we will add a dedicated subsection in §3 that presents the full integer-programming formulation (objective and all constraints), a brief feasibility argument based on the consistency of the input marginals, and post-solution verification steps (including constraint satisfaction checks and a description of the single-pass procedure that avoids iterative adjustments). These additions will make the method fully reproducible from the text. revision: yes
Referee: [§4] §4 (Hamilton County demonstration): Only generation runtime and downstream VRP benchmark performance are reported; no held-out validation, comparison against withheld ACS or LODES micro-data, or metrics for higher-order statistics (e.g., correlation between origin income proxy and departure time or spatial clustering of short versus long commutes) are presented, leaving the realism of the disaggregated trip patterns unverified.

Authors: The referee correctly observes that the demonstration section emphasizes runtime and the VRP use case. Because fine-grained building-level ground-truth OD data with departure times are not publicly available, a true held-out micro-data validation is not possible. In the revision we will add (i) quantitative checks of selected higher-order statistics against available ACS aggregates (e.g., income-by-departure-time correlations and commute-distance distributions) and (ii) an explicit limitations paragraph discussing what aspects of realism cannot be verified with public data. We believe these additions will address the core concern while remaining honest about data limitations. revision: partial
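The constraint-satisfaction checks promised in the first response could look roughly like the following. This is a sketch under assumptions: trips are represented as (origin zone, destination zone) pairs, the targets are plain dictionaries of zone totals, and `marginal_gaps` is a hypothetical name rather than anything in the released code.

```python
from collections import Counter

def marginal_gaps(trips, origin_totals, dest_totals):
    """List every zone whose synthetic count differs from its target.

    trips is an iterable of (origin_zone, dest_zone) pairs. Each returned
    tuple is (kind, zone, target_count, synthetic_count).
    """
    trips = list(trips)
    by_origin = Counter(origin for origin, _ in trips)
    by_dest = Counter(dest for _, dest in trips)

    gaps = []
    checks = (("origin", origin_totals, by_origin),
              ("destination", dest_totals, by_dest))
    for kind, target, actual in checks:
        # Include zones that appear on only one side of the comparison.
        for zone in sorted(set(target) | set(actual)):
            if target.get(zone, 0) != actual[zone]:
                gaps.append((kind, zone, target.get(zone, 0), actual[zone]))
    return gaps
```

An empty list means both sets of totals are reproduced exactly. As the referee notes, that confirms the enforced marginals only and says nothing about the realism of individual pairings.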

Circularity Check

0 steps flagged

No significant circularity; pipeline reconciles external aggregates via standard optimization

The described MOVEOD method ingests independent public datasets (ACS departure-time and travel-time distributions, LODES residence-to-workplace flows, OSM networks, and building footprints) and applies constrained sampling plus integer programming to enforce matching on commuter totals, workplace destinations, and duration distributions. No equation or procedure in the provided text defines a target quantity in terms of a parameter fitted from the same run, nor does any load-bearing step reduce to a self-citation or internal renaming. The derivation remains self-contained against the external census benchmarks it explicitly reconciles.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of ACS and LODES aggregates and on the assumption that the chosen integer-programming constraints are sufficient to recover realistic joint distributions.

axioms (2)

Domain assumption: ACS departure-time and travel-time distributions are accurate marginals for the target county and year.
Invoked when calibrating synthetic trip durations and departure times to ACS-reported values.
Domain assumption: LODES residence-to-workplace flows provide reliable origin-destination marginals at the zone level.
Used to align workplace destinations with employment distributions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We use a constrained sampling and integer-programming method to reconcile the OD dataset with data from ACS and LODES."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "MOVEOD combines five open data sources... into a single OD dataset."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

J. M. Abowd, J. Haltiwanger, and J. Lane. Integrated longitudinal employer-employee data for the united states.American Economic Review, 94(2):224–229, 2004

work page 2004

[2] [2]

Alonso-Mora, S

J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Frazzoli, and D. Rus. On-demand high-capacity ride-sharing via dynamic trip-vehicle assign- ment.Proceedings of the National Academy of Sciences, 114(3):462– 467, 2017

work page 2017

[3] [3]

B. M. Baker and M. Ayechew. A genetic algorithm for the vehicle routing problem.Computers & Operations Research, 30(5):787–800, 2003

work page 2003

[4] [4]

Cipriani, S

E. Cipriani, S. Gori, and M. Petrelli. Transit network design: A procedure and an application to a large urban area.Transportation Research Part C: Emerging Technologies, 20(1):3–14, 2012

work page 2012

[5] [5]

Crawford, D

F. Crawford, D. P. Watling, and R. D. Connors. Identifying road user classes based on repeated trip behaviour using bluetooth data. Transportation Research Part A: Policy and Practice, 113:55–74, 2018

work page 2018

[6] [6]

Z. J. Czech and P. Czarnas. Parallel simulated annealing for the vehicle routing problem with time windows. In10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, pages 376–383. IEEE, 2002

work page 2002

[7] [7]

F. F. Dias, P. S. Lavieri, T. Kim, C. R. Bhat, and R. M. Pendyala. Fusing multiple sources of data to understand ride-hailing use.Transportation Research Record, 2673(6):214–224, 2019

work page 2019

[8] [8]

W. Du, J. Ye, J. Gu, J. Li, H. Wei, and G. Wang. Safelight: A rein- forcement learning method toward collision-free traffic signal control. InAAAI Conference on Artificial Intelligence, volume 37, pages 14801– 14810, 2023

work page 2023

[9] [9]

Dumez, F

D. Dumez, F. Lehu ´ed´e, and O. P ´eton. A large neighborhood search approach to the vehicle routing problem with delivery options.Trans- portation Research Part B: Methodological, 144:103–132, 2021

work page 2021

[10] [10]

Eksioglu, A

B. Eksioglu, A. V . Vural, and A. Reisman. The vehicle routing problem: A taxonomic review.Computers & Industrial Engineering, 57(4):1472– 1483, 2009

work page 2009

[11] [11]

Ferreira, J

N. Ferreira, J. Poco, H. T. V o, J. Freire, and C. T. Silva. Visual exploration of big spatio-temporal urban data: A study of new york city taxi trips.IEEE Transactions on Visualization and Computer Graphics, 19(12):2149–2158, 2013

work page 2013

[12] [12]

Guihaire and J.-K

V . Guihaire and J.-K. Hao. Transit network design and scheduling: A global review.Transportation Research Part A: Policy and Practice, 42(10):1251–1273, 2008

work page 2008

[13] [13]

D. Guo, X. Zhu, H. Jin, P. Gao, and C. Andris. Discovering spatial patterns in origin-destination mobility data.Transactions in GIS, 16(3):411–429, 2012

work page 2012

[14] [14]

Helsgaun

K. Helsgaun. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University, 12:966–980, 2017

work page 2017

[15] [15]

M. P. Heris, N. L. Foks, K. J. Bagstad, A. Troy, and Z. H. Ancona. A rasterized building footprint dataset for the united states.Scientific Data, 7(1):207, 2020

work page 2020

[16] [16]

Huang, C

S. Huang, C. Zhang, J. Zhao, and Y . Han. Traffic origin–destination flow prediction considering individual travel frequency: A classification- based approach.Journal of Transportation Engineering, Part A: Systems, 151(2):04024106, 2025

work page 2025

[17] [17]

Irimia, M

O. Irimia, M. Panaite-Lehadus, C. Tomozei, E. Mosnegutu, and G. Przy- datek. Origin-destination traffic survey—case study: Data analyse for bacau municipality.Sustainability, 15(6):4975, 2023

work page 2023

[18] [18]

Jiang, D

R. Jiang, D. Yin, Z. Wang, Y . Wang, J. Deng, H. Liu, Z. Cai, J. Deng, X. Song, and R. Shibasaki. Dl-traff: Survey and benchmark of deep learning models for urban traffic prediction. In30th ACM International Conference on Information & Knowledge Management (CIKM), pages 4515–4525, 2021

[19] T. Kashiyama, Y. Pang, and Y. Sekimoto. Open PFLOW: Creation and evaluation of an open dataset for typical people mass movement in urban areas. Transportation Research Part C: Emerging Technologies, 85:249–267, 2017

[20] R. Kitchin. The data revolution: Big data, open data, data infrastructures and their consequences. Sage, 2014

[21] W. Kool, H. van Hoof, and M. Welling. Attention, learn to solve routing problems! In 7th International Conference on Learning Representations (ICLR), 2019

[22] Y.-D. Kwon, J. Choo, B. Kim, I. Yoon, Y. Gwon, and S. Min. POMO: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33, 2020

[23] Q. Li, Z. M. Peng, L. Feng, Z. Liu, C. Duan, W. Mo, and B. Zhou. ScenarioNet: Open-source platform for large-scale traffic scenario simulation and modeling. Advances in Neural Information Processing Systems, 36:3894–3920, 2023

[24] S. Li, Z. Yan, and C. Wu. Learning to delegate for large-scale vehicle routing. Advances in Neural Information Processing Systems (NeurIPS), 34:26198–26211, 2021

[25] Y. Li, R. Yu, C. Shahabi, and Y. Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017

[26] X. Liu, Y. Xia, Y. Liang, J. Hu, Y. Wang, L. Bai, C. Huang, Z. Liu, B. Hooi, and R. Zimmermann. LargeST: A benchmark dataset for large-scale traffic forecasting. In Advances in Neural Information Processing Systems, 2023

[27] S. Liyanage and H. Dia. An agent-based simulation approach for evaluating the performance of on-demand bus services. Sustainability, 12(10):4117, 2020

[28] M. Mohammed and J. Oke. Origin-destination inference in public transportation systems: A comprehensive review. International Journal of Transportation Science and Technology, 12(1):315–328, 2023

[29] P. Mooney, M. Minghini, et al. A review of OpenStreetMap data. Mapping and the Citizen Sensor, pages 37–59, 2017

[30] National Research Council and National Academies of Sciences, Engineering, and Medicine. Using the American Community Survey: Benefits and Challenges. The National Academies Press, Washington, DC, 2007

[31] S. Peer, J. Knockaert, P. Koster, and E. T. Verhoef. Over-reporting vs. overreacting: Commuters’ perceptions of travel times. Transportation Research Part A: Policy and Practice, 69:476–494, 2014

[32] D. Perlman, K. Tufte, L. Flint, and T. Reel. Emerging data science for transit: Market scan and feasibility analysis. Technical Report FTA Report No. 0218, U.S. Department of Transportation, Federal Transit Administration, Washington, D.C., June 2022. Prepared by the John A. Volpe National Transportation Systems Center

[33] T. Pichpibul and R. Kawtummachai. A heuristic approach based on Clarke-Wright algorithm for open vehicle routing problem. Scientific World Journal, 2013(1):874349, 2013

[34] M. Randall, A. Kheiri, and A. N. Letchford. Insertion heuristics for a class of dynamic vehicle routing problems, 2022

[35] A. E. Rizzoli, R. Montemanni, E. Lucibello, and L. M. Gambardella. Ant colony optimization for real-world vehicle routing problems: from theory to applications. Swarm Intelligence, 1:135–151, 2007

[36] R. Sen, T. Tran, S. Khaleghian, P. Pugliese, M. Sartipi, H. Neema, and A. Dubey. BTE-Sim: Fast simulation environment for public transportation. In 2022 IEEE International Conference on Big Data (Big Data), pages 2886–2894, 2022

[37] P. R. Stopher and S. P. Greaves. Household travel surveys: Where are we going? Transportation Research Part A: Policy and Practice, 41(5):367–381, 2007

[38] P. Thakuriah, N. Y. Tilahun, and M. Zellner. Introduction to seeing cities through big data: Research, methods and applications in urban informatics. Springer, 2017

[39] P. Toth and D. Vigo. Vehicle routing: problems, methods, and applications. SIAM, 2014

[40] United States Census Bureau. Longitudinal Employer-Household Dynamics - Origin-Destination Employment Statistics, 2017

[41] S. Uppoor, O. Trullols-Cruces, M. Fiore, and J. M. Barcelo-Ordinas. Generation and analysis of a large-scale urban vehicular mobility dataset. IEEE Transactions on Mobile Computing, 13(5):1061–1075, 2013

[42] W. Wang, W. Jiang, B. Zhang, Q. Zhu, and C. Liao. A real network environment dataset for traffic analysis. Scientific Data, 12(1):1–12, 2025

[43] B. Xu, Y. Wang, Z. Wang, H. Jia, and Z. Lu. Hierarchically and cooperatively learning traffic signal control. In AAAI Conference on Artificial Intelligence, volume 35, pages 669–677, 2021

[44] X. Xu, Z. Zheng, Z. Hu, K. Feng, and W. Ma. A unified dataset for the city-scale traffic assignment model in 20 US cities. Scientific Data, 11(1):325, 2024
