
arxiv: 2510.18858 · v2 · submitted 2025-10-21 · 💻 cs.CY · cs.SI

MoveOD: Synthesizing Origin-Destination Commute Distribution from U.S. Census Data

Pith reviewed 2026-05-18 04:18 UTC · model grok-4.3

classification 💻 cs.CY cs.SI
keywords origin-destination flows · commute synthesis · American Community Survey · LODES · open data · transportation planning · synthetic data · vehicle routing

The pith

MOVEOD fuses public census and mapping data into fine-grained origin-destination commute tables for any U.S. county.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a pipeline that builds detailed tables of where people live and work, plus when they depart and how long their trips take. These tables are missing in most places because private location data is expensive or unavailable. The method pulls together American Community Survey departure times, LODES employment flows, county maps, OpenStreetMap roads, and building footprints. It then applies constrained sampling and integer programming to force the synthetic trips to match the reported totals for commuters, workplaces, and durations. If the resulting tables are close enough to real patterns, transportation planners and routing systems gain usable input data for any county without new surveys.
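The first two matching steps (commuter totals per origin, workplace shares from LODES) can be illustrated on a single toy origin zone. The sketch below is a stand-in, not MOVEOD's integer program. It uses largest-remainder (Hamilton) rounding to turn LODES residence-to-workplace proportions into whole trip counts that sum exactly to the origin's commuter total. The function name `apportion`, the zone IDs, and the counts are all hypothetical.

```python
from math import floor


def apportion(total, weights):
    """Split `total` commuters across destinations in proportion to `weights`.

    Largest-remainder (Hamilton) rounding: each destination first gets the floor
    of its exact quota, then the leftover commuters go to the destinations with
    the largest fractional parts. The result is integral, sums exactly to
    `total`, and stays within one commuter of every exact quota.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    quotas = {d: total * w / weight_sum for d, w in weights.items()}
    counts = {d: floor(q) for d, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Give the remaining commuters to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda d: quotas[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    return counts


# Hypothetical LODES job flows from one home block to three workplace blocks,
# and a hypothetical total of 7 synthetic commuters for that block.
lodes_flows = {"W1": 50, "W2": 30, "W3": 20}
print(apportion(7, lodes_flows))  # {'W1': 4, 'W2': 2, 'W3': 1}
```

Per-zone rounding like this preserves each origin's total and its conditional destination shares. A full integer program would be needed once constraints couple zones, for example travel-time targets shared across a county.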

Core claim

MOVEOD is an end-to-end automated system that combines ACS departure-time and travel-time distributions, LODES residence-to-workplace flows, county geometries, OSM road networks, and building footprints into a single OD dataset. Constrained sampling and integer programming reconcile the data by matching commuter totals per origin zone, aligning workplace destinations with employment distributions, and calibrating travel durations to ACS-reported commute times. The result is roughly 150,000 synthetic trips for Hamilton County, Tennessee, generated in minutes.
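Departure-time calibration can be sketched as quota-then-sample. First fix how many of an origin's trips fall in each ACS departure bin, then draw a minute uniformly inside each bin. This is a hypothetical illustration, not MOVEOD's code. The 14 bin boundaries are those of ACS table B08302 that appear in the paper's figures. The function `sample_departures`, the tract counts, and uniform within-bin sampling are assumptions.

```python
import random

# The 14 ACS B08302 departure-time bins (00:00-04:59, 05:00-05:29, ..., 16:00-23:59),
# written as [start, end) minutes after midnight.
BINS = [(0, 300), (300, 330), (330, 360), (360, 390), (390, 420), (420, 450),
        (450, 480), (480, 510), (510, 540), (540, 600), (600, 660), (660, 720),
        (720, 960), (960, 1440)]


def sample_departures(n_trips, acs_counts, seed=0):
    """Draw `n_trips` departure times (minutes after midnight) for one origin unit.

    Per-bin trip counts are fixed first by largest-remainder rounding of the ACS
    shares, so each bin is within one trip of its exact quota; only the minute
    inside each bin is random.
    """
    if len(acs_counts) != len(BINS):
        raise ValueError("expected one ACS count per departure bin")
    rng = random.Random(seed)
    total = sum(acs_counts)
    quotas = [n_trips * c / total for c in acs_counts]
    counts = [int(q) for q in quotas]  # floor, since quotas are non-negative
    by_remainder = sorted(range(len(BINS)), key=lambda i: quotas[i] - counts[i],
                          reverse=True)
    for i in by_remainder[: n_trips - sum(counts)]:
        counts[i] += 1
    minutes = []
    for (start, end), k in zip(BINS, counts):
        minutes.extend(rng.randrange(start, end) for _ in range(k))
    rng.shuffle(minutes)  # so trip order carries no bin information
    return minutes


# Hypothetical ACS B08302 counts for one census tract (630 workers in total).
tract_counts = [5, 10, 20, 40, 60, 90, 110, 100, 60, 50, 20, 10, 25, 30]
departures = sample_departures(63, tract_counts)
print(len(departures))  # 63
```

Fixing the counts before sampling keeps small origin units from drifting away from their ACS shares by chance. Drawing each trip's bin independently would not.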

What carries the argument

Constrained sampling and integer-programming reconciliation that forces synthetic OD flows to match aggregate commuter counts, workplace destinations, and travel durations from ACS and LODES.
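One plausible way to write the duration step as an integer program is sketched below. The notation is assumed, and this is an illustrative formulation, not the one used in MOVEOD. Trip $i$ already has a home building and a LODES destination zone. $B_i$ is a hypothetical set of candidate workplace buildings in that zone, and $\tau_{ib}$ is the routed travel time to building $b$. $\kappa(\tau)$ is the ACS travel-time bin containing $\tau$, and $T_k$ is the target number of trips in bin $k$. The binary variable $x_{ib} = 1$ when trip $i$ is assigned to building $b$.

```latex
\begin{aligned}
\min_{x,\, s^{+},\, s^{-}} \quad & \sum_{k} \left( s_k^{+} + s_k^{-} \right) \\
\text{s.t.} \quad & \sum_{b \in B_i} x_{ib} = 1 && \forall i \\
& \sum_{i} \sum_{b \in B_i :\, \kappa(\tau_{ib}) = k} x_{ib} \;-\; T_k = s_k^{+} - s_k^{-} && \forall k \\
& x_{ib} \in \{0, 1\}, \qquad s_k^{+},\, s_k^{-} \ge 0 .
\end{aligned}
```

Restricting each trip's candidates to its LODES zone keeps origin totals and workplace flows fixed by construction. The slack variables $s_k^{\pm}$ keep the program feasible whenever every $B_i$ is nonempty, and the optimal objective measures the remaining mismatch against the ACS travel-time distribution.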

If this is right

  • Traffic simulation and signal optimization models can be run at county scale using the generated tables.
  • Classical and learning-based vehicle-routing algorithms receive realistic inputs for benchmarking.
  • Any user can produce data for a chosen county and year by supplying only that information to the open pipeline.
  • The same reconciliation steps support studies of congestion pricing or routing at local resolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pipeline logic could be adapted to other countries that release comparable census and employment statistics.
  • Generated trips could be checked against mobile-phone or app-based mobility traces now becoming available.
  • Adding non-work trips would let the method support full-day activity-based models rather than commute-only flows.

Load-bearing premise

Matching the reported aggregate totals for commuters, workplaces, and trip lengths produces individual trip patterns that are close enough to actual unobserved commuting behavior.
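The premise carries weight because aggregate totals do not pin down the joint pattern. In the toy check below, with made-up counts, two origin-by-departure tables have identical row and column totals yet disagree completely about who leaves early.

```python
# Two hypothetical origin-by-departure tables. Rows are origin zones A and B;
# columns are the departure bins "early" and "late".
table_1 = [[10, 0],
           [0, 10]]  # all of A leaves early, all of B leaves late
table_2 = [[5, 5],
           [5, 5]]   # both zones split evenly between early and late


def margins(table):
    """Return (row totals, column totals) of a 2-D count table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols


print(margins(table_1) == margins(table_2))  # True: identical aggregate totals
print(table_1 == table_2)                    # False: different trip patterns
```

Any method that enforces only these totals accepts both tables equally. Realism beyond the marginals therefore has to be checked separately, here in how origins and departure times co-vary.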

What would settle it

Comparison of the generated trip origins, destinations, departure times, and durations against a held-out household travel survey or GPS trace dataset collected inside the same county.
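Such a comparison needs a score. One simple choice is the total variation distance between the two normalized histograms, assuming the survey and synthetic trips are tallied into the same departure-time bins. The function name and counts below are hypothetical. The same function applies to destination shares per origin or to any other binned attribute.

```python
def total_variation(counts_a, counts_b):
    """Total variation distance between two count histograms over the same bins.

    Each histogram is normalized to shares first, so raw trip counts can be
    compared with (possibly weighted) survey counts. 0 means identical shares,
    1 means the histograms share no mass.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(counts_a, counts_b))


synthetic = [120, 300, 380, 200]  # hypothetical synthetic trips per departure bin
survey = [10, 35, 35, 20]         # hypothetical held-out survey respondents per bin
print(round(total_variation(synthetic, survey), 3))  # 0.05
```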

Figures

Figures reproduced from arXiv: 2510.18858 by Abhishek Dubey, Aron Laszka, Ayan Mukhopadhyay, Jose Paolo Talusan, Rishav Sen, Samitha Samaranayake.

Figure 1: MOVEOD preserves the conditional destination distribution for each origin census unit, ensuring that the synthetic workplace assignments match the empirical residence-to-work flow proportions reported in LODES.
Figure 2: MOVEOD calibrates the departure times for all origin census units to align with ACS departure times (table B08302).
Figure 4: MOVEOD commute data for Tennessee. (a) Residential building locations and (b) workplace locations in Hamilton County. (c) Sample of origin–destination (OD) commuter trips within Hamilton County, where each arc is shaded green at the origin and red at the destination. (d) Sample of OD trips for Davidson County, which has more than twice as many commuters as Hamilton County.
Original abstract

High-resolution origin-destination (OD) tables are essential for a wide spectrum of transportation applications, from traffic modeling and signal-timing optimization to congestion pricing and vehicle routing. However, outside a handful of data-rich cities, such data is rarely available. We introduce MOVEOD, an open-source pipeline that synthesizes public data into commuter OD flows with fine-grained spatial resolution and departure times for any county in the United States. MOVEOD combines five open data sources into a single OD dataset: American Community Survey (ACS) departure time and travel time distributions, Longitudinal Employer-Household Dynamics (LODES) residence-to-workplace flows, county geometries, road network information from OpenStreetMap (OSM), and building footprints from OSM and Microsoft. We use a constrained sampling and integer-programming method to reconcile the OD dataset with data from ACS and LODES. Our approach involves: (1) matching commuter totals per origin zone, (2) aligning workplace destinations with employment distributions, and (3) calibrating travel durations to ACS-reported commute times. This ensures the OD data accurately reflects commuting patterns. We demonstrate the framework on Hamilton County, Tennessee, where we generate roughly 150,000 synthetic trips in minutes, which we feed into a benchmark suite of classical and learning-based vehicle-routing algorithms. The MOVEOD pipeline is an end-to-end automated system, enabling users to apply it anywhere in the United States by supplying only a county and a year, and it can be adapted to other countries with comparable census datasets. The source code and a lightweight browser interface are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents MOVEOD, an open-source pipeline that synthesizes high-resolution origin-destination (OD) commute distributions for any U.S. county by integrating American Community Survey (ACS) departure-time and travel-time distributions, Longitudinal Employer-Household Dynamics (LODES) residence-to-workplace flows, county geometries, OpenStreetMap (OSM) road networks, and building footprints. It applies constrained sampling followed by integer programming to reconcile origin-zone commuter totals, workplace destinations, and travel durations with the input marginals. The pipeline is demonstrated on Hamilton County, Tennessee, generating roughly 150,000 synthetic trips that are subsequently used to benchmark classical and learning-based vehicle-routing algorithms. The system is designed to require only a county and year as input and is claimed to be adaptable to other countries with comparable census data.

Significance. If the synthesized OD tables faithfully reproduce real commuting patterns beyond the enforced marginals, the work would provide a practical, reproducible resource for transportation modeling, traffic simulation, congestion pricing, and optimization in data-scarce regions. The end-to-end automation, public code release, and minimal user requirements constitute clear strengths that could enable widespread use. The approach builds on standard public datasets and constrained optimization, which is methodologically sound in principle. Significance is limited, however, by the absence of evidence that the disaggregated origin-destination-departure assignments capture higher-order structure present in actual commute behavior.

major comments (2)
  1. §3 (reconciliation pipeline): The constrained sampling plus integer-programming procedure is described as matching origin commuter totals, aligning LODES workplace flows, and calibrating durations to ACS travel-time distributions. Yet the manuscript provides no explicit integer-program formulation and no feasibility proof. Nor does it offer post-solution verification that the joint distribution of building-level origins, LODES destinations, and sampled departure times remains consistent with all marginals simultaneously without iterative adjustments.
  2. §4 (Hamilton County demonstration): Only generation runtime and downstream VRP benchmark performance are reported. No held-out validation, comparison against withheld ACS or LODES micro-data, or metrics for higher-order statistics (e.g., correlation between origin income proxy and departure time, or spatial clustering of short versus long commutes) are presented, leaving the realism of the disaggregated trip patterns unverified.
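Verification of the kind the first comment asks for can be made mechanical. The sketch below assumes, hypothetically, that each synthetic trip is a record with `origin`, `dest`, and `dep_bin` fields and that every target marginal is an integer count. It re-tallies the final trip list and reports each marginal that is not matched exactly. The function name `check_marginals` is illustrative, not part of the MOVEOD code.

```python
from collections import Counter


def check_marginals(trips, origin_totals, od_flows, dep_bins):
    """Return a list of violated marginals; an empty list means all hold.

    trips:         iterable of dicts with 'origin', 'dest' and 'dep_bin' keys
    origin_totals: {origin: commuters}
    od_flows:      {(origin, dest): commuters}
    dep_bins:      {(origin, dep_bin): commuters}
    """
    trips = list(trips)
    checks = [
        ("origin total", origin_totals, Counter(t["origin"] for t in trips)),
        ("OD flow", od_flows, Counter((t["origin"], t["dest"]) for t in trips)),
        ("departure bin", dep_bins,
         Counter((t["origin"], t["dep_bin"]) for t in trips)),
    ]
    problems = []
    for name, target, observed in checks:
        # Include keys present on either side, so extra or missing cells are caught.
        for key in sorted(set(target) | set(observed), key=repr):
            want, got = target.get(key, 0), observed.get(key, 0)
            if want != got:
                problems.append(f"{name} {key!r}: expected {want}, got {got}")
    return problems


trips = [
    {"origin": "H1", "dest": "W1", "dep_bin": "07:00-07:29"},
    {"origin": "H1", "dest": "W2", "dep_bin": "07:30-07:59"},
]
print(check_marginals(
    trips,
    origin_totals={"H1": 2},
    od_flows={("H1", "W1"): 1, ("H1", "W2"): 1},
    dep_bins={("H1", "07:00-07:29"): 1, ("H1", "07:30-07:59"): 1},
))  # []
```

All three tallies come from the same final trip list. An empty result therefore means the marginals hold simultaneously, not just one pass at a time.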
minor comments (3)
  1. Abstract: The claim that the pipeline 'ensures the OD data accurately reflects commuting patterns' should be qualified to indicate that only aggregate marginals are enforced; the phrasing overstates the current empirical support.
  2. Data sources paragraph: Clarify whether 'county geometries' constitute a distinct fifth source or are derived from the ACS/LODES extracts, and specify the exact preprocessing steps applied to building footprints before sampling.
  3. Availability (code and interface): While the source code and browser interface are stated to be publicly available, the manuscript should include a permanent DOI or GitHub release tag to facilitate reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation and analysis.

Point-by-point responses
  1. Referee: §3 (reconciliation pipeline): The constrained sampling plus integer-programming procedure is described as matching origin commuter totals, aligning LODES workplace flows, and calibrating durations to ACS travel-time distributions. Yet the manuscript provides no explicit integer-program formulation and no feasibility proof. Nor does it offer post-solution verification that the joint distribution of building-level origins, LODES destinations, and sampled departure times remains consistent with all marginals simultaneously without iterative adjustments.

    Authors: We agree that the current description of the reconciliation procedure is high-level and lacks the explicit mathematical details requested. In the revised manuscript we will add a dedicated subsection in §3 that presents the full integer-programming formulation (objective and all constraints), a brief feasibility argument based on the consistency of the input marginals, and post-solution verification steps (including constraint satisfaction checks and a description of the single-pass procedure that avoids iterative adjustments). These additions will make the method fully reproducible from the text. revision: yes

  2. Referee: §4 (Hamilton County demonstration): Only generation runtime and downstream VRP benchmark performance are reported. No held-out validation, comparison against withheld ACS or LODES micro-data, or metrics for higher-order statistics (e.g., correlation between origin income proxy and departure time, or spatial clustering of short versus long commutes) are presented, leaving the realism of the disaggregated trip patterns unverified.

    Authors: The referee correctly observes that the demonstration section emphasizes runtime and the VRP use case. Because fine-grained building-level ground-truth OD data with departure times are not publicly available, a true held-out micro-data validation is not possible. In the revision we will add (i) quantitative checks of selected higher-order statistics against available ACS aggregates (e.g., income-by-departure-time correlations and commute-distance distributions) and (ii) an explicit limitations paragraph discussing what aspects of realism cannot be verified with public data. We believe these additions will address the core concern while remaining honest about data limitations. revision: partial
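A commute-distance distribution is one of the higher-order checks proposed in the second response. The minimal sketch below assumes trips carry home and workplace coordinates in decimal degrees. It computes great-circle (haversine) distances and bins them. Straight-line distance is a simplification chosen here; a network distance over the OSM road graph would be closer to what commuters actually drive. The bin edges, function names, and coordinates are hypothetical.

```python
from bisect import bisect_right
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp rounding overshoot


def distance_histogram(trips, edges_km=(0, 2, 5, 10, 20, 40)):
    """Count trips per straight-line distance bin; the last bin is open-ended."""
    counts = [0] * len(edges_km)
    for (home_lat, home_lon), (work_lat, work_lon) in trips:
        d = haversine_km(home_lat, home_lon, work_lat, work_lon)
        counts[bisect_right(edges_km, d) - 1] += 1
    return counts


# Hypothetical home/work coordinate pairs (decimal degrees).
pairs = [
    ((35.00, -85.30), (35.00, -85.30)),  # 0 km
    ((35.00, -85.30), (35.03, -85.30)),  # about 3.3 km
    ((35.00, -85.30), (35.10, -85.30)),  # about 11.1 km
]
print(distance_histogram(pairs))  # [1, 1, 0, 1, 0, 0]
```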

Circularity Check

0 steps flagged

No significant circularity; pipeline reconciles external aggregates via standard optimization

Full rationale

The described MOVEOD method ingests independent public datasets (ACS departure-time and travel-time distributions, LODES residence-to-workplace flows, OSM networks, and building footprints) and applies constrained sampling plus integer programming to enforce matching on commuter totals, workplace destinations, and duration distributions. No equation or procedure in the provided text defines a target quantity in terms of a parameter fitted from the same run, nor does any load-bearing step reduce to a self-citation or internal renaming. The derivation remains self-contained against the external census benchmarks it explicitly reconciles.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of ACS and LODES aggregates and on the assumption that the chosen integer-programming constraints are sufficient to recover realistic joint distributions.
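The sufficiency assumption has an easy half and a hard half. Feasibility is the easy half. Within one origin zone, suppose the LODES destination counts and the ACS departure-bin counts both sum to the zone's commuter total. Then a nonnegative integer destination-by-departure table with exactly those margins always exists, and the classical northwest-corner rule from the transportation problem builds one. The sketch below illustrates that argument with hypothetical counts and a hypothetical function name. It is not MOVEOD's procedure and leaves out travel-time constraints.

```python
def northwest_corner(row_totals, col_totals):
    """Build a non-negative integer table with the given row and column sums.

    Such a table exists exactly when both sets of non-negative totals add up
    to the same number; otherwise a ValueError is raised.
    """
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals differ, so no table exists")
    rows, cols = list(row_totals), list(col_totals)
    table = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        amount = min(rows[i], cols[j])  # fill the current cell as far as possible
        table[i][j] = amount
        rows[i] -= amount
        cols[j] -= amount
        if rows[i] == 0:
            i += 1  # this destination is fully assigned
        else:
            j += 1  # this departure bin is exhausted
    return table


# Hypothetical zone with 7 commuters: destination counts (rows) and
# departure-bin counts (columns) that both sum to 7.
print(northwest_corner([4, 2, 1], [3, 3, 1]))  # [[3, 1, 0], [0, 2, 0], [0, 0, 1]]
```

The resulting table is feasible but arbitrary. The rule pairs the first destinations with the earliest bins for no behavioral reason, so satisfying the constraints alone says little about how realistic the joint pattern is.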

axioms (2)
  • Domain assumption: ACS departure-time and travel-time distributions are accurate marginals for the target county and year.
    Invoked when calibrating synthetic trip durations and departure times to ACS-reported values.
  • Domain assumption: LODES residence-to-workplace flows provide reliable origin-destination marginals at the zone level.
    Used to align workplace destinations with employment distributions.

pith-pipeline@v0.9.0 · 5844 in / 1286 out tokens · 30498 ms · 2026-05-18T04:18:49.193335+00:00 · methodology

