Critical Transit Infrastructure in Smart Cities and Urban Air Quality: A Multi-City Seasonal Comparison of Ridership and PM2.5

Sean Elliott; Sohini Roy

arXiv: 2601.19937 · v2 · submitted 2026-01-16 · physics.soc-ph · cs.CY

Pith reviewed 2026-05-16 14:07 UTC · model grok-4.3

keywords: public transit, PM2.5, urban air quality, smart cities, ridership, seasonal shifts, multi-city comparison

The pith

Transit ridership and PM2.5 relationships are not uniform across cities or seasons but shaped by local baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a transparent dataset integrating public transit ridership reports with EPA air quality measurements for four U.S. cities across two seasons. It finds large differences in how much transit each city uses and how pollution levels change with the seasons. These patterns are not the same everywhere, and simple models show that city-specific factors explain much of the apparent connection between riding transit and breathing cleaner air. A sympathetic reader would care because this offers a practical way to monitor urban infrastructure use and health risks together in smart city systems.

Core claim

By integrating agency-reported transit ridership with ambient fine particulate matter PM2.5 from the U.S. EPA for New York City, Chicago, Las Vegas, and Phoenix using two seasonal snapshots, the study reveals pronounced structural differences in transit scale and intensity, with consistent seasonal shifts in both ridership and PM2.5 that vary by urban context. A set of lightweight regression specifications indicates that apparent mobility-PM2.5 relationships are not uniform across cities or seasons and are strongly shaped by baseline city effects. This positions integrated mobility and environment monitoring as a practical smart-city capability.

What carries the argument

The harmonized multi-source monitoring dataset that integrates ridership feeds with PM2.5 readings, using monthly system totals and per-capita metrics for cross-city comparability.
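The harmonization step can be pictured with a minimal sketch. It assumes each feed has already been reduced to (city, ISO date, boardings) records; the record layout, the function names, the city labels, and every number below are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import defaultdict


def monthly_system_totals(records):
    """Sum boardings per (city, 'YYYY-MM'), whatever the feed granularity.

    Each record is (city, iso_date, boardings). A daily feed contributes one
    record per day; a stop-level feed contributes several records per day.
    """
    totals = defaultdict(int)
    for city, iso_date, boardings in records:
        month = iso_date[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[(city, month)] += boardings
    return dict(totals)


def per_capita(totals, population):
    """Divide each monthly total by the population of its city."""
    return {key: total / population[key[0]] for key, total in totals.items()}


# Illustrative numbers only (assumption), not values from the paper.
records = [
    ("CityA", "2024-03-01", 1000),  # daily feed: one row per day
    ("CityA", "2024-03-02", 1200),
    ("CityB", "2024-03-01", 300),   # stop-level feed: several rows per day
    ("CityB", "2024-03-01", 200),
    ("CityB", "2024-03-02", 500),
]
population = {"CityA": 1100, "CityB": 500}

totals = monthly_system_totals(records)
rides_per_capita = per_capita(totals, population)
```

In this toy input the two cities differ by more than a factor of two in absolute ridership yet have the same rides per capita, which is the kind of contrast that motivates reporting both metrics.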

If this is right

Apparent mobility-PM2.5 relationships vary by city and season.
Baseline city effects strongly shape these relationships.
Integrated monitoring provides a scalable framework for tracking infrastructure utilization and air-quality indicators.
This supports sustainable communities and public-health-aware urban resilience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be extended to additional cities or time periods for broader policy insights.
City-specific baselines imply that uniform national policies on transit and air quality may need local tailoring.
Per-capita metrics highlight potential differences in exposure across urban populations.

Load-bearing premise

Agency-reported ridership feeds and EPA PM2.5 readings can be harmonized into comparable monthly totals without introducing material bias from differing collection methods or missing data across the four cities.

What would settle it

Re-collecting or independently verifying the ridership and PM2.5 data for one of the cities and finding that the seasonal shifts or cross-city differences disappear would falsify the claim of non-uniform relationships shaped by city effects.

Original abstract

Public transit is a critical component of urban mobility and equity, yet mobility and air-quality linkages are rarely operationalized in reproducible smart-city analytics workflows. This study develops a transparent, multi-source monitoring dataset that integrates agency-reported transit ridership with ambient fine particulate matter PM2.5 from the U.S. EPA Air Quality System (AQS) for four U.S. metropolitan areas (New York City, Chicago, Las Vegas, and Phoenix), using two seasonal snapshots (March and October 2024). We harmonize heterogeneous ridership feeds (daily and stop-level) to monthly system totals and pair them with monthly mean PM2.5, reporting both absolute and per-capita metrics to enable cross-city comparability. Results show pronounced structural differences in transit scale and intensity, with consistent seasonal shifts in both ridership and PM2.5 that vary by urban context. A set of lightweight regression specifications is used as a descriptive sensitivity analysis, indicating that apparent mobility-PM2.5 relationships are not uniform across cities or seasons and are strongly shaped by baseline city effects. Overall, the paper positions integrated mobility and environment monitoring as a practical smart-city capability, offering a scalable framework for tracking infrastructure utilization alongside exposure-relevant air-quality indicators to support sustainable communities and public-health-aware urban resilience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a new harmonized ridership-PM2.5 dataset for four cities across two seasons with clear descriptive differences, but the unvalidated data aggregation step undercuts how much we can trust the non-uniformity claims.

The paper's main contribution is a harmonized dataset pairing transit ridership with PM2.5 levels for New York, Chicago, Las Vegas, and Phoenix in March and October 2024. It shows clear differences in scale and seasonal patterns that depend on the city. This kind of paired data isn't common in the literature, so having it for these specific places and times is new.

What works is the straightforward use of public sources like agency reports and EPA data to create comparable metrics, including per-capita versions. The descriptive regressions highlight that mobility-air quality links aren't uniform, which aligns with what you'd expect from different urban setups. It's a practical template for smart-city workflows.

The soft spot is the data harmonization. The abstract mentions turning daily and stop-level feeds into monthly totals, but there's no check on how missing data or different collection methods might skew the numbers across cities or seasons. If coverage varies systematically, the reported shifts could be partly from processing rather than real changes. The regressions are presented as sensitivity checks only, so they don't carry heavy weight, and without error bars or full specs it's hard to gauge precision. No causal mechanisms are claimed, which is good because the data doesn't support that.

This is for people building smart-city monitoring tools who need examples of integrating mobility and environment data. It deserves a serious referee to look at the methods details and data quality, even if the core idea is incremental rather than revolutionary. Send it for peer review with requests for more validation on the harmonization and perhaps additional robustness checks.

Referee Report

1 major / 1 minor

Summary. The manuscript develops a transparent dataset integrating agency-reported transit ridership with EPA PM2.5 data for four U.S. cities (New York City, Chicago, Las Vegas, Phoenix) across two seasonal periods (March and October 2024). It harmonizes ridership to monthly system totals and per-capita metrics, documents structural differences and seasonal shifts in ridership and PM2.5, and employs lightweight regressions as descriptive sensitivity analyses to demonstrate that mobility-PM2.5 relationships are non-uniform and dominated by city effects.

Significance. If the harmonization procedures prove robust, the work offers a reproducible, scalable approach to monitoring transit infrastructure utilization alongside air-quality indicators in smart cities. This could support evidence-based urban planning for sustainable mobility and public health, emphasizing context-specific rather than universal mobility-environment linkages.

major comments (1)

[Section 2] Section 2: The harmonization of heterogeneous ridership feeds (daily vs. stop-level) into monthly system totals and per-capita metrics is described without quantitative assessment of coverage gaps, missing-data rates, imputation rules, or cross-validation against external sources such as NTD or MTA open data. Because the central claim of non-uniform relationships and city-effect dominance rests on the comparability of these aggregated metrics across cities and seasons, this gap is load-bearing.
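The kind of quantitative assessment requested here could be as small as two numbers per city and month. The sketch below is an assumption about what such a check might look like: `missing_rate`, `relative_deviation`, the daily feed, and the external reference total of 800 are all hypothetical and are not taken from the manuscript, NTD, or MTA data.

```python
def missing_rate(daily_values):
    """Fraction of days in the period with no reported value (None)."""
    missing = sum(1 for v in daily_values if v is None)
    return missing / len(daily_values)


def relative_deviation(aggregate, reference):
    """Signed relative difference between a harmonized total and an external figure."""
    return (aggregate - reference) / reference


# Illustrative ten-day feed (assumption): None marks a day with no report.
feed = [100, None, 120, 110, None, None, 90, 100, 105, 95]

rate = missing_rate(feed)
observed_total = sum(v for v in feed if v is not None)

# 800 stands in for an externally published total (hypothetical value).
deviation = relative_deviation(observed_total, 800)
```

Reporting both figures side by side for each city would let a reader see whether a city with a high missing rate also sits furthest from its external reference.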

minor comments (1)

[Abstract] Abstract and methods: The regression specifications are described only as 'lightweight' and 'descriptive sensitivity analysis' without listing the exact functional forms, dependent variables, or controls; adding one sentence or a small table would improve reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review. The single major comment identifies a genuine gap in the quantitative documentation of our data harmonization procedures. We address this point directly below and will revise the manuscript to incorporate the requested assessments.

Referee: [Section 2] Section 2: The harmonization of heterogeneous ridership feeds (daily vs. stop-level) into monthly system totals and per-capita metrics is described without quantitative assessment of coverage gaps, missing-data rates, imputation rules, or cross-validation against external sources such as NTD or MTA open data. Because the central claim of non-uniform relationships and city-effect dominance rests on the comparability of these aggregated metrics across cities and seasons, this gap is load-bearing.

Authors: We agree that the original manuscript provided insufficient quantitative detail on the harmonization process, and that this information is necessary to substantiate the cross-city comparability underlying our descriptive findings. In the revised manuscript we will add a new subsection 2.3 (Data Coverage, Imputation, and Validation) that reports: (i) missing-data rates for each city's ridership feed (daily and stop-level), (ii) the specific imputation rules applied (linear interpolation for gaps shorter than three consecutive days, with longer gaps flagged and excluded from monthly totals), (iii) coverage-gap percentages expressed relative to total system ridership, and (iv) cross-validation of the resulting monthly aggregates against the National Transit Database (NTD) 2024 annual reports and, where accessible, MTA open-data portals. These additions will be presented in a table and accompanying text so that readers can directly evaluate the robustness of the per-capita and system-total metrics used in the subsequent regressions. We expect these changes to strengthen rather than alter the central claim that mobility-PM2.5 relationships are non-uniform and dominated by city effects.

Revision: yes
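The imputation rule stated in the rebuttal (linear interpolation for gaps shorter than three consecutive days, longer gaps flagged and excluded) can be sketched as follows. This is one possible reading, not the authors' code: the function name is hypothetical, and treating gaps that touch the start or end of the series as flagged is an added assumption, since the rebuttal does not say how edge gaps are handled.

```python
def impute_short_gaps(values, max_gap=2):
    """Linearly interpolate interior runs of None that are at most max_gap long.

    Returns (filled, flagged). `flagged` lists the indices left as None because
    the run was longer than max_gap or touched the start or end of the series
    (the edge rule is an assumption, not stated in the source).
    """
    filled = list(values)
    flagged = []
    i, n = 0, len(values)
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i
        while i < n and values[i] is None:
            i += 1
        end = i  # first index after the run of missing days
        run = end - start
        interior = start > 0 and end < n
        if run <= max_gap and interior:
            left, right = values[start - 1], values[end]
            step = (right - left) / (run + 1)
            for k in range(run):
                filled[start + k] = left + step * (k + 1)
        else:
            flagged.extend(range(start, end))
    return filled, flagged


# Illustrative daily series (assumption): a one-day gap and a three-day gap.
series = [100, None, 120, None, None, None, 90, 80]
filled, flagged = impute_short_gaps(series)

# Flagged days stay None and are therefore excluded from the total.
monthly_total = sum(v for v in filled if v is not None)
```

Here the one-day gap is filled with the midpoint of its neighbours, while the three-day gap is left empty and reported in `flagged`, so a reader can see how much of the month the total actually covers.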

Circularity Check

0 steps flagged

No significant circularity; observational integration of external public datasets

The paper integrates agency-reported ridership feeds and EPA AQS PM2.5 readings for four cities across two seasonal snapshots, harmonizes them to monthly totals and per-capita metrics, and applies lightweight descriptive regressions. No equations, fitted parameters, or predictions are defined in terms of the paper's own outputs. Central claims rest on empirical patterns in independent external data sources rather than any self-referential reduction. No self-citations are load-bearing for the core results, and the analysis contains no ansatz, uniqueness theorems, or renamings that collapse to inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Central claim rests on the domain assumption that heterogeneous agency ridership feeds can be reliably converted to comparable monthly system totals and that monthly mean PM2.5 adequately represents exposure-relevant air quality; one free parameter class appears in the lightweight regressions used for sensitivity checks.

free parameters (1)

regression coefficients in sensitivity models
Lightweight regression specifications fit coefficients to describe apparent mobility-PM2.5 relationships and city baseline effects.
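Why baseline city effects matter for such coefficients can be shown with a toy comparison of a pooled slope and a within-city slope. The paper's actual specifications are not listed in the material reviewed here, so everything below is an assumption: the two-city synthetic data, the variable names, and the choice of demeaning by city (which gives the same slope as adding city fixed effects) are for illustration only.

```python
def ols_slope(x, y):
    """Slope of a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Synthetic data (assumption): two cities, two seasonal observations each.
city = ["A", "A", "B", "B"]
ridership = [1.0, 2.0, 5.0, 6.0]   # rides per capita
pm25 = [10.0, 11.0, 4.0, 5.0]      # monthly mean PM2.5

# Pooled fit ignores which city each observation belongs to.
pooled = ols_slope(ridership, pm25)

# Within-city fit removes each city's baseline level first.
within = ols_slope(demean_by_group(ridership, city), demean_by_group(pm25, city))
```

In this constructed example the pooled slope is negative because the high-ridership city also has the lower PM2.5 baseline, while the within-city slope is positive. The sign flip is built into the synthetic numbers and says nothing about the paper's results; it only shows how a coefficient can be driven by baselines.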

axioms (1)

domain assumption: Agency-reported ridership data accurately reflects actual system usage after harmonization to monthly totals.
Invoked in the data integration step to enable cross-city comparability.

pith-pipeline@v0.9.0 · 5536 in / 1307 out tokens · 111456 ms · 2026-05-16T14:07:46.314102+00:00 · methodology
