ParkSense: Where Should a Delivery Driver Park? Leveraging Idle AV Compute and Vision-Language Models
Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3
The pith
Autonomous vehicles can repurpose idle compute during stops to identify optimal legal parking spots for delivery drivers using vision-language models on cached imagery.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ParkSense repurposes idle compute during low-risk AV states such as red-light queuing or parking-lot crawling to run a vision-language model on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. The authors formalize this as the Delivery-Aware Precision Parking (DAPP) problem, demonstrate that a quantized 7B VLM finishes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the United States.
What carries the argument
The ParkSense framework, which activates a vision-language model on pre-cached imagery only during low-risk AV states to solve the Delivery-Aware Precision Parking (DAPP) problem by locating entrances and legal zones.
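The gating idea amounts to an admission check before any VLM job starts. The sketch below is not the paper's design and rests on several assumptions. The state labels, the dwell-time estimate and the 1 s safety margin are hypothetical. The 8 s budget is simply the upper end of the reported 4-8 s range. It also assumes a job cannot be preempted once started, so a job is admitted only when the expected time in the low-risk state covers the worst case.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AVState(Enum):
    # Hypothetical state labels; the paper names red-light queuing,
    # traffic congestion and parking-lot crawl as low-risk examples.
    RED_LIGHT_QUEUE = auto()
    CONGESTION = auto()
    PARKING_LOT_CRAWL = auto()
    ACTIVE_DRIVING = auto()


LOW_RISK_STATES = {AVState.RED_LIGHT_QUEUE, AVState.CONGESTION, AVState.PARKING_LOT_CRAWL}

# Upper end of the 4-8 s range reported for a quantized 7B VLM on HW4-class hardware.
WORST_CASE_INFERENCE_S = 8.0


@dataclass
class VehicleContext:
    state: AVState
    expected_dwell_s: float  # hypothetical estimate of how long the current state will last


def may_start_inference(ctx: VehicleContext, margin_s: float = 1.0) -> bool:
    """Admit a VLM job only in a low-risk state whose expected dwell time
    covers the worst-case inference latency plus a safety margin."""
    if ctx.state not in LOW_RISK_STATES:
        return False
    return ctx.expected_dwell_s >= WORST_CASE_INFERENCE_S + margin_s
```

A long red-light queue (30 s expected) admits a job, a short one (5 s) does not, and active driving never does, whatever the dwell estimate.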
If this is right
- Delivery drivers would complete more orders per hour by reducing time spent circling for parking.
- Autonomous vehicles could contribute to last-mile logistics efficiency using only existing onboard hardware.
- The formal DAPP problem definition allows systematic optimization of routing that accounts for final parking constraints.
- Quantized vision-language models become viable for on-vehicle use within the time windows of traffic stops or congestion.
- Estimated annual earnings increases of 3,000-8,000 USD per driver create a direct economic incentive for adoption in the U.S. market.
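The DAPP objective quoted from the paper, "min Cwalk + Cpark + Crisk subject to L(p,t) legal", can be read as a constrained selection over candidate spots p at time t. The sketch below assumes each candidate already carries the three cost terms in a common unit and that legality is an externally supplied predicate. The `Candidate` fields and the loading-zone rule in the usage lines are hypothetical illustrations, not the paper's data model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    # Hypothetical per-spot cost components, all in the same unit (e.g. seconds).
    spot_id: str
    c_walk: float  # walking cost from the spot to the merchant entrance
    c_park: float  # cost of maneuvering into and out of the spot
    c_risk: float  # expected penalty, e.g. ticket or blocking risk


def select_spot(
    candidates: Sequence[Candidate],
    t: float,
    is_legal: Callable[[str, float], bool],
) -> Optional[Candidate]:
    """Return the candidate minimizing c_walk + c_park + c_risk among those
    legal at time t, or None if no candidate is legal."""
    legal = [c for c in candidates if is_legal(c.spot_id, t)]
    if not legal:
        return None
    return min(legal, key=lambda c: c.c_walk + c.c_park + c.c_risk)


spots = [Candidate("A", 30, 10, 0), Candidate("B", 5, 5, 0), Candidate("C", 20, 15, 2)]
# Hypothetical rule: spot "B" is a loading zone that is legal only before 18:00.
best = select_spot(spots, t=19.0, is_legal=lambda sid, t: sid != "B" or t < 18.0)
print(best.spot_id)  # C
```

The time argument matters because the same curb can switch between legal and illegal during the day, which is why the constraint is written as L(p,t) rather than L(p).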
Where Pith is reading between the lines
- Integration with existing delivery apps could pre-compute and push parking suggestions before the driver reaches the area.
- The same idle-compute pattern might apply to other time-sensitive urban services such as rideshare drop-offs or parcel handoffs.
- Widespread use would require new data-sharing agreements between map providers, AV fleets, and delivery platforms.
- Accuracy would likely vary by city density and image recency, suggesting targeted updates to cached imagery as a practical next step.
Load-bearing premise
Pre-cached satellite and street-view imagery together with a vision-language model can reliably locate merchant entrances and legal parking zones without creating safety or regulatory problems when run during low-risk vehicle states.
What would settle it
A field test measuring the fraction of VLM outputs that correctly flag legal parking zones within a short walking distance of actual merchant entrances across varied urban blocks, while also checking whether inference ever overruns the duration of low-risk AV states.
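Such a field test reduces to two rates per trial set: a hit rate for outputs that are both legal and close to the true entrance, and an overrun rate for inference that outlasts the low-risk state. This sketch makes several assumptions. The 50 m threshold is hypothetical. Straight-line haversine distance stands in for walking distance and underestimates real walking paths. Ground-truth legality and entrance locations are assumed to be collected on site.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class Trial:
    flagged_spot: Tuple[float, float]   # VLM-proposed parking location
    true_entrance: Tuple[float, float]  # ground-truth merchant entrance
    spot_is_legal: bool                 # ground-truth legality, checked on site
    inference_s: float                  # measured inference latency
    state_duration_s: float             # how long the low-risk state actually lasted


def field_metrics(trials: Sequence[Trial], max_walk_m: float = 50.0) -> Tuple[float, float]:
    """Return (hit_rate, overrun_rate) over all trials."""
    if not trials:
        raise ValueError("no trials")
    hits = sum(
        1 for tr in trials
        if tr.spot_is_legal and haversine_m(tr.flagged_spot, tr.true_entrance) <= max_walk_m
    )
    overruns = sum(1 for tr in trials if tr.inference_s > tr.state_duration_s)
    return hits / len(trials), overruns / len(trials)
```

Reporting the two rates separately keeps accuracy and timing safety distinct: a system could score well on one and fail the other.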
Original abstract
Finding parking consumes a disproportionate share of food delivery time, yet no system addresses precise parking-spot selection relative to merchant entrances. We propose ParkSense, a framework that repurposes idle compute during low-risk AV states -- queuing at red lights, traffic congestion, parking-lot crawl -- to run a Vision-Language Model (VLM) on pre-cached satellite and street view imagery, identifying entrances and legal parking zones. We formalize the Delivery-Aware Precision Parking (DAPP) problem, show that a quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the U.S. Five open research directions are identified at this unexplored intersection of autonomous driving, computer vision, and last-mile logistics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ParkSense, a framework that repurposes idle AV compute during low-risk states (e.g., red-light queuing) to run a quantized 7B VLM on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. It formalizes the Delivery-Aware Precision Parking (DAPP) problem, states that inference completes in 4-8 seconds on HW4-class hardware, estimates annual per-driver income gains of $3,000–$8,000 in the U.S., and lists five open research directions at the intersection of autonomous driving, computer vision, and last-mile logistics.
Significance. If the VLM identification capability were empirically validated, the work would open a promising new direction by turning existing AV hardware into a practical tool for delivery efficiency, with clear economic upside. The formalization of DAPP and the explicit enumeration of open research questions are constructive contributions that could seed follow-on studies. At present, however, the absence of any supporting data or experiments keeps the significance prospective rather than demonstrated.
major comments (2)
- [Abstract] Abstract: the central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.
- [Abstract] Abstract: the reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.
minor comments (2)
- The economic projections would benefit from an explicit derivation or cited data sources rather than appearing as round-number estimates.
- A short related-work subsection would help readers situate the DAPP formalization against prior parking-assistance and last-mile logistics literature.
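An explicit derivation could be as simple as valuing recovered parking-search time at the driver's hourly rate and adding avoided fines. This is one plausible structure, not the paper's stated method. Every field name is hypothetical, and the sketch assumes recovered minutes are re-spent on paid deliveries at the same rate.

```python
from dataclasses import dataclass


@dataclass
class DriverProfile:
    # Hypothetical inputs; each should be tied to a cited data source.
    orders_per_year: float
    minutes_saved_per_order: float       # parking-search time avoided per order
    earnings_per_hour: float             # gross USD per active hour
    fines_avoided_per_year: float = 0.0  # USD of parking tickets no longer incurred


def annual_gain_usd(p: DriverProfile) -> float:
    """Value of recovered time at the driver's hourly rate plus avoided fines."""
    hours_saved = p.orders_per_year * p.minutes_saved_per_order / 60.0
    return hours_saved * p.earnings_per_hour + p.fines_avoided_per_year
```

Running this with sourced low and high values for each parameter would make the $3,000–$8,000 range checkable and show which input dominates it.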
Simulated Author's Rebuttal
We thank the referee for the constructive comments. ParkSense is a conceptual framework and problem formalization paper rather than an empirical study; the quantitative elements are estimates and the VLM component is presented as an open research direction. We address the two major comments below and will revise the manuscript accordingly to clarify scope and add missing details.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.
Authors: We agree that the manuscript contains no empirical validation of VLM performance on merchant-entrance or parking-zone detection. The work is explicitly positioned as a proposal that formalizes the Delivery-Aware Precision Parking (DAPP) problem, describes a system architecture, and enumerates five open research questions, one of which is precisely the empirical validation of VLM-based identification on cached imagery. The $3,000–$8,000 income figures are high-level projections derived from published statistics on delivery parking time waste; they are not measurements obtained from a deployed ParkSense system. We will revise the abstract, introduction, and conclusion to state more explicitly that all quantitative claims are prospective and contingent on future validation of the VLM component. revision: yes
- Referee: [Abstract] Abstract: the reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.
Authors: The referee correctly notes that the timing claim is presented without sufficient methodological detail. The 4–8 s range reflects preliminary internal measurements on representative HW4-class hardware using a 4-bit quantized 7B VLM (LLaVA-1.5 variant) at 336×336 vision-encoder resolution, but these specifics are not documented in the current manuscript. We will add a short subsection (or appendix) that reports the exact model checkpoint, quantization scheme, input preprocessing pipeline, hardware configuration, and measurement protocol (end-to-end latency including cached-image loading). revision: yes
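The promised protocol (end-to-end latency including cached-image loading) could follow a harness like this sketch. Here `load_cached_images` and `run_vlm` are hypothetical stand-ins for the real image cache and the quantized VLM pipeline. The warm-up runs, the 95th-percentile summary and the 8 s budget are choices made for illustration, not details from the manuscript.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    load_cached_images: Callable[[], object],
    run_vlm: Callable[[object], object],
    trials: int = 30,
    warmup: int = 3,
    budget_s: float = 8.0,
) -> Dict[str, float]:
    """Time cached-image loading plus VLM inference end to end, after warm-up runs."""
    if trials < 2:
        raise ValueError("need at least two timed trials for percentiles")
    for _ in range(warmup):  # exclude one-off costs such as weight loading and cache fills
        run_vlm(load_cached_images())
    samples: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        run_vlm(load_cached_images())
        samples.append(time.perf_counter() - start)
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {
        "p50_s": statistics.median(samples),
        "p95_s": cuts[94],
        "max_s": max(samples),
        "over_budget_frac": sum(s > budget_s for s in samples) / len(samples),
    }
```

Reporting the tail (p95 and max) alongside the median matters here, because an occasional slow run is what would overrun a short red-light window.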
Circularity Check
No circularity: conceptual proposal without derivations or self-referential predictions
Full rationale
The manuscript is a high-level framework proposal that formalizes the DAPP problem, describes repurposing idle AV compute for VLM inference on cached imagery, reports a timing benchmark for a quantized 7B model, and states income estimates. No equations, fitted parameters, or derivation chains appear in the provided text. The income figures are presented as estimates rather than outputs of any model trained or fitted within the paper. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work is self-contained as a conceptual contribution; absence of empirical validation is a separate correctness concern, not circularity.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "We formalize the Delivery-Aware Precision Parking (DAPP) problem... quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: $\min \; C_{\mathrm{walk}} + C_{\mathrm{park}} + C_{\mathrm{risk}} \quad \text{subject to } L(p,t) \text{ legal}$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] J.-P. Rodrigue, The Geography of Transport Systems, 5th ed., Routledge, 2020.
- [2] P. Butrina et al., “From the last mile to the last 800 feet,” Trans. Res. Record, vol. 2609, 2017.
- [3] D. Shoup, The High Cost of Free Parking, Planners Press, 2005.
- [4] CityLogistics, “Why delivery drivers park illegally,” citylogistics.info, 2023.
- [5] Associated Press, “Delivery firms’ big ticket item: Parking fines,” NBC News, 2006.
- [6] A. Ranjbari et al., “Delivery drivers’ parking behavior,” Trans. Res. Record, 2023.
- [7] M. Xu et al., “Curb availability information reduces cruising,” Nature Sci. Rep., vol. 12, 2022.
- [8] NotATeslaApp, “FSD V14.1: Parking at destination,” notateslaapp.com, Oct. 2025.
- [9] Tesla Oracle, “FSD V14.2 parking tests: Mixed results,” teslaoracle.com, Nov. 2025.
- [11] Google, “Geocoding API,” developers.google.com/maps. Accessed Apr. 2026.
- [12] J. Muriel et al., “RL for parking decisions in last-mile delivery,” Trans. Res. Part C, 2024.
- [13] Q. Chen et al., “Vehicle as a Service (VaaS),” IEEE Commun. Surveys Tut., vol. 26, 2024.
- [14] R. Xiao, “Last-meter delivery: Streets to doorsteps,” MIT, 2024.
- [15] D. Levinson, “Network structure and city size,” PLoS ONE, vol. 7, 2012.
- [16] H. Ai et al., “Predicting spatiotemporal legality of on-street parking,” Annals of GIS, vol. 25, 2019.
- [18] Google, “Maps Platform pricing,” developers.google.com. Accessed Apr. 2026.
- [20] Third-party estimates (3–5× HW3); no official Tesla spec published.
- [23] NACTO, Urban Street Design Guide, 2013.
- [24] A. Garin et al., “Rise of gig work in the U.S.,” NBER Working Paper, 2025.