ParkSense: Where Should a Delivery Driver Park? Leveraging Idle AV Compute and Vision-Language Models

Die Hu; Henan Li

arxiv: 2604.07912 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.RO


Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3


keywords: ParkSense · Delivery-Aware Precision Parking · Vision-Language Models · Autonomous Vehicles · Last-mile logistics · Idle compute repurposing · Food delivery efficiency · Parking spot identification


The pith

Autonomous vehicles can repurpose idle compute during stops to identify optimal legal parking spots for delivery drivers using vision-language models on cached imagery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a system called ParkSense that runs vision-language models on pre-cached satellite and street-view images while an autonomous vehicle is in a low-risk state, such as waiting at a light or sitting in congestion. It targets the Delivery-Aware Precision Parking problem, in which drivers need spots close to merchant entrances without violating parking rules. A sympathetic reader would care because, per the abstract, finding parking consumes a disproportionate share of food delivery time, and the approach claims to convert otherwise wasted AV compute into actionable guidance. The paper reports that a quantized 7B model completes inference in 4-8 seconds on HW4-class hardware and projects annual earnings gains of 3,000-8,000 USD per driver in the US. It closes by naming five open research directions at the intersection of autonomous driving, computer vision, and last-mile logistics.

Core claim

ParkSense repurposes idle compute during low-risk AV states such as red-light queuing or parking-lot crawling to run a vision-language model on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. The authors formalize this as the Delivery-Aware Precision Parking (DAPP) problem, demonstrate that a quantized 7B VLM finishes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the United States.
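The gating idea in the core claim can be pictured with a minimal sketch. It assumes a hypothetical `expected_idle_s` estimate of how long the vehicle will stay in its current state; the state names, the function, and the decision rule are illustrative and are not taken from the paper. Only the list of low-risk states and the 4-8 second range come from the abstract.

```python
from enum import Enum, auto


class VehicleState(Enum):
    """Coarse driving states (illustrative names, not from the paper)."""
    RED_LIGHT_QUEUE = auto()
    CONGESTION = auto()
    PARKING_LOT_CRAWL = auto()
    NORMAL_DRIVING = auto()


# The three states the abstract lists as low-risk.
LOW_RISK_STATES = {
    VehicleState.RED_LIGHT_QUEUE,
    VehicleState.CONGESTION,
    VehicleState.PARKING_LOT_CRAWL,
}

# Upper end of the 4-8 s inference range reported in the paper.
WORST_CASE_INFERENCE_S = 8.0


def should_start_inference(state: VehicleState, expected_idle_s: float) -> bool:
    """Return True only if the vehicle is in a low-risk state and the
    (hypothetical) predicted idle window covers worst-case inference time."""
    return state in LOW_RISK_STATES and expected_idle_s >= WORST_CASE_INFERENCE_S


if __name__ == "__main__":
    print(should_start_inference(VehicleState.RED_LIGHT_QUEUE, 10.0))
    print(should_start_inference(VehicleState.NORMAL_DRIVING, 30.0))
```

Budgeting against the worst-case figure is a conservative choice made for this sketch; a real system might instead allow inference to be paused or preempted when the vehicle leaves the low-risk state.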

What carries the argument

The ParkSense framework, which activates a vision-language model on pre-cached imagery only during low-risk AV states to solve the Delivery-Aware Precision Parking (DAPP) problem by locating entrances and legal zones.

If this is right

Delivery drivers would complete more orders per hour by reducing time spent circling for parking.
Autonomous vehicles could contribute to last-mile logistics efficiency using only existing onboard hardware.
The formal DAPP problem definition allows systematic optimization of routing that accounts for final parking constraints.
Quantized vision-language models become viable for on-vehicle use within the time windows of traffic stops or congestion.
Estimated annual earnings increases of 3,000-8,000 USD per driver create a direct economic incentive for adoption in the U.S. market.
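The DAPP optimization mentioned in the list above can be sketched as a constrained selection over candidate spots. The page quotes the paper's objective as minimizing walking, parking, and risk cost subject to a legality condition L(p, t); everything else here (the `Candidate` fields, the `is_legal` callback, the equal weighting of the three costs) is an assumption made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A candidate parking spot with hypothetical, pre-computed cost terms."""
    spot_id: str
    c_walk: float  # cost of walking from the spot to the merchant entrance
    c_park: float  # cost of the parking maneuver or fee
    c_risk: float  # cost attached to ticketing or safety risk


def select_spot(
    candidates: Iterable[Candidate],
    is_legal: Callable[[Candidate, float], bool],
    t: float,
) -> Optional[Candidate]:
    """Pick the legal candidate minimizing c_walk + c_park + c_risk at time t.

    Returns None when no candidate satisfies the legality constraint.
    """
    legal = [p for p in candidates if is_legal(p, t)]
    if not legal:
        return None
    return min(legal, key=lambda p: p.c_walk + p.c_park + p.c_risk)


if __name__ == "__main__":
    spots = [
        Candidate("A", 1.0, 1.0, 1.0),
        Candidate("B", 0.5, 0.5, 0.5),  # cheapest, but treated as illegal below
        Candidate("C", 2.0, 1.0, 1.0),
    ]
    best = select_spot(spots, lambda p, t: p.spot_id != "B", t=0.0)
    print(best.spot_id if best else None)
```

Treating legality as a hard filter rather than a penalty term mirrors the "subject to" phrasing of the quoted objective.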

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Integration with existing delivery apps could pre-compute and push parking suggestions before the driver reaches the area.
The same idle-compute pattern might apply to other time-sensitive urban services such as rideshare drop-offs or parcel handoffs.
Widespread use would require new data-sharing agreements between map providers, AV fleets, and delivery platforms.
Accuracy would likely vary by city density and image recency, suggesting targeted updates to cached imagery as a practical next step.

Load-bearing premise

Pre-cached satellite and street-view imagery together with a vision-language model can reliably locate merchant entrances and legal parking zones without creating safety or regulatory problems when run during low-risk vehicle states.

What would settle it

A field test measuring the fraction of VLM outputs that correctly flag legal parking zones within a short walking distance of actual merchant entrances across varied urban blocks, while also checking whether inference ever overruns the duration of low-risk AV states.
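The two quantities such a field test would report can be computed as in the sketch below. The `Trial` record, its field names, and the walking-distance threshold are hypothetical; neither the paper nor this page specifies a data format or a distance cutoff.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Trial:
    """One field-test observation (hypothetical schema)."""
    spot_is_legal: bool      # ground truth from a human audit of the suggestion
    walk_distance_m: float   # suggested spot to the actual merchant entrance
    inference_s: float       # measured VLM inference time
    idle_window_s: float     # how long the low-risk state actually lasted


def summarize(trials: Sequence[Trial], max_walk_m: float) -> Dict[str, float]:
    """Return the success rate (legal and within max_walk_m of the entrance)
    and the overrun rate (inference outlasting the low-risk window)."""
    n = len(trials)
    if n == 0:
        raise ValueError("at least one trial is required")
    successes = sum(
        1 for t in trials if t.spot_is_legal and t.walk_distance_m <= max_walk_m
    )
    overruns = sum(1 for t in trials if t.inference_s > t.idle_window_s)
    return {"success_rate": successes / n, "overrun_rate": overruns / n}
```

Reporting the two rates separately keeps the accuracy question (does the model find legal, nearby spots?) apart from the scheduling question (does inference fit inside the idle window?).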

Original abstract

Finding parking consumes a disproportionate share of food delivery time, yet no system addresses precise parking-spot selection relative to merchant entrances. We propose ParkSense, a framework that repurposes idle compute during low-risk AV states -- queuing at red lights, traffic congestion, parking-lot crawl -- to run a Vision-Language Model (VLM) on pre-cached satellite and street view imagery, identifying entrances and legal parking zones. We formalize the Delivery-Aware Precision Parking (DAPP) problem, show that a quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the U.S. Five open research directions are identified at this unexplored intersection of autonomous driving, computer vision, and last-mile logistics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ParkSense formalizes a practical delivery parking problem and ties it to idle AV compute with VLMs, but offers no experiments or data to support the core claims.


The paper introduces the Delivery-Aware Precision Parking problem and a framework called ParkSense that uses idle compute in autonomous vehicles to run vision-language models on cached satellite and street view images. This aims to help delivery drivers find better parking spots near merchant entrances.

What stands out is the practical angle. Parking eats up a lot of time in food delivery, and linking that to spare cycles in AVs during congestion or at lights is a straightforward connection between autonomous tech and gig economy work. The mention of 4-8 second inference on specific hardware gives a concrete sense of feasibility for the compute part.

The main weakness is the absence of any testing. The proposal talks about VLM performance and income boosts of thousands of dollars, but there are no benchmarks, datasets, or error rates shown. Without that, it's impossible to tell if the VLM can actually pick out entrances reliably or if the imagery is current enough. The safety side of running this in low-risk states also gets no real discussion. The economic estimates in particular seem to rest on assumptions about time saved without any measurement of actual parking behavior or VLM success rates.

This kind of work would appeal to researchers bridging computer vision, autonomous driving, and urban logistics. Someone looking for fresh applications of VLMs might find the problem setup helpful. I think it deserves peer review. The idea is new enough that referees could help shape it into something with experiments, even if the current version is light on evidence.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ParkSense, a framework that repurposes idle AV compute during low-risk states (e.g., red-light queuing) to run a quantized 7B VLM on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. It formalizes the Delivery-Aware Precision Parking (DAPP) problem, states that inference completes in 4-8 seconds on HW4-class hardware, estimates annual per-driver income gains of $3,000–$8,000 in the U.S., and lists five open research directions at the intersection of autonomous driving, computer vision, and last-mile logistics.

Significance. If the VLM identification capability were empirically validated, the work would open a promising new direction by turning existing AV hardware into a practical tool for delivery efficiency, with clear economic upside. The formalization of DAPP and the explicit enumeration of open research questions are constructive contributions that could seed follow-on studies. At present, however, the absence of any supporting data or experiments keeps the significance prospective rather than demonstrated.

major comments (2)

[Abstract] The central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.
[Abstract] The reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.

minor comments (2)

The economic projections would benefit from an explicit derivation or cited data sources rather than appearing as round-number estimates.
A short related-work subsection would help readers situate the DAPP formalization against prior parking-assistance and last-mile logistics literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. ParkSense is a conceptual framework and problem formalization paper rather than an empirical study; the quantitative elements are estimates and the VLM component is presented as an open research direction. We address the two major comments below and will revise the manuscript accordingly to clarify scope and add missing details.


Referee: [Abstract] The central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.

Authors: We agree that the manuscript contains no empirical validation of VLM performance on merchant-entrance or parking-zone detection. The work is explicitly positioned as a proposal that formalizes the Delivery-Aware Precision Parking (DAPP) problem, describes a system architecture, and enumerates five open research questions, one of which is precisely the empirical validation of VLM-based identification on cached imagery. The $3,000–$8,000 income figures are high-level projections derived from published statistics on delivery parking time waste; they are not measurements obtained from a deployed ParkSense system. We will revise the abstract, introduction, and conclusion to state more explicitly that all quantitative claims are prospective and contingent on future validation of the VLM component.

Revision: yes

Referee: [Abstract] The reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.

Authors: The referee correctly notes that the timing claim is presented without sufficient methodological detail. The 4–8 s range reflects preliminary internal measurements on representative HW4-class hardware using a 4-bit quantized 7B VLM (LLaVA-1.5 variant) at 336×336 vision-encoder resolution, but these specifics are not documented in the current manuscript. We will add a short subsection (or appendix) that reports the exact model checkpoint, quantization scheme, input preprocessing pipeline, hardware configuration, and measurement protocol (end-to-end latency including cached-image loading).

Revision: yes
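A measurement protocol of the kind the simulated response describes, end-to-end latency that includes cached-image loading, might look like the following sketch. The `load_image` and `run_inference` callables are placeholders for whatever cache reader and quantized VLM runtime are actually used; no real model or hardware API is assumed, and the warm-up count is an arbitrary choice.

```python
import statistics
import time
from typing import Any, Callable, Dict, List, Sequence


def measure_latency(
    load_image: Callable[[str], Any],
    run_inference: Callable[[Any], Any],
    image_keys: Sequence[str],
    warmup: int = 1,
) -> List[float]:
    """Time each request end to end: cached-image load plus inference.

    The first `warmup` keys are run once untimed so that one-off costs
    (for example lazy initialization) do not skew the timed samples.
    """
    for key in image_keys[:warmup]:
        run_inference(load_image(key))

    latencies = []
    for key in image_keys:
        start = time.perf_counter()
        image = load_image(key)   # loading is deliberately inside the timer
        run_inference(image)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize_latency(latencies: Sequence[float]) -> Dict[str, float]:
    """Report the minimum, median, and maximum of the measured latencies."""
    ordered = sorted(latencies)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }
```

Reporting a range alongside the median matches how the paper states its result (a 4-8 second interval) and makes it easier to compare against the duration of low-risk states.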

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential predictions


The manuscript is a high-level framework proposal that formalizes the DAPP problem, describes repurposing idle AV compute for VLM inference on cached imagery, reports a timing benchmark for a quantized 7B model, and states income estimates. No equations, fitted parameters, or derivation chains appear in the provided text. The income figures are presented as estimates rather than outputs of any model trained or fitted within the paper. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work is self-contained as a conceptual contribution; absence of empirical validation is a separate correctness concern, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a high-level system proposal with no mathematical derivations, fitted parameters, or new postulated entities; income gains are presented as estimates without disclosed calculation details or data sources.

pith-pipeline@v0.9.0 · 5437 in / 1231 out tokens · 46165 ms · 2026-05-10T18:17:31.773820+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We formalize the Delivery-Aware Precision Parking (DAPP) problem... quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "$\min\; C_{\mathrm{walk}} + C_{\mathrm{park}} + C_{\mathrm{risk}}$ subject to $L(p,t)$ legal"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1] J.-P. Rodrigue, The Geography of Transport Systems, 5th ed., Routledge, 2020
[2] P. Butrina et al., “From the last mile to the last 800 feet,” Trans. Res. Record, vol. 2609, 2017
[3] D. Shoup, The High Cost of Free Parking, Planners Press, 2005
[4] CityLogistics, “Why delivery drivers park illegally,” citylogistics.info, 2023
[5] Associated Press, “Delivery firms’ big ticket item: Parking fines,” NBC News, 2006
[6] A. Ranjbari et al., “Delivery drivers’ parking behavior,” Trans. Res. Record, 2023
[7] M. Xu et al., “Curb availability information reduces cruising,” Nature Sci. Rep., vol. 12, 2022
[8] NotATeslaApp, “FSD V14.1: Parking at destination,” notateslaapp.com, Oct. 2025
[9] Tesla Oracle, “FSD V14.2 parking tests: Mixed results,” teslaoracle.com, Nov. 2025
[10] Nuro, “Nuro Driver,” nuro.ai, 2025
[11] Google, “Geocoding API,” developers.google.com/maps. Accessed Apr. 2026
[12] J. Muriel et al., “RL for parking decisions in last-mile delivery,” Trans. Res. Part C, 2024
[13] Q. Chen et al., “Vehicle as a Service (VaaS),” IEEE Commun. Surveys Tut., vol. 26, 2024
[14] R. Xiao, “Last-meter delivery: Streets to doorsteps,” MIT, 2024
[15] D. Levinson, “Network structure and city size,” PLoS ONE, vol. 7, 2012
[16] H. Ai et al., “Predicting spatiotemporal legality of on-street parking,” Annals of GIS, vol. 25, 2019
[17] ISO 26262, “Road vehicles – Functional safety,” 2018
[18] Google, “Maps Platform pricing,” developers.google.com. Accessed Apr. 2026
[19] Tesla, “Autonomy Day presentation,” Apr. 2019
[20] Third-party estimates (3–5× HW3); no official Tesla spec published
[21] NVIDIA, “DRIVE Orin / Thor,” developer.nvidia.com/drive
[22] NVIDIA, “Jetson AGX Orin datasheet,” developer.nvidia.com
[23] NACTO, Urban Street Design Guide, 2013
[24] A. Garin et al., “Rise of gig work in the U.S.,” NBER Working Paper, 2025
