
arxiv: 2604.07912 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.RO

ParkSense: Where Should a Delivery Driver Park? Leveraging Idle AV Compute and Vision-Language Models

Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords ParkSense · Delivery-Aware Precision Parking · Vision-Language Models · Autonomous Vehicles · Last-mile logistics · Idle compute repurposing · Food delivery efficiency · Parking spot identification

The pith

Autonomous vehicles can repurpose idle compute during low-risk stops to help delivery drivers find legal parking close to merchant entrances, by running vision-language models on cached imagery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a system called ParkSense that activates vision-language models on pre-cached satellite and street-view images while an autonomous vehicle is in low-risk states such as waiting at lights or sitting in congestion. It targets the Delivery-Aware Precision Parking problem, in which drivers need spots close to merchant entrances that do not violate parking rules. A sympathetic reader would care because a large share of food-delivery time is spent searching for parking, and the approach claims to convert otherwise idle AV resources into actionable guidance. The paper reports that a quantized 7B model runs in 4-8 seconds on HW4-class hardware and projects yearly earnings gains of several thousand dollars per driver in the US. It closes by naming five open research directions at the meeting point of autonomous driving, computer vision, and logistics.

Core claim

ParkSense repurposes idle compute during low-risk AV states such as red-light queuing or parking-lot crawling to run a vision-language model on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. The authors formalize this as the Delivery-Aware Precision Parking (DAPP) problem, demonstrate that a quantized 7B VLM finishes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the United States.
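This page does not reproduce the paper's formal DAPP definition. Purely as an illustration, one way to read the problem is as a constrained choice over candidate stopping spots: discard spots judged illegal or too far from the entrance, then minimize extra driving time plus round-trip walking time. The sketch below assumes that reading; the `Candidate` fields, the cost function, and the defaults (1.4 m/s walking speed, 150 m walking cap) are all hypothetical and are not the authors' formulation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate stopping spot near a merchant (hypothetical schema)."""
    spot_id: str
    is_legal: bool          # legality as judged from cached imagery / curb rules
    drive_time_s: float     # extra driving or cruising time to reach the spot
    walk_distance_m: float  # walking distance from the spot to the merchant entrance


def choose_spot(
    candidates: Sequence[Candidate],
    walk_speed_mps: float = 1.4,
    max_walk_m: float = 150.0,
) -> Optional[Candidate]:
    """Pick the legal spot that minimizes drive time plus round-trip walk time.

    Returns None when no legal spot lies within max_walk_m of the entrance.
    """
    feasible = [
        c for c in candidates
        if c.is_legal and c.walk_distance_m <= max_walk_m
    ]
    if not feasible:
        return None
    # Round trip: the driver walks to the entrance and back to the vehicle.
    return min(
        feasible,
        key=lambda c: c.drive_time_s + 2 * c.walk_distance_m / walk_speed_mps,
    )


spots = [
    Candidate("A", is_legal=False, drive_time_s=0.0, walk_distance_m=10.0),
    Candidate("B", is_legal=True, drive_time_s=60.0, walk_distance_m=40.0),
    Candidate("C", is_legal=True, drive_time_s=20.0, walk_distance_m=100.0),
]
best = choose_spot(spots)
print(best.spot_id if best else None)  # prints B
```

With these made-up candidates, A is excluded as illegal, B costs about 60 + 80/1.4 ≈ 117 s, and C costs about 20 + 200/1.4 ≈ 163 s, so the closer-but-longer-to-reach spot B wins.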

What carries the argument

The ParkSense framework, which activates a vision-language model on pre-cached imagery only during low-risk AV states to solve the Delivery-Aware Precision Parking (DAPP) problem by locating entrances and legal zones.
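How ParkSense decides when to launch inference is not spelled out on this page. A minimal sketch of such a gate follows, assuming the vehicle exposes a coarse state label and some estimate of how long the current idle window will last (for example from a signal-timing feed). Both inputs, the state names, and the 2 s safety margin are hypothetical; the default worst-case latency of 8 s is simply the upper end of the 4-8 s range quoted in the abstract.

```python
from enum import Enum, auto


class AVState(Enum):
    """Hypothetical coarse vehicle states; the names are illustrative only."""
    RED_LIGHT_QUEUE = auto()
    CONGESTION = auto()
    PARKING_LOT_CRAWL = auto()
    CRUISING = auto()
    MANEUVERING = auto()


# The low-risk states named in the abstract; every other state is treated as busy.
LOW_RISK_STATES = frozenset({
    AVState.RED_LIGHT_QUEUE,
    AVState.CONGESTION,
    AVState.PARKING_LOT_CRAWL,
})


def should_run_vlm(
    state: AVState,
    expected_idle_s: float,
    worst_case_inference_s: float = 8.0,
    safety_margin_s: float = 2.0,
) -> bool:
    """Launch a VLM query only if the vehicle is in a low-risk state and the
    predicted idle window covers the worst-case inference time plus a margin."""
    if state not in LOW_RISK_STATES:
        return False
    return expected_idle_s >= worst_case_inference_s + safety_margin_s


print(should_run_vlm(AVState.RED_LIGHT_QUEUE, 30.0))  # prints True
print(should_run_vlm(AVState.CRUISING, 60.0))         # prints False
```

Gating on the worst case rather than the typical latency is a conservative choice: a red-light queue predicted to last 9 s would be skipped even though a 4 s inference might have fit.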

If this is right

  • Delivery drivers would complete more orders per hour by reducing time spent circling for parking.
  • Autonomous vehicles could contribute to last-mile logistics efficiency using only existing onboard hardware.
  • The formal DAPP problem definition allows systematic optimization of routing that accounts for final parking constraints.
  • Quantized vision-language models become viable for on-vehicle use within the time windows of traffic stops or congestion.
  • Estimated annual earnings increases of 3,000-8,000 USD per driver create a direct economic incentive for adoption in the U.S. market.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integration with existing delivery apps could pre-compute and push parking suggestions before the driver reaches the area.
  • The same idle-compute pattern might apply to other time-sensitive urban services such as rideshare drop-offs or parcel handoffs.
  • Widespread use would require new data-sharing agreements between map providers, AV fleets, and delivery platforms.
  • Accuracy would likely vary by city density and image recency, suggesting targeted updates to cached imagery as a practical next step.

Load-bearing premise

Pre-cached satellite and street-view imagery together with a vision-language model can reliably locate merchant entrances and legal parking zones without creating safety or regulatory problems when run during low-risk vehicle states.

What would settle it

A field test measuring the fraction of VLM outputs that correctly flag legal parking zones within a short walking distance of actual merchant entrances across varied urban blocks, while also checking whether inference ever overruns the duration of low-risk AV states.
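Such a test reduces to two rates: how often the flagged spot is both actually legal and within walking distance of the surveyed entrance, and how often inference runs longer than the low-risk window it was launched in. The scoring could be sketched as follows, assuming a hypothetical per-trial record with surveyed ground truth; the 100 m threshold is a placeholder for the unspecified "short walking distance".

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


@dataclass(frozen=True)
class Trial:
    """One field observation (hypothetical schema)."""
    predicted_spot: LatLon        # spot flagged by the VLM
    spot_is_actually_legal: bool  # ground truth from an on-site survey
    true_entrance: LatLon         # surveyed merchant entrance
    inference_s: float            # measured VLM latency
    idle_window_s: float          # measured duration of the low-risk state


def evaluate(trials: Sequence[Trial], max_walk_m: float = 100.0) -> Tuple[float, float]:
    """Return (hit_rate, overrun_rate) over all trials."""
    if not trials:
        raise ValueError("no trials to evaluate")
    hits = sum(
        t.spot_is_actually_legal
        and haversine_m(t.predicted_spot, t.true_entrance) <= max_walk_m
        for t in trials
    )
    overruns = sum(t.inference_s > t.idle_window_s for t in trials)
    n = len(trials)
    return hits / n, overruns / n


entrance = (40.0, -74.0)
trials = [
    Trial((40.0005, -74.0), True, entrance, 5.0, 20.0),  # about 56 m away, legal: hit
    Trial((40.002, -74.0), True, entrance, 6.0, 4.0),    # about 222 m away: miss, and an overrun
    Trial((40.0, -74.0), False, entrance, 7.0, 30.0),    # illegal spot: miss
    Trial((40.0, -74.0), True, entrance, 4.0, 10.0),     # at the entrance, legal: hit
]
print(evaluate(trials))  # prints (0.5, 0.25)
```

Keeping the two rates separate matters because a model can be accurate yet too slow for short idle windows, or fast yet unreliable; the test described above needs both.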

Original abstract

Finding parking consumes a disproportionate share of food delivery time, yet no system addresses precise parking-spot selection relative to merchant entrances. We propose ParkSense, a framework that repurposes idle compute during low-risk AV states -- queuing at red lights, traffic congestion, parking-lot crawl -- to run a Vision-Language Model (VLM) on pre-cached satellite and street view imagery, identifying entrances and legal parking zones. We formalize the Delivery-Aware Precision Parking (DAPP) problem, show that a quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the U.S. Five open research directions are identified at this unexplored intersection of autonomous driving, computer vision, and last-mile logistics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ParkSense, a framework that repurposes idle AV compute during low-risk states (e.g., red-light queuing) to run a quantized 7B VLM on pre-cached satellite and street-view imagery, thereby identifying merchant entrances and legal parking zones. It formalizes the Delivery-Aware Precision Parking (DAPP) problem, states that inference completes in 4-8 seconds on HW4-class hardware, estimates annual per-driver income gains of $3,000–$8,000 in the U.S., and lists five open research directions at the intersection of autonomous driving, computer vision, and last-mile logistics.

Significance. If the VLM identification capability were empirically validated, the work would open a promising new direction by turning existing AV hardware into a practical tool for delivery efficiency, with clear economic upside. The formalization of DAPP and the explicit enumeration of open research questions are constructive contributions that could seed follow-on studies. At present, however, the absence of any supporting data or experiments keeps the significance prospective rather than demonstrated.

major comments (2)
  1. [Abstract] The central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.
  2. [Abstract] The reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.
minor comments (2)
  1. The economic projections would benefit from an explicit derivation or cited data sources rather than appearing as round-number estimates.
  2. A short related-work subsection would help readers situate the DAPP formalization against prior parking-assistance and last-mile logistics literature.
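A derivation of the kind the referee requests would at minimum make explicit how minutes saved per delivery become annual income. The sketch below shows one such chain of arithmetic. Every input is a hypothetical placeholder rather than a figure from the paper, and the `utilization` factor (the share of freed time that turns into paid work) is an assumption this sketch introduces, since freed minutes need not translate one-to-one into extra orders.

```python
def annual_income_gain(
    minutes_saved_per_delivery: float,
    deliveries_per_year: float,
    earnings_per_hour: float,
    utilization: float = 1.0,
) -> float:
    """Annual income gain (USD) from time no longer spent searching for parking.

    hours_saved = minutes_saved_per_delivery * deliveries_per_year / 60
    gain        = hours_saved * utilization * earnings_per_hour
    All arguments are illustrative placeholders, not values from the paper.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    hours_saved = minutes_saved_per_delivery * deliveries_per_year / 60.0
    return hours_saved * utilization * earnings_per_hour


# Placeholder inputs: 2 min saved on each of 3,000 deliveries at $20/h.
print(annual_income_gain(2.0, 3000, 20.0))                   # prints 2000.0
print(annual_income_gain(2.0, 3000, 20.0, utilization=0.5))  # prints 1000.0
```

Stating each factor separately would let readers see which assumption drives the $3,000–$8,000 range and how sensitive it is to the share of saved time that actually becomes paid work.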

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. ParkSense is a conceptual framework and problem formalization paper rather than an empirical study; the quantitative elements are estimates and the VLM component is presented as an open research direction. We address the two major comments below and will revise the manuscript accordingly to clarify scope and add missing details.

Point-by-point responses
  1. Referee: [Abstract] The central claim that a VLM on pre-cached imagery can reliably identify merchant entrances and legal parking zones (thereby solving DAPP and supporting the $3k–$8k income estimates) is presented without any datasets, accuracy metrics, failure-case analysis, or comparison to baselines. This assumption is load-bearing for every quantitative claim in the paper.

    Authors: We agree that the manuscript contains no empirical validation of VLM performance on merchant-entrance or parking-zone detection. The work is explicitly positioned as a proposal that formalizes the Delivery-Aware Precision Parking (DAPP) problem, describes a system architecture, and enumerates five open research questions, one of which is precisely the empirical validation of VLM-based identification on cached imagery. The $3,000–$8,000 income figures are high-level projections derived from published statistics on delivery parking time waste; they are not measurements obtained from a deployed ParkSense system. We will revise the abstract, introduction, and conclusion to state more explicitly that all quantitative claims are prospective and contingent on future validation of the VLM component. revision: yes

  2. Referee: [Abstract] The reported 4–8 s inference time for the quantized 7B VLM on HW4 hardware is stated without benchmark methodology, exact model variant, input resolution, or hardware configuration details, making the timing claim impossible to assess or reproduce.

    Authors: The referee correctly notes that the timing claim is presented without sufficient methodological detail. The 4–8 s range reflects preliminary internal measurements on representative HW4-class hardware using a 4-bit quantized 7B VLM (LLaVA-1.5 variant) at 336×336 vision-encoder resolution, but these specifics are not documented in the current manuscript. We will add a short subsection (or appendix) that reports the exact model checkpoint, quantization scheme, input preprocessing pipeline, hardware configuration, and measurement protocol (end-to-end latency including cached-image loading). revision: yes
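The promised measurement protocol could take roughly the following shape. `load_images` and `run_vlm` are hypothetical stand-ins for cached-image loading and the quantized VLM call; the warm-up count, run count, and nearest-rank p95 are choices of this sketch rather than details from the manuscript. Timing both steps together matches the rebuttal's description of end-to-end latency including cached-image loading.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    load_images: Callable[[], object],
    run_vlm: Callable[[object], object],
    n_runs: int = 20,
    n_warmup: int = 3,
) -> Dict[str, float]:
    """Time end-to-end latency (cached-image load + VLM inference) in seconds."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    for _ in range(n_warmup):  # warm-up runs are executed but not recorded
        run_vlm(load_images())
    samples: List[float] = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_vlm(load_images())
        samples.append(time.perf_counter() - start)
    samples.sort()
    p95_index = math.ceil(0.95 * len(samples)) - 1  # nearest-rank percentile
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


# Stand-in workload: a 10 ms sleep in place of real inference.
stats = measure_latency(lambda: None, lambda _: time.sleep(0.01), n_runs=5)
print(sorted(stats))  # prints ['max_s', 'median_s', 'p95_s']
```

Reporting the tail alongside the median matters here, because the question a scheduler must answer is whether inference can overrun a short idle window, not how fast it is on average.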

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential predictions

Full rationale

The manuscript is a high-level framework proposal that formalizes the DAPP problem, describes repurposing idle AV compute for VLM inference on cached imagery, states an inference-time figure for a quantized 7B model, and gives income estimates. No equations, fitted parameters, or derivation chains appear in the provided text. The income figures are presented as estimates rather than outputs of any model trained or fitted within the paper. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work is self-contained as a conceptual contribution; the absence of empirical validation is a separate correctness concern, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a high-level system proposal with no mathematical derivations, fitted parameters, or new postulated entities; income gains are presented as estimates without disclosed calculation details or data sources.



