arxiv: 2605.05941 · v1 · submitted 2026-05-07 · 💻 cs.CV

RAWild: Sensor-Agnostic RAW Object Detection via Physics-Guided Curve and Grid Modeling

Pith reviewed 2026-05-08 14:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords across · detection · object · sensor · sensor-agnostic · depths · framework · generalization

The pith

RAWild targets sensor-agnostic RAW object detection with physics-guided global-local tone mapping driven by RAW priors, plus a simulation pipeline, and reports state-of-the-art results across heterogeneous sensors and bit depths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

RAW images from cameras contain more detailed light information than the processed JPEGs most systems use, but each sensor type produces different raw values because of its exposure settings, color filters, and bit depth. The method splits the sensor differences into two parts: a global curve that corrects overall brightness and contrast, and a local grid that adjusts colors in different image regions. Both parts are guided by statistical patterns found in real RAW data. To train the system without needing every possible sensor, the authors built a simulator that generates synthetic RAW images matching many real sensor behaviors. According to the paper, experiments on several RAW datasets with bit depths from 10 to 24 bits show that the single trained model outperforms previous approaches on single datasets, on mixed datasets, and under robustness challenges.
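
The global-plus-local split described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the power-law curve, the hand-set `gamma`, and the per-cell `gains` are all assumptions made for illustration, and the image is single-channel for brevity. How RAWild actually derives its curve and grid from RAW distribution priors is not specified in this summary.

```python
from typing import List

Image = List[List[float]]  # single-channel image, rows of floats in [0, 1]


def normalize_raw(raw: List[List[int]], bit_depth: int, black_level: int = 0) -> Image:
    """Map integer RAW codes to [0, 1] so different bit depths share one scale."""
    white_level = (1 << bit_depth) - 1
    scale = float(white_level - black_level)
    return [[min(max((v - black_level) / scale, 0.0), 1.0) for v in row] for row in raw]


def global_curve(img: Image, gamma: float) -> Image:
    """Global tonal correction: one curve applied identically to every pixel.

    A power law is an illustrative choice, not the curve used in the paper.
    """
    return [[v ** gamma for v in row] for row in img]


def local_grid(img: Image, gains: List[List[float]]) -> Image:
    """Local adjustment: scale each pixel by the gain of the grid cell it falls in."""
    h, w = len(img), len(img[0])
    gh, gw = len(gains), len(gains[0])
    out = []
    for y, row in enumerate(img):
        cell_y = min(y * gh // h, gh - 1)
        out.append([min(v * gains[cell_y][min(x * gw // w, gw - 1)], 1.0)
                    for x, v in enumerate(row)])
    return out
```

After `normalize_raw`, a saturated 10-bit code (1023) and a saturated 12-bit code (4095) both map to 1.0, which is the kind of shared scale joint training across bit depths needs. A color version would presumably hold one gain per channel in each grid cell.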

Core claim

By factoring sensor-induced variations into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, our framework enables a single network to train jointly across heterogeneous sensors.

Load-bearing premise

That sensor-induced variations can be accurately and completely factored into global tonal correction plus spatially adaptive local color adjustment driven by RAW distribution priors, and that the physics-based simulation pipeline produces data realistic enough for cross-sensor generalization.

Original abstract

Camera sensor RAW data offers intrinsic advantages for object detection, including deeper bit depth, preserved physical information, and freedom from image signal processor (ISP) distortions. However, varying exposure conditions, spectral sensitivities, and bit depths across devices introduce substantially larger domain gaps than sRGB, making sensor-agnostic generalization a fundamental challenge. In this study, we present RAWild, a physics-guided global-local tone mapping framework for sensor-agnostic RAW object detection. By factoring sensor-induced variations into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, our framework enables a single network to train jointly across heterogeneous sensors. To further support cross-sensor generalization, we construct a physics-based RAW simulation pipeline that synthesizes realistic sensor outputs spanning diverse spectral sensitivities, illuminants, and sensor non-idealities. Extensive experiments across multiple RAW benchmarks covering bit depths from 10 to 24 demonstrate state-of-the-art (SOTA) performance under single-dataset, mixed-dataset, and challenging robustness settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents RAWild, a physics-guided global-local tone mapping framework for sensor-agnostic RAW object detection. Sensor-induced variations are factored into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, enabling joint training of a single detector across heterogeneous sensors with varying exposures, spectral sensitivities, and bit depths. A physics-based RAW simulation pipeline is introduced to synthesize realistic data spanning diverse spectral sensitivities, illuminants, bit depths (10-24), and non-idealities. Experiments on multiple RAW benchmarks claim state-of-the-art performance under single-dataset, mixed-dataset, and robustness settings.

Significance. If the claims hold, the work addresses an important challenge in computer vision by enabling direct use of RAW data for object detection, preserving physical information and avoiding ISP distortions. The physics-based simulation pipeline is a notable strength for supporting cross-sensor generalization and data synthesis, provided it is empirically validated against real sensor distributions.

major comments (3)
  1. [Abstract and simulation pipeline section] The central claim that the physics-based simulation produces data realistic enough for cross-sensor generalization after global-local corrections is load-bearing, yet the manuscript provides no quantitative validation (e.g., KL divergence, Wasserstein distance, or per-channel histogram comparisons) between simulated and real RAW captures from the same sensors. Unmodeled effects such as read-noise statistics or microlens variations could remain and undermine joint training.
  2. [Framework description] The decomposition of sensor variations into global tonal correction plus spatially adaptive local color adjustment driven by RAW priors is presented as complete and parameter-light, but no ablation studies are referenced that isolate the contribution of each component or test whether the priors are derived independently of the evaluation data. This risks circularity in the reported generalization gains.
  3. [Experimental results] The SOTA claims across single-dataset, mixed-dataset, and robustness settings lack reported error bars, statistical significance tests, or detailed baseline comparisons (e.g., against prior RAW or ISP-based detectors). Without these, it is impossible to confirm that the performance improvements are attributable to the proposed corrections rather than dataset specifics.
minor comments (2)
  1. [Abstract] Clarify the exact computation of 'RAW distribution priors' (e.g., whether they are per-image histograms, global statistics, or learned) in the main text, as the abstract uses the term without definition.
  2. Ensure consistency in bit-depth reporting (10-24) between the abstract, simulation pipeline description, and experimental tables.
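
The quantitative validation requested in major comment 1 could look roughly like the sketch below, which compares a simulated and a real intensity distribution for one channel. It is an illustration under stated assumptions, not the paper's evaluation code: the helper names are hypothetical, inputs are assumed to be already normalized to [0, 1], and both histograms must share the same equal-width bins.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int) -> List[float]:
    """Normalized histogram of values assumed to lie in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = float(len(values))
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Approximate KL(p || q) in nats; eps guards against empty bins in q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """1-Wasserstein distance between histograms on equal-width bins over [0, 1].

    Computed as the sum of absolute CDF differences times the bin width.
    """
    bin_width = 1.0 / len(p)
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total * bin_width
```

For a per-channel comparison, one would build a histogram for each color-filter channel of the real and simulated captures from the same sensor and report both distances per channel. KL divergence is asymmetric and sensitive to empty bins, while the Wasserstein distance also reflects how far mass has shifted, so reporting both is a reasonable design choice.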

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We will revise the manuscript to incorporate quantitative validation of the simulation pipeline, ablation studies for the framework components, and enhanced statistical reporting in the experiments. Our point-by-point responses follow.

  1. Referee: [Abstract and simulation pipeline section] The central claim that the physics-based simulation produces data realistic enough for cross-sensor generalization after global-local corrections is load-bearing, yet the manuscript provides no quantitative validation (e.g., KL divergence, Wasserstein distance, or per-channel histogram comparisons) between simulated and real RAW captures from the same sensors. Unmodeled effects such as read-noise statistics or microlens variations could remain and undermine joint training.

    Authors: We agree that quantitative validation would strengthen the central claim. The manuscript currently supports the simulation's utility through qualitative visual comparisons and downstream detection gains across sensors. In revision, we will add quantitative metrics including KL divergence, Wasserstein distance, and per-channel histogram comparisons between simulated and real RAW data from matching sensors. We will also expand the discussion of simulation limitations to explicitly address potential unmodeled effects such as read-noise statistics and microlens variations, including any mitigation strategies or sensitivity analysis.
    Revision: yes

  2. Referee: [Framework description] The decomposition of sensor variations into global tonal correction plus spatially adaptive local color adjustment driven by RAW priors is presented as complete and parameter-light, but no ablation studies are referenced that isolate the contribution of each component or test whether the priors are derived independently of the evaluation data. This risks circularity in the reported generalization gains.

    Authors: We concur that isolating component contributions via ablations is essential. The revised manuscript will include new ablation experiments that separately evaluate the global tonal correction and the spatially adaptive local adjustment. We will also clarify the procedure for deriving RAW distribution priors, confirming they are computed exclusively from training splits with no overlap with evaluation data, thereby eliminating any risk of circularity.
    Revision: yes

  3. Referee: [Experimental results] The SOTA claims across single-dataset, mixed-dataset, and robustness settings lack reported error bars, statistical significance tests, or detailed baseline comparisons (e.g., against prior RAW or ISP-based detectors). Without these, it is impossible to confirm that the performance improvements are attributable to the proposed corrections rather than dataset specifics.

    Authors: We acknowledge the need for greater statistical rigor. In the revised experiments section, we will report error bars as standard deviations over multiple random seeds, include statistical significance tests (e.g., paired t-tests against baselines), and expand baseline comparisons to include additional prior RAW-specific and ISP-based detectors. These additions will help attribute performance gains more clearly to the physics-guided corrections.
    Revision: yes
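
The statistical reporting promised in response 3 can be sketched as follows, using only the Python standard library. The function names are hypothetical, and the sketch assumes both methods were evaluated on the same set of random seeds, so that scores can be paired seed by seed. It makes no claim about how the authors will actually run their analysis.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_std(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds (the error bar)."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(ours: Sequence[float], baseline: Sequence[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for per-seed paired differences.

    Raises ZeroDivisionError if every paired difference is identical,
    because the standard deviation of the differences is then zero.
    """
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

The p-value then follows from the t distribution with the returned degrees of freedom; in practice a library routine such as `scipy.stats.ttest_rel` computes both in one call. With only a handful of seeds the test has little power, so the error bars and the raw per-seed scores are worth reporting alongside it.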

Circularity Check

0 steps flagged

No circularity: framework and simulation presented as independent of target data

full rationale

The abstract and claims describe a physics-guided global-local tone mapping driven by RAW distribution priors together with a separate physics-based simulation pipeline that synthesizes outputs across spectral sensitivities, illuminants, and non-idealities. No quoted equations, self-citations, or descriptions show any prediction or result reducing by construction to fitted parameters from the same evaluation data, nor any uniqueness theorem or ansatz imported solely from the authors' prior work. The central claim of joint training across heterogeneous sensors therefore rests on externally verifiable physics modeling rather than definitional or statistical self-reference, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified effectiveness of RAW distribution priors and the realism of the physics-based simulation pipeline; no explicit free parameters or invented entities are detailed in the abstract.

axioms (1)
  • domain assumption: Sensor-induced variations can be factored into global tonal correction and spatially adaptive local color adjustment driven by RAW distribution priors.
    Invoked to enable joint training across heterogeneous sensors.
invented entities (1)
  • Physics-based RAW simulation pipeline (no independent evidence)
    purpose: Synthesize realistic sensor outputs spanning spectral sensitivities, illuminants, and non-idealities
    Introduced to support cross-sensor generalization training.

pith-pipeline@v0.9.0 · 5502 in / 1311 out tokens · 66954 ms · 2026-05-08T14:19:23.162977+00:00 · methodology
