RAWild: Sensor-Agnostic RAW Object Detection via Physics-Guided Curve and Grid Modeling
Pith reviewed 2026-05-08 14:19 UTC · model grok-4.3
The pith
RAWild achieves sensor-agnostic RAW object detection by using physics-guided global-local tone mapping driven by RAW priors plus a simulation pipeline, delivering SOTA results across heterogeneous sensors and bit depths.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By factoring sensor-induced variations into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, our framework enables a single network to train jointly across heterogeneous sensors.
Load-bearing premise
That sensor-induced variations can be accurately and completely factored into global tonal correction plus spatially adaptive local color adjustment driven by RAW distribution priors, and that the physics-based simulation pipeline produces data realistic enough for cross-sensor generalization.
Original abstract
Camera sensor RAW data offers intrinsic advantages for object detection, including deeper bit depth, preserved physical information, and freedom from image signal processor (ISP) distortions. However, varying exposure conditions, spectral sensitivities, and bit depths across devices introduce substantially larger domain gaps than sRGB, making sensor-agnostic generalization a fundamental challenge. In this study, we present RAWild, a physics-guided global-local tone mapping framework for sensor-agnostic RAW object detection. By factoring sensor-induced variations into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, our framework enables a single network to train jointly across heterogeneous sensors. To further support cross-sensor generalization, we construct a physics-based RAW simulation pipeline that synthesizes realistic sensor outputs spanning diverse spectral sensitivities, illuminants, and sensor non-idealities. Extensive experiments across multiple RAW benchmarks covering bit depths from 10 to 24 demonstrate state-of-the-art (SOTA) performance under single-dataset, mixed-dataset, and challenging robustness settings.
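To make the global-local factoring concrete, here is a minimal single-channel sketch. It is not the paper's implementation. It assumes a gamma curve fitted to the image mean as the global tonal correction and a coarse grid of per-cell gains as the local adjustment. The function names, the `target_mean` value, and the use of the image mean as a stand-in for the "RAW distribution prior" are all illustrative assumptions, since the abstract does not specify them.

```python
import math


def normalize_raw(image, black_level, bit_depth):
    """Scale integer RAW values to [0, 1] so different bit depths are comparable."""
    white_level = (1 << bit_depth) - 1
    span = white_level - black_level
    return [[min(max((v - black_level) / span, 0.0), 1.0) for v in row]
            for row in image]


def global_gamma(image, target_mean=0.5):
    """Choose gamma so that the image's mean value maps to target_mean.

    The mean is a hypothetical stand-in for a RAW distribution prior.
    """
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    mean = min(max(mean, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return math.log(target_mean) / math.log(mean)


def apply_global_curve(image, gamma):
    """Global tonal correction: the same curve for every pixel."""
    return [[v ** gamma for v in row] for row in image]


def grid_gains(image, cells, target_mean=0.5):
    """Local adjustment: one gain per grid cell, pulling its mean to target_mean."""
    h, w = len(image), len(image[0])
    sums = [[0.0] * cells for _ in range(cells)]
    counts = [[0] * cells for _ in range(cells)]
    for y in range(h):
        for x in range(w):
            gy, gx = y * cells // h, x * cells // w
            sums[gy][gx] += image[y][x]
            counts[gy][gx] += 1
    return [[target_mean / max(sums[gy][gx] / max(counts[gy][gx], 1), 1e-6)
             for gx in range(cells)] for gy in range(cells)]


def apply_local_grid(image, gains):
    """Apply each pixel's cell gain (nearest cell, no interpolation), clipped to 1."""
    h, w = len(image), len(image[0])
    cells = len(gains)
    return [[min(image[y][x] * gains[y * cells // h][x * cells // w], 1.0)
             for x in range(w)] for y in range(h)]


def tone_map(image, black_level, bit_depth, cells):
    """Normalize, then apply the global curve, then the local grid."""
    norm = normalize_raw(image, black_level, bit_depth)
    toned = apply_global_curve(norm, global_gamma(norm))
    return apply_local_grid(toned, grid_gains(toned, cells))
```

Because both stages work on values normalized by black level and bit depth, the same code path accepts 10-bit and 24-bit inputs. A practical version would likely interpolate between grid cells and operate per color channel; both are omitted here to keep the sketch short.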
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents RAWild, a physics-guided global-local tone mapping framework for sensor-agnostic RAW object detection. Sensor-induced variations are factored into a global tonal correction and a spatially adaptive local color adjustment, both driven by RAW distribution priors, enabling joint training of a single detector across heterogeneous sensors with varying exposures, spectral sensitivities, and bit depths. A physics-based RAW simulation pipeline is introduced to synthesize realistic data spanning diverse spectral sensitivities, illuminants, and sensor non-idealities. Experiments on multiple RAW benchmarks covering bit depths from 10 to 24 claim state-of-the-art performance under single-dataset, mixed-dataset, and robustness settings.
Significance. If the claims hold, the work addresses an important challenge in computer vision by enabling direct use of RAW data for object detection, preserving physical information and avoiding ISP distortions. The physics-based simulation pipeline is a notable strength for supporting cross-sensor generalization and data synthesis, provided it is empirically validated against real sensor distributions.
major comments (3)
- [Abstract and simulation pipeline section] The central claim that the physics-based simulation produces data realistic enough for cross-sensor generalization after global-local corrections is load-bearing, yet the manuscript provides no quantitative validation (e.g., KL divergence, Wasserstein distance, or per-channel histogram comparisons) between simulated and real RAW captures from the same sensors. Unmodeled effects such as read-noise statistics or microlens variations could remain and undermine joint training.
- [Framework description] The decomposition of sensor variations into global tonal correction plus spatially adaptive local color adjustment driven by RAW priors is presented as complete and parameter-light, but no ablation studies are referenced that isolate the contribution of each component or test whether the priors are derived independently of the evaluation data. This risks circularity in the reported generalization gains.
- [Experimental results] The SOTA claims across single-dataset, mixed-dataset, and robustness settings lack reported error bars, statistical significance tests, or detailed baseline comparisons (e.g., against prior RAW or ISP-based detectors). Without these, it is impossible to confirm that the performance improvements are attributable to the proposed corrections rather than dataset specifics.
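The distributional check requested in the first major comment could be sketched as follows. This is an illustration under stated assumptions, not a procedure taken from the paper: it assumes pixel values already normalized to [0, 1], equal-width bins shared by both histograms, and additive smoothing for the KL term. All function names and the `eps` default are hypothetical.

```python
import math


def histogram(values, bins):
    """Normalized histogram of values in [0, 1] over equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), smoothed so that empty bins do not produce infinities."""
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum((a / zp) * math.log((a / zp) / (b / zq))
               for a, b in zip(p_s, q_s))


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two histograms on the same bins.

    Computed as the summed absolute difference of the two CDFs times
    the bin width.
    """
    bin_width = 1.0 / len(p)
    cdf_gap, total = 0.0, 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b
        total += abs(cdf_gap) * bin_width
    return total


def compare_channels(real, simulated, bins=64):
    """Per-channel KL and Wasserstein between real and simulated pixel values.

    `real` and `simulated` map a channel name to a flat list of values in [0, 1].
    """
    report = {}
    for channel in real:
        p = histogram(real[channel], bins)
        q = histogram(simulated[channel], bins)
        report[channel] = {"kl": kl_divergence(p, q),
                           "wasserstein": wasserstein_1d(p, q)}
    return report
```

KL divergence is sensitive to bins where the simulation places almost no mass, while the Wasserstein distance reflects how far mass has to move along the intensity axis, so reporting both gives complementary views of a simulated-versus-real gap.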
minor comments (2)
- [Abstract] Clarify the exact computation of 'RAW distribution priors' (e.g., whether they are per-image histograms, global statistics, or learned) in the main text, as the abstract uses the term without definition.
- Ensure consistency in bit-depth reporting (10-24) between the abstract, simulation pipeline description, and experimental tables.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We will revise the manuscript to incorporate quantitative validation of the simulation pipeline, ablation studies for the framework components, and enhanced statistical reporting in the experiments. Our point-by-point responses follow.
- Referee: [Abstract and simulation pipeline section] The central claim that the physics-based simulation produces data realistic enough for cross-sensor generalization after global-local corrections is load-bearing, yet the manuscript provides no quantitative validation (e.g., KL divergence, Wasserstein distance, or per-channel histogram comparisons) between simulated and real RAW captures from the same sensors. Unmodeled effects such as read-noise statistics or microlens variations could remain and undermine joint training.
  Authors: We agree that quantitative validation would strengthen the central claim. The manuscript currently supports the simulation's utility through qualitative visual comparisons and downstream detection gains across sensors. In revision, we will add quantitative metrics including KL divergence, Wasserstein distance, and per-channel histogram comparisons between simulated and real RAW data from matching sensors. We will also expand the discussion of simulation limitations to explicitly address potential unmodeled effects such as read-noise statistics and microlens variations, including any mitigation strategies or sensitivity analysis.
  revision: yes
- Referee: [Framework description] The decomposition of sensor variations into global tonal correction plus spatially adaptive local color adjustment driven by RAW priors is presented as complete and parameter-light, but no ablation studies are referenced that isolate the contribution of each component or test whether the priors are derived independently of the evaluation data. This risks circularity in the reported generalization gains.
  Authors: We concur that isolating component contributions via ablations is essential. The revised manuscript will include new ablation experiments that separately evaluate the global tonal correction and the spatially adaptive local adjustment. We will also clarify the procedure for deriving RAW distribution priors, confirming they are computed exclusively from training splits with no overlap to evaluation data, thereby eliminating any risk of circularity.
  revision: yes
- Referee: [Experimental results] The SOTA claims across single-dataset, mixed-dataset, and robustness settings lack reported error bars, statistical significance tests, or detailed baseline comparisons (e.g., against prior RAW or ISP-based detectors). Without these, it is impossible to confirm that the performance improvements are attributable to the proposed corrections rather than dataset specifics.
  Authors: We acknowledge the need for greater statistical rigor. In the revised experiments section, we will report error bars as standard deviations over multiple random seeds, include statistical significance tests (e.g., paired t-tests against baselines), and expand baseline comparisons to include additional prior RAW-specific and ISP-based detectors. These additions will help attribute performance gains more clearly to the physics-guided corrections.
  revision: yes
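The statistical reporting described in the last response (standard deviations over seeds and a paired t-test against a baseline) could look roughly like the sketch below. It assumes one detection score per random seed, with method and baseline runs matched by seed. The function names are hypothetical and no numbers from the paper are used.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(method_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for seed-matched runs."""
    if len(method_scores) != len(baseline_scores):
        raise ValueError("paired test needs one baseline run per method run")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired runs")
    spread = statistics.stdev(diffs)
    if spread == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (spread / math.sqrt(n))
    return t, n - 1
```

The t statistic and degrees of freedom still have to be converted to a p-value with a t-distribution table or a statistics library; `scipy.stats.ttest_rel` performs the whole paired test in one call if SciPy is available.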
Circularity Check
No circularity: framework and simulation presented as independent of target data
The abstract and claims describe a physics-guided global-local tone mapping driven by RAW distribution priors together with a separate physics-based simulation pipeline that synthesizes outputs across spectral sensitivities, illuminants, and non-idealities. No quoted equations, self-citations, or descriptions show any prediction or result reducing by construction to fitted parameters from the same evaluation data, nor any uniqueness theorem or ansatz imported solely from the authors' prior work. The central claim of joint training across heterogeneous sensors therefore rests on externally verifiable physics modeling rather than definitional or statistical self-reference, making the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Sensor-induced variations can be factored into global tonal correction and spatially adaptive local color adjustment driven by RAW distribution priors
invented entities (1)
- Physics-based RAW simulation pipeline (no independent evidence)