Physics-informed Active Polarimetric 3D Imaging for Specular Surfaces

Florian Willomitzer; Hyelim Yang; Jiazhang Wang; Tianyi Wang

arxiv: 2602.19470 · v2 · submitted 2026-02-23 · 💻 cs.CV · physics.optics

Jiazhang Wang, Hyelim Yang, Tianyi Wang, Florian Willomitzer

Pith reviewed 2026-05-15 21:01 UTC · model grok-4.3

keywords: specular surfaces, 3D imaging, polarimetric imaging, structured illumination, surface normals, single-shot reconstruction, deep learning

The pith

A network fuses polarization cues with structured light patterns to measure surface normals of specular objects from one image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to create a single-shot method for recovering the three-dimensional shape of shiny surfaces that works in moving or handheld scenarios. Existing optical approaches either demand repeated captures, which breaks down for dynamic scenes, or rely on flat-view assumptions that lose accuracy on curved or finely detailed reflectors. The authors train a dual-encoder network that receives both polarization signals, treated as orientation hints, and the geometric distortions produced by projected patterns; the encoders modulate each other's features so the model can untangle their combined nonlinear effects and output surface normals directly. If this coupling is resolved reliably, the result is fast, accurate normal maps without multi-shot hardware or orthographic simplifications.

Core claim

The central claim is that polarization-derived orientation priors and structured-illumination geometry can be jointly processed by a dual-encoder architecture with mutual feature modulation, allowing the network to resolve their nonlinear coupling and directly infer accurate surface normals of complex specular objects in a single shot.

What carries the argument

Dual-encoder network with mutual feature modulation that treats polarization signals as orientation priors and integrates them with geometric cues from projected patterns.
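
The paper passage quoted in the Lean-theorem section of this page mentions Feature-wise Linear Modulation (FiLM) layers as the fusion mechanism. The sketch below illustrates only the general FiLM idea applied in both directions: each branch predicts a per-channel scale (gamma) and shift (beta) for the other. It is a toy under stated assumptions, not the authors' network. The names `FilmGenerator`, `mutual_modulation`, `pol_to_geo`, and `geo_to_pol` are hypothetical, the feature vectors are plain Python lists, and the weights are hand-picked constants instead of learned values.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


@dataclass
class FilmGenerator:
    """Hypothetical helper: predicts per-channel (gamma, beta) from a conditioning vector."""
    w_gamma: Matrix
    b_gamma: Vector
    w_beta: Matrix
    b_beta: Vector

    def __call__(self, cond: Vector) -> Tuple[Vector, Vector]:
        gamma = linear(cond, self.w_gamma, self.b_gamma)
        beta = linear(cond, self.w_beta, self.b_beta)
        return gamma, beta


def film(features: Vector, gamma: Vector, beta: Vector) -> Vector:
    """FiLM: channel-wise affine transform, gamma * feature + beta."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def mutual_modulation(pol_feat: Vector, geo_feat: Vector,
                      pol_to_geo: FilmGenerator,
                      geo_to_pol: FilmGenerator) -> Tuple[Vector, Vector]:
    """Each branch is modulated by parameters predicted from the other branch."""
    gamma_geo, beta_geo = pol_to_geo(pol_feat)   # polarization conditions geometry
    gamma_pol, beta_pol = geo_to_pol(geo_feat)   # geometry conditions polarization
    return film(pol_feat, gamma_pol, beta_pol), film(geo_feat, gamma_geo, beta_geo)


# Toy weights (assumed, not learned): polarization features scale the geometry
# channels and add a constant shift; geometry leaves polarization unchanged.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
pol_to_geo = FilmGenerator(identity, [0.0, 0.0], zeros, [1.0, 1.0])
geo_to_pol = FilmGenerator(zeros, [1.0, 1.0], zeros, [0.0, 0.0])

pol_out, geo_out = mutual_modulation([1.0, 2.0], [3.0, 4.0], pol_to_geo, geo_to_pol)
```

In a real network the generators would be learned layers operating on feature maps, and the modulation would typically be applied at several encoder depths; where and how often the paper applies it is not stated on this page.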

If this is right

Surface normals can be recovered accurately from a single exposure even when the object has high spatial-frequency structure or large curvature.
Inference runs fast enough for dynamic or handheld 3D scanning applications.
The orthographic imaging assumption is no longer required for polarimetric 3D methods on specular surfaces.
Practical in-line inspection becomes feasible without synchronized multi-shot hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same mutual-modulation idea could be tested on other complementary sensor pairs, such as thermal and visible light, to see whether the architecture generalizes beyond polarization.
If inference latency remains low, the approach might support real-time video-rate normal tracking for moving specular parts.
Extending the training data to include partial occlusions or inter-reflections would clarify how far the single-shot robustness extends in cluttered scenes.

Load-bearing premise

Polarization measurements supply usable orientation information that, when modulated inside the network with data from structured illumination, lets the model separate their joint nonlinear influence on surface shape without repeated captures or flat-view assumptions.
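
To make "polarization measurements supply usable orientation information" concrete, the sketch below computes the two standard linear-polarization cues, degree of linear polarization (DoLP) and angle of linear polarization (AoLP), from intensities behind polarizers at 0°, 45°, 90°, and 135°. This is the textbook Stokes-parameter computation. Whether the paper feeds these exact quantities to its network is an assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def linear_stokes(i0: float, i45: float, i90: float, i135: float) -> Tuple[float, float, float]:
    """Linear Stokes parameters from four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity (average of the two pair sums)
    s1 = i0 - i90                        # horizontal vs. vertical preference
    s2 = i45 - i135                      # +45 deg vs. -45 deg preference
    return s0, s1, s2


def polarization_cues(i0: float, i45: float, i90: float, i135: float) -> Tuple[float, float]:
    """Return (DoLP, AoLP in radians) for one pixel."""
    s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
    if s0 <= 0.0:
        return 0.0, 0.0                  # no light: cues are undefined, report zeros
    dolp = math.hypot(s1, s2) / s0
    aolp = 0.5 * math.atan2(s2, s1)      # defined only modulo pi
    return dolp, aolp


# Fully polarized light at 45 deg: I(theta) = 0.5 * (1 + cos(2 * (theta - 45 deg)))
dolp_45, aolp_45 = polarization_cues(0.5, 1.0, 0.5, 0.0)
# Unpolarized light: identical intensity behind every polarizer
dolp_unpol, _ = polarization_cues(0.5, 0.5, 0.5, 0.5)
```

Because AoLP is only defined modulo π, it constrains surface orientation without fixing it uniquely, which is consistent with the page's description of polarization as a prior that needs a second cue.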

What would settle it

Run the method on a high-curvature specular test object with known ground-truth normals from multi-shot deflectometry; if the single-shot normal error exceeds the noise floor of the reference measurement on fine surface detail, the claim does not hold.
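
The comparison described above needs a per-pixel error measure between predicted and reference normals. A minimal sketch, assuming normals are given as 3-vectors and using the mean angular error (the metric the simulated rebuttal says the full manuscript reports); the function names and the sample data are illustrative only.

```python
import math
from typing import Sequence, Tuple

Normal = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Normal:
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length normal")
    return (v[0] / length, v[1] / length, v[2] / length)


def angular_error_deg(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Angle in degrees between two normals."""
    a, b = normalize(pred), normalize(ref)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))       # guard acos against rounding overshoot
    return math.degrees(math.acos(dot))


def mean_angular_error_deg(pred: Sequence[Sequence[float]],
                           ref: Sequence[Sequence[float]]) -> float:
    """Average angular error over paired normals."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(angular_error_deg(p, r) for p, r in zip(pred, ref)) / len(pred)


# Illustrative data: one perfect match and one normal that is off by 90 degrees.
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0)]
mae = mean_angular_error_deg(predicted, reference)
```

For the test proposed above, this value would be computed on the fine-detail regions and compared with the noise floor of the multi-shot reference.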

Original abstract

3D imaging of specular surfaces remains challenging in real-world scenarios, such as in-line inspection or hand-held scanning, requiring fast and accurate measurement of complex geometries. Optical metrology techniques such as deflectometry achieve high accuracy but typically rely on multi-shot acquisition, making them unsuitable for dynamic environments. Fourier-based single-shot approaches alleviate this constraint, yet their performance deteriorates when measuring surfaces with high spatial frequency structure or large curvature. Alternatively, polarimetric 3D imaging in computer vision operates in a single-shot fashion and exhibits robustness to geometric complexity. However, its accuracy is fundamentally limited by the orthographic imaging assumption. In this paper, we propose a physics-informed deep learning framework for single-shot 3D imaging of complex specular surfaces. Polarization cues provide orientation priors that assist in interpreting geometric information encoded by structured illumination. These complementary cues are processed through a dual-encoder architecture with mutual feature modulation, allowing the network to resolve their nonlinear coupling and directly infer surface normals. The proposed method achieves accurate and robust normal estimation in single-shot with fast inference, enabling practical 3D imaging of complex specular surfaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a dual-encoder network with mutual feature modulation to fuse polarization priors and structured illumination for single-shot normal estimation on specular surfaces, but the abstract supplies no metrics, ablations, or implementation details to support the claims.

The main point is a new architecture that takes polarization cues as orientation priors and combines them with structured illumination through dual encoders and mutual feature modulation to recover surface normals in one shot. This targets the practical gap between multi-shot deflectometry, which is accurate but slow, and single-shot polarimetry, which struggles with the orthographic assumption on curved specular parts. The idea of letting the two modalities help each other resolve their nonlinear coupling makes sense for dynamic or handheld use cases like in-line inspection.
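
For background on the geometric cue that deflectometry and structured illumination provide: at a specular surface point, the law of reflection makes the normal the bisector of the direction toward the camera and the direction toward the observed screen point. The sketch below shows that relation only. It assumes the surface point, camera center, and screen point are all known in one coordinate frame, which is exactly what is hard to obtain from a single image. The function names are hypothetical and nothing here is taken from the paper's pipeline.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def unit(v: Sequence[float]) -> Vec3:
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def normal_from_reflection(surface_point: Sequence[float],
                           camera_center: Sequence[float],
                           screen_point: Sequence[float]) -> Vec3:
    """Law of reflection: the normal bisects the camera and screen directions."""
    to_camera = unit([c - s for c, s in zip(camera_center, surface_point)])
    to_screen = unit([p - s for p, s in zip(screen_point, surface_point)])
    # The sum of two unit vectors points along their bisector. It vanishes only
    # when the two directions are exactly opposite, in which case unit() raises.
    return unit([a + b for a, b in zip(to_camera, to_screen)])


# Symmetric toy geometry: camera and screen point mirror each other about the z axis.
n = normal_from_reflection((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
```

The relation needs a screen correspondence for every camera pixel and a position along each camera ray; a learned single-shot method has to recover or sidestep both.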

Referee Report

2 major / 1 minor

Summary. The paper proposes a physics-informed deep learning framework for single-shot 3D imaging of complex specular surfaces. Polarization cues supply orientation priors that are fused with structured-illumination information via a dual-encoder architecture employing mutual feature modulation; the network is claimed to resolve the nonlinear coupling between these cues and directly regress surface normals, overcoming the multi-shot requirement of deflectometry and the orthographic assumption of conventional polarimetry.

Significance. If the central claim holds with quantitative validation, the work would enable practical, fast normal estimation for dynamic or in-line inspection scenarios where existing optical metrology fails, directly addressing a long-standing limitation in specular-surface metrology.

major comments (2)

[Abstract] Abstract: the claim that the method 'achieves accurate and robust normal estimation' is unsupported by any error metrics, ablation results, or baseline comparisons; without these data the performance advantage over Fourier single-shot or standard polarimetric methods cannot be evaluated.
[Method] Method section (dual-encoder mutual feature modulation): no equation or derivation shows that the modulation operation is obtained from the radiometric or polarimetric forward model rather than being a generic learned interaction (e.g., cross-attention or concatenation); this leaves the resolution of nonlinear cue coupling as an unverified architectural assumption.

minor comments (1)

[Abstract] Abstract: the phrase 'physics-informed' is used without specifying which physical constraints are explicitly embedded in the loss or architecture versus learned from data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our physics-informed framework. We address each major comment below and indicate the corresponding revisions.

Referee: [Abstract] Abstract: the claim that the method 'achieves accurate and robust normal estimation' is unsupported by any error metrics, ablation results, or baseline comparisons; without these data the performance advantage over Fourier single-shot or standard polarimetric methods cannot be evaluated.

Authors: The abstract serves as a high-level summary. The full manuscript contains quantitative evaluations, including mean angular error metrics, ablation studies on the dual-encoder and modulation components, and direct comparisons to Fourier single-shot deflectometry and conventional polarimetric baselines, demonstrating the claimed accuracy and robustness. To address the concern, we will revise the abstract to incorporate key quantitative results and performance highlights.
revision: yes

Referee: [Method] Method section (dual-encoder mutual feature modulation): no equation or derivation shows that the modulation operation is obtained from the radiometric or polarimetric forward model rather than being a generic learned interaction (e.g., cross-attention or concatenation); this leaves the resolution of nonlinear cue coupling as an unverified architectural assumption.

Authors: The mutual feature modulation is specifically motivated by the physical forward model: polarization orientation priors from the specular reflection model are used to modulate the feature extraction from structured-illumination patterns, thereby resolving the nonlinear ambiguities in normal estimation that arise under the orthographic assumption. While the manuscript describes this motivation in prose, we agree that an explicit equation would strengthen the physics-informed claim. We will add a derivation in the revised Method section that formally links the modulation operator to the radiometric and polarimetric image formation equations.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed framework

The paper describes a physics-informed deep learning architecture (dual-encoder with mutual feature modulation) that processes polarization orientation priors and structured-illumination cues to infer surface normals. No closed-form derivation, first-principles prediction, or mathematical chain is presented that reduces to its own inputs by construction. The central claim rests on the empirical performance of a trained network, not on a fitted parameter renamed as a prediction or on a uniqueness theorem supported only by self-citation. No equations or sections in the provided text exhibit self-definitional loops, ansatz smuggling, or renaming of known results. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the domain assumption that polarization provides usable orientation priors and that a learned dual-encoder can resolve their coupling with illumination patterns; no new physical entities are postulated.

free parameters (1)

Network weights and hyperparameters
Learned parameters that map the combined cues to normals; their values are not derived from first principles.

axioms (1)

domain assumption: Polarization cues supply reliable orientation priors for specular surfaces
Invoked to justify why polarization assists interpretation of structured illumination geometry.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "dual-encoder architecture with mutual feature modulation... Feature-wise Linear Modulation (FiLM) layers are employed to adaptively fuse polarimetric cues and geometric correspondence features"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Polarization cues provide orientation priors... resolve their nonlinear coupling"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.