
arxiv: 2606.12635 · v1 · pith:J6UFH3B6new · submitted 2026-06-10 · 💻 cs.CV

CD-RCM: Generalizable Continuous-Depth Novel View Synthesis for Reflectance Confocal Microscopy

Pith reviewed 2026-06-27 09:42 UTC · model grok-4.3

classification 💻 cs.CV
keywords reflectance confocal microscopy · novel view synthesis · continuous depth · 3D interpolation · skin imaging · optical biopsy · feedforward model · isotropic volume

The pith

A feedforward network synthesizes realistic unseen depths from sparse RCM skin stacks to create isotropic 3D volumes without per-patient retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CD-RCM as the first RCM-specific novel-view synthesis method, one that interpolates intermediate sections in reflectance confocal microscopy z-stacks. These stacks are anisotropic: lateral resolution (0.5 μm) is roughly six times finer than the axial resolution (3 μm) set by optical sectioning. In addition, when a stack is visualized, shallower sections obscure deeper ones because of the axial imaging geometry. The model uses a tailored architecture and training framework that accounts for this depth-resolved, occlusive physics to predict unseen depths within the imaged range, which extends up to 200 μm below the surface. The result is an isotropic volume that supports arbitrary-direction sectioning, including cross-sectional views resembling histopathology. A reader would care because it converts sparse scans into continuous 3D representations usable for interpretation without per-case optimization.
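To make the sixfold anisotropy concrete, the sketch below computes which depths would have to be synthesized to bring a sparse stack to isotropic sampling. It assumes the sampling spacings equal the resolutions quoted in the abstract (0.5 μm lateral, 3 μm axial); the function name `isotropic_depths` and the four-slice toy stack are hypothetical, not taken from the paper.

```python
def isotropic_depths(acquired_depths_um, lateral_um=0.5):
    """Return the depth grid (in micrometers) needed for isotropic sampling.

    acquired_depths_um: sorted depths of the real en-face slices.
    lateral_um: lateral spacing that the axial grid should match (assumed).
    """
    start, stop = acquired_depths_um[0], acquired_depths_um[-1]
    n_steps = round((stop - start) / lateral_um)
    # Build the grid from integer steps to avoid accumulating float error.
    return [start + i * lateral_um for i in range(n_steps + 1)]


acquired = [0.0, 3.0, 6.0, 9.0]      # hypothetical stack with 3 um axial spacing
dense = isotropic_depths(acquired)   # 0.0, 0.5, 1.0, ..., 9.0
to_synthesize = [z for z in dense if z not in acquired]

print(len(dense), len(to_synthesize))  # 19 15
```

Under these assumptions, each 3 μm gap needs five synthesized slices, so most of the isotropic volume is predicted rather than measured.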

Core claim

CD-RCM is a feedforward model that predicts realistic unseen depths from sparsely sampled RCM stacks by explicitly modeling the depth-resolved and occlusive nature of RCM imaging, enabling continuous-depth visualization and isotropic 3D volumes from en-face images while achieving sub-second inference without per-patient optimization.

What carries the argument

A feedforward neural network architecture trained to account for RCM's depth-resolved occlusive imaging physics, which predicts novel intermediate depths from sparse axial stacks.

If this is right

  • The resulting volumes become isotropic, removing the sixfold difference between lateral and axial resolution.
  • Arbitrary-direction sectioning becomes possible, including histopathology-like cross-sections.
  • Novel-view synthesis runs with sub-second inference time.
  • No per-patient optimization is required once the network is trained on representative data.
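The arbitrary-direction sectioning in the list above amounts to re-slicing the densified volume along a different axis. The following is a minimal sketch, assuming the volume is stored as nested lists indexed `[z][y][x]` with isotropic voxels; the function names and the toy volume are illustrative only.

```python
def cross_section_xz(volume, y):
    """Vertical, histology-like section: fix row y, keep depth (z) and x."""
    return [plane[y] for plane in volume]


def cross_section_yz(volume, x):
    """Orthogonal vertical section: fix column x, keep depth (z) and y."""
    return [[row[x] for row in plane] for plane in volume]


# Toy volume with 3 depths, 2 rows, 4 columns; each voxel encodes its own index.
volume = [[[100 * z + 10 * y + x for x in range(4)] for y in range(2)]
          for z in range(3)]

print(cross_section_xz(volume, y=1))
# [[10, 11, 12, 13], [110, 111, 112, 113], [210, 211, 212, 213]]
print(cross_section_yz(volume, x=2))
# [[2, 12], [102, 112], [202, 212]]
```

With isotropic voxels the rows of such a section have the same physical spacing as its columns, so the image can be displayed without axial rescaling.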

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same depth-resolved occlusion modeling might apply to other optical sectioning techniques that acquire en-face planes at successive depths.
  • Clinical workflows could shift from dense axial sampling to sparser stacks followed by synthesis, reducing total acquisition time.
  • If the synthesized depths preserve cellular detail, they could support quantitative measurements such as cell counting or layer thickness across continuous depths.

Load-bearing premise

A single network trained on RCM data will generalize to new patients and produce accurate interpolations without per-patient optimization.

What would settle it

Test CD-RCM on RCM stacks from patients held out from training: withhold slices at depths where real images were acquired, synthesize those depths, and measure reconstruction error against the withheld images. Error that is high, or no better than classical interpolation, would show that the generalization claim fails.
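The test described above can be sketched as a small evaluation loop. This is an illustration under stated assumptions, not the paper's protocol: images are flat lists of intensities in [0, 1], PSNR stands in for whatever metrics the authors report, every odd-indexed slice is hidden, and `predict` is a hypothetical callable standing in for the model. A linear blend of the neighbouring slices serves as a classical baseline.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two equally sized flat images."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def linear_baseline(above, below, w):
    """Classical baseline: pixel-wise linear blend of the neighbouring slices."""
    return [(1 - w) * a + w * b for a, b in zip(above, below)]


def nearest_baseline(above, below, w):
    """Weaker baseline: copy whichever neighbouring slice is closer in depth."""
    return list(above if w <= 0.5 else below)


def evaluate_stack(stack, predict):
    """Hide every odd-indexed slice, re-synthesize it, and return the mean PSNR.

    stack: list of (depth_um, flat_image) pairs sorted by depth.
    predict: callable(above, below, w) -> flat_image (stand-in for the model).
    """
    scores = []
    for i in range(1, len(stack) - 1, 2):
        (z0, above), (zt, truth), (z1, below) = stack[i - 1], stack[i], stack[i + 1]
        w = (zt - z0) / (z1 - z0)  # relative position of the hidden depth
        scores.append(psnr(predict(above, below, w), truth))
    return sum(scores) / len(scores)


# Toy stack whose intensity falls linearly with depth, so the linear blend is exact.
toy = [(3.0 * k, [1.0 - 0.125 * k] * 4) for k in range(5)]
print(evaluate_stack(toy, linear_baseline))   # inf (perfect reconstruction)
print(evaluate_stack(toy, nearest_baseline))  # roughly 18.06 dB
```

On real data the stacks passed to `evaluate_stack` would need to come only from patients excluded from training for the result to bear on the generalization claim.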

Figures

Figures reproduced from arXiv: 2606.12635 by Jennifer Dy, Kivanc Kose, Milind Rajadhyaksha, Tooba Imtiaz.

Figure 1: Comparison of imaging paradigms. (a) Surface-level imaging reconstructs opaque objects from multiple posed viewpoints with geometric parallax. (b) CT acquires transmission projections from multiple angles and reconstructs 3D structure from X-ray attenuation. (c) RCM performs axial optical sectioning, capturing discrete en-face slices of internal tissue at successive depths.
Figure 2: Overview of CD-RCM. Sparse input RCM slices and their Plücker ray embeddings are mapped to a unified token space and processed by a decoder-only transformer. Target ray tokens condition the synthesis; updated target tokens are decoded to produce Î_t. Training uses a weighted combination of photometric, LPIPS, and skin-specific perceptual feature loss L_SPF.
Figure 3: Qualitative comparison of novel-depth synthesis methods. We compare classical interpolation approaches with CD-RCM (with and without skin-specific perceptual supervision). CD-RCM best reconstructs fine cellular structures and depth-dependent texture transitions, producing results closest to the ground truth. Insets highlight regions where interpolation methods exhibit structural distortion and/or incorrect…
Figure 4: Cross-sectional and arbitrary-plane visualization of densified stacks. We densify the input stacks using CD-RCM to match the axial and lateral resolution of the actual sample, enabling isotropic 3D visualization. Sagittal, coronal, and oblique cross-sections show markedly smoother structural transitions compared to the sparsely sampled input stacks.
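The Figure 2 caption mentions Plücker ray embeddings without saying how rays are defined for an en-face slice. The sketch below shows the standard Plücker form (unit direction d, moment o × d) under a loudly stated assumption: each slice is viewed by a virtual pinhole camera sitting on the optical axis at the slice depth. That camera model, the `focal_px` value, and all function names are hypothetical and may differ from the paper's parameterization.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker(origin, direction):
    """Plücker coordinates (d, o x d) of a ray, with d scaled to unit length."""
    norm = sum(c * c for c in direction) ** 0.5
    d = tuple(c / norm for c in direction)
    return d + cross(origin, d)


def slice_rays(depth_um, height, width, focal_px=100.0):
    """One 6-D ray embedding per pixel of an en-face slice.

    ASSUMPTION (not from the paper): a virtual pinhole camera on the optical
    axis at z = depth_um, looking along +z, with a focal length in pixels.
    """
    origin = (0.0, 0.0, depth_um)
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[plucker(origin, (x - cx, y - cy, focal_px)) for x in range(width)]
            for y in range(height)]


shallow = slice_rays(depth_um=0.0, height=3, width=3)
deep = slice_rays(depth_um=30.0, height=3, width=3)
# Same pixel, same direction, but a different moment because the depth differs.
print(shallow[0][0][:3] == deep[0][0][:3])  # True
print(shallow[0][0][3:] != deep[0][0][3:])  # True
```

Plücker coordinates describe a line rather than a point on it, so the embedding changes with depth only if the ray geometry itself depends on depth, as the camera position does in this sketch.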
Original abstract

Reflectance confocal microscopy (RCM) provides noninvasive, cellular-resolution "optical biopsies" of human skin in vivo by acquiring en-face images at successive depths, forming a sparse z-stack. Due to optical limitations, these stacks are anisotropic 3D volumes with lateral resolution (0.5 μm) ~6 times higher than the axial resolution, which is defined by the optical sectioning (3 μm), limiting the interpretation of tissue. Our goal is to provide continuous-depth visualization by interpolating intermediate sections and making the 3D volume isotropic. Such a representation permits arbitrary-direction sectioning, including histopathology-like cross-sectional examination, without requiring per-patient optimization. To that end, we introduce the first RCM-specific novel-view synthesis (NVS) approach, CD-RCM, a feedforward model that predicts realistic, unseen depths from sparsely sampled RCM stacks. Classical neural rendering methods focus on reconstruction from surface-level multi-view observations. In contrast to surface-level camera views, RCM can acquire optically sectioned en-face images of tissue beyond the surface up to 200 μm. However, during visualization of the RCM stacks, observations of the shallower sections (towards the surface) obscure the deeper ones. This unique axial imaging geometry and layer-dependent anatomical organization motivated our development of a tailored architectural and training framework that explicitly accounts for RCM's depth-resolved, occlusive imaging physics. Experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis with sub-second inference time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces CD-RCM, the first RCM-specific novel-view synthesis method: a feedforward network that interpolates unseen depths in sparse RCM z-stacks to produce continuous, isotropic 3D volumes. It explicitly models the depth-resolved occlusive axial imaging physics of RCM (en-face sections up to 200 μm) rather than surface multi-view geometry, enabling arbitrary-direction sectioning without per-patient optimization and claiming high-fidelity results at sub-second inference time.

Significance. If the generalization and fidelity claims hold, the work would provide a practical tool for converting anisotropic RCM stacks into histopathology-like cross-sections, improving in-vivo tissue interpretation in dermatology. The feedforward design and explicit handling of RCM occlusion are notable strengths for clinical deployment; the absence of per-patient fine-tuning distinguishes it from typical neural rendering pipelines.

major comments (2)
  1. [Abstract] The statement that 'experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis' supplies no datasets, quantitative metrics, baselines, patient counts, or error analysis, leaving the central empirical claim without visible support.
  2. [Abstract] The title and abstract assert generalizability to new patients without per-patient optimization, yet no information is given on the number of patients, skin-type/pigmentation diversity, layer-thickness variation, or whether evaluation used patient-wise hold-out splits (versus intra-patient splits); this directly affects whether the training distribution captures the claimed anatomical and optical variations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that additional details would strengthen the presentation of our claims and will revise the abstract accordingly.

Point-by-point responses
  1. Referee: [Abstract] The statement that 'experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis' supplies no datasets, quantitative metrics, baselines, patient counts, or error analysis, leaving the central empirical claim without visible support.

    Authors: We agree that the abstract does not include these supporting details. The manuscript body reports the full experimental protocol, including the RCM datasets, quantitative metrics, baselines, and error analysis. We will revise the abstract to add a concise clause summarizing the evaluation setup and key fidelity results. revision: yes

  2. Referee: [Abstract] The title and abstract assert generalizability to new patients without per-patient optimization, yet no information is given on the number of patients, skin-type/pigmentation diversity, layer-thickness variation, or whether evaluation used patient-wise hold-out splits (versus intra-patient splits); this directly affects whether the training distribution captures the claimed anatomical and optical variations.

    Authors: We agree that the abstract omits these dataset characteristics. The manuscript uses patient-wise hold-out evaluation on a multi-patient cohort that includes variation in skin types and layer thicknesses. We will revise the abstract to briefly note the patient cohort size, diversity considerations, and patient-wise splits to better support the generalizability claim. revision: yes
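The distinction between patient-wise and intra-patient splits raised in this exchange can be illustrated with a short sketch. It assumes each stack is a record carrying a `patient_id` field; that schema, the function name, and the toy cohort are hypothetical and say nothing about the authors' actual data handling.

```python
import random


def patient_wise_split(stacks, test_fraction=0.2, seed=0):
    """Split stacks so that no patient appears in both train and test.

    stacks: list of dicts with a 'patient_id' key (hypothetical schema).
    """
    patients = sorted({s["patient_id"] for s in stacks})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [s for s in stacks if s["patient_id"] not in test_ids]
    test = [s for s in stacks if s["patient_id"] in test_ids]
    return train, test


# Toy cohort: 5 patients with 3 stacks each.
cohort = [{"patient_id": f"P{p}", "stack_id": s} for p in range(5) for s in range(3)]
train, test = patient_wise_split(cohort)

train_ids = {s["patient_id"] for s in train}
test_ids = {s["patient_id"] for s in test}
print(len(train), len(test), train_ids & test_ids)  # 12 3 set()
```

Splitting at the stack level instead would let stacks from the same patient land on both sides, which can inflate apparent generalization.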

Circularity Check

0 steps flagged

No significant circularity; empirical feedforward model is self-contained

Full rationale

The paper introduces CD-RCM as a neural network architecture trained on RCM stacks to perform novel-view synthesis, with claims resting on empirical training and held-out evaluation rather than any closed mathematical derivation. No equations reduce a prediction to a fitted input by construction, no self-citations serve as load-bearing uniqueness theorems, and no ansatzes are smuggled via prior work. The central generalization claim is presented as an empirical result evaluated on data, not a definitional identity, making the derivation chain independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on empirical performance of a neural network with standard training assumptions; no explicit free parameters, axioms beyond domain generalization, or invented entities are stated in the abstract.

axioms (1)
  • domain assumption: A feedforward neural network trained on RCM stacks will generalize across patients without per-patient optimization.
    Invoked when claiming the model works on new stacks without additional tuning.

pith-pipeline@v0.9.1-grok · 5825 in / 1117 out tokens · 24828 ms · 2026-06-27T09:42:06.135654+00:00 · methodology


Reference graph

Works this paper leans on

13 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1]

    Training Deep Nets with Sublinear Memory Cost

    Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174,

  2. [2]

    Query-Key Normalization for Transformers

    Alex Henry, Prudhvi Raj Dachapally, Shubham Shantaram Pawar, and Yuxuan Chen. Query-key normalization for transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4246–4253,

  3. [3]

    LVT: Large-Scale Scene Reconstruction via Local View Transformers

    Tooba Imtiaz, Lucy Chai, Kathryn Heal, Xuan Luo, Jungyeon Park, Jennifer Dy, and John Flynn. LVT: Large-scale scene reconstruction via local view transformers. In Proceedings of the SIGGRAPH Asia 2025 Conference ...

  4. [4]

    Survey: Interpolation Methods in Medical Image Processing

    Thomas Martin Lehmann, Claudia Gonner, and Klaus Spitzer. Survey: Interpolation methods in medical image processing. IEEE Transactions on Medical Imaging, 18(11):1049–1075,

  5. [5]

    Nondestructive 3D Pathology with Light-Sheet Fluorescence Microscopy for Translational Research and Clinical Assays

    Jonathan T.C. Liu, Adam K. Glaser, Chetan Poudel, and Joshua C. Vaughan. Nondestructive 3D pathology with light-sheet fluorescence microscopy for translational research and clinical assays. Annual Review of Analytical Chemistry, 16(Volume 16, 2023):231–252, ISSN 1936-1335. doi: https://doi.org/10.1146/annurev-anchem-091222-092734. URL https://www.annualreviews.org/content/journals/10.1146/annurev-anchem-091222-092734.

  6. [6]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101,

  7. [7]

    Challenges for new adopters in pre-surgical margin assessment by handheld reflectance confocal microscope of basal cell carcinoma: A prospective single-center st...

    N. A. Richarz, A. Boada, A. Jaka, J. Bassas, C. Ferrándiz, J. M. Carrascosa, and O. Yélamos. Challenges for new adopters in pre-surgical margin assessment by handheld reflectance confocal microscope of basal cell carcinoma: A prospective single-center st...

  8. [8]

    Fiji: An Open-Source Platform for Biological-Image Analysis

    Johannes Schindelin, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, Curtis Rueden, Stephan Saalfeld, Benjamin Schmid, et al. Fiji: an open-source platform for biological-image analysis. Nature Methods, 9(7):676–682,

  9. [9]

    DINOv3

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. DINOv3. arXiv preprint arXiv:2508.10104,

  10. [10]

    In vivo confocal ...

    Monika Wojarska, Klaudia Kokot, Paulina Bernecka, Natalia Domańska, Agata Libik, Dana Bunevich, Dominika Nowakowska, Magdalena Dzido, Wiktoria Borzyszkowska, Wojciech Kazimierczak, and Jerzy Jankau. In vivo confocal ...

  11. [11]

    R2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, and Hongdong Li. R2-Gaussian: Rectifying radiative Gaussian splatting for tomographic reconstruction. In Advances in Neural Information Processing Systems (NeurIPS),

  12. [12]

    We employ a cosine learning rate schedule with linear warmup, using 2000 warmup steps and a peak learning rate of 4×10⁻⁴ for both training resolutions

    with β1 = 0.9, β2 = 0.95, and a weight decay of 0.05 applied to all parameters except LayerNorm layers. We employ a cosine learning rate schedule with linear warmup, using 2000 warmup steps and a peak learning rate of 4×10⁻⁴ for both training resolutions. The loss weights in Eq. 12 are set to λ = 0.5 and γ = 0.05. The full model has 170.8M trainable param...

  13. [13]

    [2016], and mixed-precision training with BF16 data type

    within the attention blocks, gradient checkpointing (Chen et al. [2016]), and mixed-precision training with the BF16 data type. B. Baseline Methods. We adopt three well-established classical interpolation techniques widely used for processing medical imaging data as the baselines in our experiments (Enjilela et al. [2019], Lehmann et al. [2002]). Note that simp...
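The learning-rate schedule quoted in entry 12 (cosine decay with linear warmup, 2000 warmup steps, peak 4×10⁻⁴) can be sketched as a plain function. The total step count, the floor `min_lr`, and warmup starting from zero are assumptions made for illustration; the excerpt does not state them.

```python
import math


def lr_at(step, total_steps, peak_lr=4e-4, warmup_steps=2000, min_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`.

    total_steps and min_lr are hypothetical; only peak_lr and warmup_steps
    come from the quoted training details.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total = 100_000  # hypothetical training length
for s in (0, 1000, 2000, 51_000, 100_000):
    print(s, f"{lr_at(s, total):.2e}")
```

The rate climbs linearly to the peak at step 2000, passes half the peak midway through the decay phase, and reaches the floor at the final step.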