VOIC: Visible-Occluded Integrated Guidance for 3D Semantic Scene Completion
Pith reviewed 2026-05-16 20:31 UTC · model grok-4.3
The pith
A dual-decoder network separates visible and occluded region supervision to improve monocular 3D semantic scene completion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VOIC decouples SSC into visible-region semantic perception and occluded-region scene completion. It first builds a base 3D voxel representation by fusing image features with depth-derived occupancy. The visible decoder generates high-fidelity geometric and semantic priors from this base. The occlusion decoder then leverages those priors together with cross-modal interaction to perform coherent global scene reasoning. This structure is supported by an offline VRLE step that extracts purified visible voxel labels from dense 3D ground truth.
What carries the argument
The Visible-Occluded Interactive Completion Network (VOIC), a dual-decoder architecture in which the visible decoder supplies high-fidelity priors to the occlusion decoder for global reasoning.
If this is right
- Higher geometric completion and semantic segmentation accuracy than existing monocular SSC methods.
- Reduced feature dilution and error propagation between visible and occluded regions.
- State-of-the-art results on the SemanticKITTI and SSCBench-KITTI360 benchmarks.
- More coherent global scene reasoning by feeding visible priors into the occlusion decoder.
Where Pith is reading between the lines
- The same visible-occluded split could be tested on multi-view or LiDAR-assisted SSC pipelines to see if the accuracy lift persists when more input data is available.
- Explicit separation of supervision regions might reduce the data volume needed for training occluded reasoning modules.
- The dual-decoder design offers a natural way to add uncertainty estimates that flag which voxels come from visible priors versus pure inference.
Load-bearing premise
Offline extraction of visible-region voxel labels from dense 3D ground truth cleanly separates supervision without introducing selection bias or losing information needed for coherent global reasoning.
What would settle it
Retraining an otherwise identical model with combined visible-occluded supervision instead of the separated VRLE labels and checking whether geometric and semantic scores on SemanticKITTI fall below the reported VOIC numbers.
Figures
read the original abstract
Camera-based 3D Semantic Scene Completion (SSC) is a critical task for autonomous driving and robotic scene understanding. It aims to infer a complete 3D volumetric representation of both semantics and geometry from a single image. Existing methods typically focus on end-to-end 2D-to-3D feature lifting and voxel completion. However, they often overlook the interference between high-confidence visible-region perception and low-confidence occluded-region reasoning caused by single-image input, which can lead to feature dilution and error propagation. To address these challenges, we introduce an offline Visible Region Label Extraction (VRLE) strategy that explicitly separates and extracts voxel-level supervision for visible regions from dense 3D ground truth. This strategy purifies the supervisory space for two complementary sub-tasks: visible-region perception and occluded-region reasoning. Building on this idea, we propose the Visible-Occluded Interactive Completion Network (VOIC), a novel dual-decoder framework that explicitly decouples SSC into visible-region semantic perception and occluded-region scene completion. VOIC first constructs a base 3D voxel representation by fusing image features with depth-derived occupancy. The visible decoder focuses on generating high-fidelity geometric and semantic priors, while the occlusion decoder leverages these priors together with cross-modal interaction to perform coherent global scene reasoning. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that VOIC outperforms existing monocular SSC methods in both geometric completion and semantic segmentation accuracy, achieving state-of-the-art performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes VOIC, a dual-decoder Visible-Occluded Interactive Completion Network for monocular 3D Semantic Scene Completion. It introduces an offline VRLE strategy to extract voxel-level visible-region labels from dense 3D ground truth, decoupling high-confidence visible perception from occluded-region reasoning via a base voxel representation fused from image features and depth occupancy, with the visible decoder producing priors for the occlusion decoder. Experiments claim SOTA geometric and semantic performance on SemanticKITTI and SSCBench-KITTI360 benchmarks.
Significance. If the performance gains are shown to arise from the architectural decoupling rather than privileged label extraction, the work offers a concrete approach to mitigating feature dilution and error propagation in single-image SSC. The explicit separation of visible and occluded sub-tasks with cross-modal interaction could improve global coherence in autonomous driving scenes, provided the method generalizes beyond the specific benchmarks.
major comments (3)
- [§3.2] §3.2 (VRLE strategy): The description of offline visible-region label extraction from dense 3D GT does not specify the exact procedure for computing visibility masks or handling boundary/low-confidence voxels. This leaves open the possibility of selection bias, where only regions already well-reconstructed by LiDAR/multi-view fusion receive supervision, undermining the claim that VRLE provides a clean separation for the dual-decoder interaction.
- [§4] §4 (Experiments): The manuscript reports SOTA results but provides insufficient detail on loss functions, network hyperparameters, and full ablation studies isolating the contribution of visible-decoder priors to the occlusion decoder. Without these, it is difficult to confirm that the geometric and semantic gains are load-bearing architectural improvements rather than artifacts of the VRLE supervision.
- [§3.1] §3.1 (Base voxel representation): The fusion of image features with depth-derived occupancy is presented as the starting point for both decoders, but no analysis is given on how errors in the initial depth estimation propagate through the visible-to-occlusion prior transfer, which is central to the interference-mitigation claim.
minor comments (2)
- [§3] Notation for the visible and occlusion decoders is introduced without a clear diagram or equation reference in the main text; a small schematic in Figure 2 would improve readability.
- [Abstract, §1] The abstract and §1 use the term 'parameter-free' in passing for certain priors; this should be removed or clarified since network hyperparameters and loss weights are explicitly listed as free parameters.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below with clarifications and commit to revisions that strengthen the manuscript without misrepresenting the current work.
read point-by-point responses
-
Referee: [§3.2] §3.2 (VRLE strategy): The description of offline visible-region label extraction from dense 3D GT does not specify the exact procedure for computing visibility masks or handling boundary/low-confidence voxels. This leaves open the possibility of selection bias, where only regions already well-reconstructed by LiDAR/multi-view fusion receive supervision, undermining the claim that VRLE provides a clean separation for the dual-decoder interaction.
Authors: We agree that §3.2 currently provides only a high-level overview of VRLE and omits the precise algorithmic steps. In the revised manuscript we will expand this section to detail the visibility mask computation: rays are cast from the camera center through each voxel using known intrinsics and extrinsics; a voxel is labeled visible only if its first intersection along the ray lies within the image frustum and depth range. Boundary voxels are handled by a 3-voxel dilation followed by a confidence threshold (0.7) derived from the dense GT occupancy variance; voxels below this threshold are excluded from visible supervision. To address selection bias, we will add a supporting analysis and table showing that VRLE labels are extracted uniformly across the entire dense GT volume, independent of any LiDAR reconstruction quality metric. revision: yes
-
Referee: [§4] §4 (Experiments): The manuscript reports SOTA results but provides insufficient detail on loss functions, network hyperparameters, and full ablation studies isolating the contribution of visible-decoder priors to the occlusion decoder. Without these, it is difficult to confirm that the geometric and semantic gains are load-bearing architectural improvements rather than artifacts of the VRLE supervision.
Authors: We acknowledge the need for greater experimental transparency. The revised §4 will include the complete loss formulation (voxel-wise cross-entropy for semantics weighted at 1.0, binary cross-entropy for geometry at 0.5, plus a consistency term between decoders), all hyperparameters (Adam optimizer, learning rate 1e-4 with cosine decay, batch size 4, 40 epochs), and additional ablation tables. These will isolate the visible-decoder priors by comparing the full VOIC model against variants that (i) remove prior transfer, (ii) replace priors with random features, and (iii) use only VRLE supervision without the dual-decoder interaction, thereby demonstrating that the reported gains stem from the architectural decoupling. revision: yes
-
Referee: [§3.1] §3.1 (Base voxel representation): The fusion of image features with depth-derived occupancy is presented as the starting point for both decoders, but no analysis is given on how errors in the initial depth estimation propagate through the visible-to-occlusion prior transfer, which is central to the interference-mitigation claim.
Authors: We thank the referee for identifying this gap. While the base voxel construction is described, the manuscript lacks explicit propagation analysis. In the revision we will add a new paragraph in §3.1 and a corresponding experiment in §4 that injects controlled Gaussian noise (σ = 0.1–0.5 m) into the depth maps and measures the resulting degradation in both visible and occluded predictions. The results will show that the visible-to-occlusion prior transfer reduces error propagation relative to single-decoder baselines, directly supporting the interference-mitigation claim. revision: yes
Circularity Check
No circularity detected; VRLE and dual-decoder architecture are independent of their inputs
full rationale
The paper defines VRLE as an offline preprocessing step that extracts visible voxel labels directly from dense 3D ground truth to create separate supervision signals for the two decoders. This extraction is a fixed, deterministic operation on external labels and does not redefine or predict any quantity that the network is later asked to output. The visible decoder then produces learned priors that are fed forward to the occlusion decoder; this is a standard architectural interaction trained end-to-end rather than a tautology in which the output is forced to equal the input by construction. No equations, uniqueness theorems, or self-citations are presented as load-bearing premises that collapse the claimed gains back to the training labels themselves. Benchmark results on held-out SemanticKITTI and SSCBench-KITTI360 test sets therefore constitute independent evaluation rather than a re-statement of the supervision pipeline.
Axiom & Free-Parameter Ledger
free parameters (1)
- network hyperparameters and loss weights
axioms (1)
- domain assumption Dense 3D ground truth labels are available and accurate for extracting visible-region supervision
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
VOIC explicitly decouples SSC into visible-region semantic perception and occluded-region scene completion... VRLE... produces a binary visibility mask M_vis... Y_vis = Y ⊙ M_vis
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
voxel grid... 256×256×32... three spatial dimensions implicit
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Monoscene: Monocular 3d semantic scene completion,
A.-Q. Cao and R. De Charette, “Monoscene: Monocular 3d semantic scene completion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3991–4001
work page 2022
-
[2]
S3cnet: A sparse semantic scene completion network for lidar point clouds,
R. Cheng, C. Agia, Y . Ren, X. Li, and L. Bingbing, “S3cnet: A sparse semantic scene completion network for lidar point clouds,” in Conference on Robot Learning, 2021, pp. 2148–2161
work page 2021
-
[3]
Multi-path sensory substitution device navigates the blind and visually impaired individuals,
Z. Han, S. Li, X. Wang, X. Hu, R. Higashita, and J. Liu, “Multi-path sensory substitution device navigates the blind and visually impaired individuals,”Displays, p. 103200, 2025
work page 2025
-
[4]
LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR,
P. Li, R. Zhao, Y . Shi, H. Zhao, J. Yuan, G. Zhou, and Y .-Q. Zhang, “LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR,” in2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 8269–8276
work page 2023
-
[5]
Semcity: Semantic scene generation with triplane diffusion,
J. Lee, S. Lee, C. Jo, W. Im, J. Seon, and S.-E. Yoon, “Semcity: Semantic scene generation with triplane diffusion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 28 337–28 347
work page 2024
-
[6]
V oxformer: Sparse voxel transformer for camera- based 3d semantic scene completion,
Y . Li, Z. Yu, C. Choy, C. Xiao, J. M. Alvarez, S. Fidler, C. Feng, and A. Anandkumar, “V oxformer: Sparse voxel transformer for camera- based 3d semantic scene completion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9087–9098
work page 2023
-
[7]
Symphonize 3d semantic scene completion with contextual instance queries,
H. Jiang, T. Cheng, N. Gao, H. Zhang, T. Lin, W. Liu, and X. Wang, “Symphonize 3d semantic scene completion with contextual instance queries,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 258–20 267
work page 2024
-
[8]
Semantickitti: A dataset for semantic scene understanding of lidar sequences,
J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9297–9307
work page 2019
-
[9]
Sscbench: A large-scale 3d semantic scene completion benchmark for autonomous driving,
Y . Li, S. Li, X. Liu, M. Gong, K. Li, N. Chen, Z. Wang, Z. Li, T. Jiang, and F. Yu, “Sscbench: A large-scale 3d semantic scene completion benchmark for autonomous driving,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 13 333– 13 340
work page 2024
-
[10]
Semantic scene completion from a single depth image,
S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, “Semantic scene completion from a single depth image,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1746–1754
work page 2017
-
[11]
3d sketch-aware semantic scene completion via semi-supervised structure prior,
X. Chen, K.-Y . Lin, C. Qian, G. Zeng, and H. Li, “3d sketch-aware semantic scene completion via semi-supervised structure prior,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4193–4202
work page 2020
-
[12]
Rgbd based dimensional decomposition residual network for 3d semantic scene completion,
J. Li, Y . Liu, D. Gong, Q. Shi, X. Yuan, C. Zhao, and I. Reid, “Rgbd based dimensional decomposition residual network for 3d semantic scene completion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7693–7702
work page 2019
-
[13]
Cascaded context pyra- mid for full-resolution 3d semantic scene completion,
P. Zhang, W. Liu, Y . Lei, H. Lu, and X. Yang, “Cascaded context pyra- mid for full-resolution 3d semantic scene completion,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7801–7810
work page 2019
-
[14]
Openoccupancy: A large scale benchmark for surrounding semantic occupancy perception,
X. Wang, Z. Zhu, W. Xu, Y . Zhang, Y . Wei, X. Chi, Y . Ye, D. Du, J. Lu, and X. Wang, “Openoccupancy: A large scale benchmark for surrounding semantic occupancy perception,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17 850–17 859
work page 2023
-
[15]
Lmscnet: Lightweight multiscale 3d semantic completion,
L. Roldao, R. De Charette, and A. Verroust-Blondet, “Lmscnet: Lightweight multiscale 3d semantic completion,” in2020 International Conference on 3D Vision (3DV), 2020, pp. 111–119
work page 2020
-
[16]
X. Yan, J. Gao, J. Li, R. Zhang, Z. Li, R. Huang, and S. Cui, “Sparse single sweep lidar point cloud segmentation via learning contextual shape priors from scene completion,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, 2021, pp. 3101–3109
work page 2021
-
[17]
A multi-phase camera-LiDAR fusion network for 3D semantic segmentation with weak supervision,
X. Chang, H. Pan, W. Sun, and H. Gao, “A multi-phase camera-LiDAR fusion network for 3D semantic segmentation with weak supervision,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 8, pp. 3737–3746, 2023
work page 2023
-
[18]
LiDAR-camera continuous fusion in voxelized grid for semantic scene completion,
Z. Lu, B. Cao, and Q. Hu, “LiDAR-camera continuous fusion in voxelized grid for semantic scene completion,”IEEE Transactions on Circuits and Systems for Video Technology, 2024
work page 2024
-
[19]
Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction,
Y . Zhang, Z. Zhu, and D. Du, “Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9433–9443. 10
work page 2023
-
[20]
Ndc- scene: Boost monocular 3d semantic scene completion in normalized device coordinates space,
J. Yao, C. Li, K. Sun, Y . Cai, H. Li, W. Ouyang, and H. Li, “Ndc- scene: Boost monocular 3d semantic scene completion in normalized device coordinates space,” in2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 9421–9431
work page 2023
-
[21]
Tri-perspective view for vision-based 3d semantic occupancy prediction,
Y . Huang, W. Zheng, Y . Zhang, J. Zhou, and J. Lu, “Tri-perspective view for vision-based 3d semantic occupancy prediction,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9223–9232
work page 2023
-
[22]
Not all voxels are equal: Hardness-aware semantic scene completion with self- distillation,
S. Wang, J. Yu, W. Li, W. Liu, X. Liu, J. Chen, and J. Zhu, “Not all voxels are equal: Hardness-aware semantic scene completion with self- distillation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 792–14 801
work page 2024
-
[23]
Instance-aware monocular 3D semantic scene completion,
H. Xiao, H. Xu, W. Kang, and Y . Li, “Instance-aware monocular 3D semantic scene completion,”IEEE Transactions on Intelligent Trans- portation Systems, vol. 25, no. 7, pp. 6543–6554, 2024
work page 2024
-
[24]
Mixssc: Forward- backward mixture for vision-based 3d semantic scene completion,
M. Wang, Y . Ding, Y . Liu, Y . Qin, R. Li, and Z. Tang, “Mixssc: Forward- backward mixture for vision-based 3d semantic scene completion,”IEEE Transactions on Circuits and Systems for Video Technology, 2025
work page 2025
-
[25]
Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion,
B. Li, J. Deng, W. Zhang, Z. Liang, D. Du, X. Jin, and W. Zeng, “Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion,” inComputer Vision – ECCV 2024, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds., Cham, 2025, vol. 15062, pp. 131–148
work page 2024
-
[26]
J. Lin, J. Zhou, W. Xu, R. Xu, C. Wang, S. Chen, K. Fu, Y . Shao, L. Guo, and S. Xu, “CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion,” Oct. 2025
work page 2025
-
[27]
One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion,
H. Lu, Y . Su, X. Zhang, and H. Hu, “One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion,” Jul. 2025
work page 2025
-
[28]
Unleashing Semantic and Geometric Priors for 3D Scene Completion,
S. Chen, W. Sui, B. Zhang, Z. Boukhers, J. See, and C. Yang, “Unleashing Semantic and Geometric Priors for 3D Scene Completion,” Aug. 2025
work page 2025
-
[29]
F. Gao, Y . Chen, K. Wang, P. Zhou, and J. Lu, “MVFormer: UNet-like Transformer with Mix-V oxel Attention for Camera-Based 3D Semantic Scene Completion,”IEEE Transactions on Circuits and Systems for Video Technology, 2025
work page 2025
-
[30]
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance,
D.-H. Pham, D.-D. Nguyen, A. Pham, T. Ho, P. Nguyen, K. Nguyen, and R. Nguyen, “Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, 2025, pp. 6514–6522
work page 2025
-
[31]
SPHERE: Semantic-PHysical Engaged REpre- sentation for 3D Semantic Scene Completion,
Z. Yang and Y . Peng, “SPHERE: Semantic-PHysical Engaged REpre- sentation for 3D Semantic Scene Completion,” inProceedings of the 33rd ACM International Conference on Multimedia, Dublin Ireland, Oct. 2025, pp. 7681–7690
work page 2025
-
[32]
Memory-Augmented Re-Completion for 3D Semantic Scene Completion,
Y .-W. Tseng, S.-P. Yang, J.-C. Wu, I.-B. Liao, Y .-H. Li, H.-H. Shuai, and W.-H. Cheng, “Memory-Augmented Re-Completion for 3D Semantic Scene Completion,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, 2025, pp. 7446–7454
work page 2025
-
[33]
Mask dino: Towards a unified transformer-based framework for object detection and segmentation,
F. Li, H. Zhang, H. Xu, S. Liu, L. Zhang, L. M. Ni, and H.-Y . Shum, “Mask dino: Towards a unified transformer-based framework for object detection and segmentation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3041–3050
work page 2023
-
[34]
Deformable DETR: Deformable Transformers for End-to-End Object Detection,
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable Transformers for End-to-End Object Detection,” Mar. 2021
work page 2021
-
[35]
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,”IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017
work page 2017
-
[36]
Beverse: Unified perception and prediction in birds-eye-view for vision-centric autonomous driving,
Y . Zhang, Z. Zhu, W. Zheng, J. Huang, G. Huang, J. Zhou, and J. Lu, “Beverse: Unified perception and prediction in birds-eye-view for vision- centric autonomous driving,”arXiv preprint arXiv:2205.09743, 2022
-
[37]
Bevdepth: Acquisition of reliable depth for multi-view 3d object detec- tion,
Y . Li, Z. Ge, G. Yu, J. Yang, Z. Wang, Y . Shi, J. Sun, and Z. Li, “Bevdepth: Acquisition of reliable depth for multi-view 3d object detec- tion,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 1477–1485
work page 2023
-
[38]
BEVDet: High- performance Multi-camera 3D Object Detection in Bird-Eye-View,
J. Huang, G. Huang, Z. Zhu, Y . Ye, and D. Du, “BEVDet: High- performance Multi-camera 3D Object Detection in Bird-Eye-View,” Jun. 2022
work page 2022
-
[39]
Mobilestereonet: Towards lightweight deep networks for stereo matching,
F. Shamsafar, S. Woerz, R. Rahim, and A. Zell, “Mobilestereonet: Towards lightweight deep networks for stereo matching,” inProceedings of the Ieee/Cvf Winter Conference on Applications of Computer Vision, 2022, pp. 2417–2426
work page 2022
-
[40]
Decoupled Weight Decay Regularization,
I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” Jan. 2019
work page 2019
-
[41]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778
work page 2016
-
[42]
Camera-based 3d semantic scene completion with sparse guidance network,
J. Mei, Y . Yang, M. Wang, J. Zhu, J. Ra, Y . Ma, L. Li, and Y . Liu, “Camera-based 3d semantic scene completion with sparse guidance network,”IEEE Transactions on Image Processing, 2024
work page 2024
-
[43]
Context and geometry aware voxel transformer for semantic scene completion,
Z. Yu, R. Zhang, J. Ying, J. Yu, X. Hu, L. Luo, S.-Y . Cao, and H.- L. Shen, “Context and geometry aware voxel transformer for semantic scene completion,”Advances in Neural Information Processing Systems, vol. 37, pp. 1531–1555, 2024
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.