PoInit-of-View: Poisoning Initialization of Views Transfers Across Multiple 3D Reconstruction Systems

Weijie Wang , Songlong Xing , Zhengyu Zhao , Nicu Sebe , Bruno Lepri

Pith reviewed 2026-05-10 08:53 UTC · model grok-4.3

classification: cs.CV, cs.AI

keywords: reconstruction, systems, poinit-of-view, poisoning, views, across, adversarial, cross-view

The pith

PoInit-of-View poisons SfM initialization by optimizing cross-view gradient inconsistencies to disrupt keypoint detection and feature matching, yielding transferable degradation in rendered 3D reconstruction quality across systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

3D reconstruction systems often start by estimating camera positions and 3D points from a set of input images using structure-from-motion. The attack adds small, carefully chosen noise to those images so that the same 3D point projects to locations whose gradients point in inconsistent directions across views. This breaks the keypoint detectors and matchers that SfM relies on, producing wrong camera poses and bad 3D point clouds. The corrupted initialization then propagates through later stages such as NeRF or 3DGS training, resulting in blurry or distorted output views. Experiments show the poisoned views transfer better than simply attacking a single reconstruction pipeline end-to-end.
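To make the idea of "gradients pointing in inconsistent directions" concrete, here is a minimal sketch that scores two corresponding pixels by one minus the cosine similarity of their image gradients. It is an illustration under stated assumptions, not the paper's objective: the function names `central_gradient` and `gradient_inconsistency` are hypothetical, the correspondence is given by hand, and the two views are assumed to be close enough that their gradient directions can be compared directly.

```python
import math


def central_gradient(img, x, y):
    """Central-difference gradient (gx, gy) of a 2D grayscale image at interior pixel (x, y)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def gradient_inconsistency(img_a, pt_a, img_b, pt_b, eps=1e-12):
    """Return 1 - cosine similarity of the gradients at two corresponding pixels.

    Hypothetical score for illustration only: about 0 when the gradients agree,
    about 2 when they point in opposite directions.
    """
    ax, ay = central_gradient(img_a, *pt_a)
    bx, by = central_gradient(img_b, *pt_b)
    dot = ax * bx + ay * by
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    return 1.0 - dot / (norm + eps)


# Toy data: a horizontal intensity ramp, and a second view in which the ramp is reversed.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
flipped = [[float(4 - x) for x in range(5)] for _ in range(5)]

clean_score = gradient_inconsistency(ramp, (2, 2), ramp, (2, 2))
poisoned_score = gradient_inconsistency(ramp, (2, 2), flipped, (2, 2))
print(clean_score, poisoned_score)
```

In this toy case the consistent pair scores close to 0 and the reversed pair close to 2. A real attack would have to keep the perturbation small, which this sketch does not attempt.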

Core claim

We propose PoInit-of-View, which optimizes adversarial perturbations to intentionally introduce cross-view gradient inconsistencies at projections of corresponding 3D points. These inconsistencies disrupt keypoint detection and feature matching, thereby corrupting pose estimation and triangulation within SfM, eventually resulting in low-quality rendered views.
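One way to see why descriptor changes can break feature matching is the nearest-neighbour ratio test popularized by Lowe [25]. The sketch below is a generic illustration, assuming toy 2D descriptors and a hypothetical helper `ratio_test_matches`; it does not reproduce the matcher of any particular SfM system or the paper's analysis.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i]'s nearest neighbour desc_b[j] passes the ratio test.

    A match is kept only if the best distance is clearly smaller than the
    second-best distance, i.e. d1 < ratio * d2.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


# Toy descriptors for two keypoints in view A.
view_a = [[1.0, 0.0], [0.0, 1.0]]
# Clean view B: each descriptor stays close to its counterpart in view A.
view_b_clean = [[0.9, 0.1], [0.1, 0.9]]
# Hypothetical poisoned view B: descriptors drift toward each other and become ambiguous.
view_b_poisoned = [[0.55, 0.45], [0.45, 0.55]]

clean_matches = ratio_test_matches(view_a, view_b_clean)
poisoned_matches = ratio_test_matches(view_a, view_b_poisoned)
print(clean_matches, poisoned_matches)
```

With the clean descriptors both keypoints are matched; with the ambiguous ones the ratio test rejects every candidate, so fewer correspondences are left for pose estimation and triangulation.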

Load-bearing premise

The approach rests on the assumption that cross-view gradient inconsistencies optimized on one SfM implementation will reliably collapse correspondences in the SfM modules of unseen target reconstruction systems, without any access to those systems' parameters or training data.

Figures

Figures reproduced from arXiv: 2604.16540 by Bruno Lepri, Nicu Sebe, Songlong Xing, Weijie Wang, Zhengyu Zhao.

**Figure 1.** Our PoInit-of-View method injects imperceptible perturbations into input views (top) to induce cross-view inconsistency (see details in …

**Figure 2.** Framework of PoInit-of-View. … for subsequent dense reconstruction or neural optimization; without a reliable SfM initialization, downstream optimizers often fail to converge to a coherent scene. Formally, let $\{I_i\}_{i=1}^{N}$ denote the input views. For image $i$, we denote keypoints by $p_{i,k} \in \mathbb{R}^2$ and their associated descriptors by $\phi(p_{i,k}) \in \mathbb{R}^d$. Candidate correspondences between a keypoint $p_{i,k}$ in image $i$ and p…

**Figure 3.** Cross-view gradient inconsistency causes geometric …

**Figure 4.** Qualitative comparison of clean and poisoned reconstructions across multiple scenes on …

**Figure 6.** Visualizes the clean, poisoned, and amplified-difference views to qualitatively assess the subtlety of … Panels: Clean, Poisoned, Amplified Diff (×8).

**Figure 8.** Gradient-difference comparison on T&T. Our perturbation remains imperceptible while inducing clear edge-aligned discrepancies, unlike random noise, indicating targeted disruption of SfM-relevant structures. [Adjacent plot: x-axis, weight of Lth; y-axes, registered images (%) and triangulated keypoints (k).] The estimated Lth …

**Figure 9.** Impact of cross-view inconsistency on SfM. Increasing …

**Figure 7.** Ablation on perturbation budget ε, showing that larger perturbations cause greater degradation in registration and reconstruction. Effect of Structured Perturbations vs. Random Noise: We compute gradient-difference maps using the Sobel [14] operator between the clean and poisoned images for visualization, as shown in …
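The Figure 7 caption text mentions gradient-difference maps computed with the Sobel operator between clean and poisoned images. A minimal sketch of one plausible reading follows, taking the absolute difference of the two Sobel gradient magnitudes. The exact definition used in the paper is not given in this page, so the choice of magnitude difference, the zeroed border, and the names `sobel_magnitude` and `gradient_difference_map` are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2D grayscale image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)
    return out


def gradient_difference_map(clean, poisoned):
    """Per-pixel absolute difference of Sobel magnitudes (an assumed definition)."""
    g_clean = sobel_magnitude(clean)
    g_pois = sobel_magnitude(poisoned)
    return [[abs(a - b) for a, b in zip(rc, rp)] for rc, rp in zip(g_clean, g_pois)]


# Toy image: a vertical step edge, with one pixel next to the edge slightly perturbed.
clean = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
poisoned = [row[:] for row in clean]
poisoned[1][1] = 0.1

diff = gradient_difference_map(clean, poisoned)
for row in diff:
    print([round(v, 3) for v in row])
```

In the toy example the largest response appears at the edge pixel beside the perturbed one, showing how a small change near an edge stands out in such a map.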

Original abstract

Poisoning input views of 3D reconstruction systems has been recently studied. However, we identify that existing studies simply backpropagate adversarial gradients through the 3D reconstruction pipeline as a whole, without uncovering the new vulnerability rooted in specific modules of the 3D reconstruction pipeline. In this paper, we argue that the structure-from-motion (SfM) initialization, as the geometric core of many widely used reconstruction systems, can be targeted to achieve transferable poisoning effects across diverse 3D reconstruction systems. To this end, we propose PoInit-of-View, which optimizes adversarial perturbations to intentionally introduce cross-view gradient inconsistencies at projections of corresponding 3D points. These inconsistencies disrupt keypoint detection and feature matching, thereby corrupting pose estimation and triangulation within SfM, eventually resulting in low-quality rendered views. We also provide a theoretical analysis that connects cross-view inconsistency to correspondence collapse. Experimental results demonstrate the effectiveness of our PoInit-of-View on diverse 3D reconstruction systems and datasets, surpassing the single-view baseline by 25.1% in PSNR and 16.5% in SSIM in black-box transfer settings, such as 3DGS to NeRF.
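The abstract reports results in PSNR and SSIM. For readers unfamiliar with PSNR, the sketch below computes it from the standard definition, 10·log10(MAX² / MSE). It is a generic illustration assuming images with values in [0, 1]. The function name `psnr` is hypothetical, and how the paper derives its relative percentages is not stated in this page.

```python
import math


def psnr(reference, rendered, max_val=1.0):
    """PSNR in dB between two equally sized 2D images with values in [0, max_val]."""
    sq_err = 0.0
    count = 0
    for row_ref, row_out in zip(reference, rendered):
        for a, b in zip(row_ref, row_out):
            sq_err += (a - b) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Toy images: a flat gray reference and a rendering that is uniformly 0.1 brighter.
reference = [[0.5] * 4 for _ in range(4)]
rendered = [[0.6] * 4 for _ in range(4)]

score = psnr(reference, rendered)
print(round(score, 2))
```

For a poisoning attack, a lower PSNR of the rendered views against ground truth indicates stronger degradation.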

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate concrete free parameters, axioms, or invented entities; the method implicitly assumes standard differentiability of SfM components and the existence of corresponding 3D points across views.

axioms (1)

Domain assumption: SfM pipelines rely on keypoint detection and feature matching that are sensitive to small gradient inconsistencies at projected 3D points. Invoked to justify why the optimized perturbations corrupt pose estimation and triangulation.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems
cs.CR · 2026-04 · unverdicted · novelty 6.0

Medoid prototype alignment detects unknown attacks across industrial plants by aligning domain-specific medoid summaries rather than raw samples, yielding 0.843 average accuracy on gas and water system transfers.

Reference graph

Works this paper leans on

58 extracted references · 2 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
[2] Yiyang Chen, Siyan Dong, Xulong Wang, Lulu Cai, Youyi Zheng, and Yanchao Yang. Sg-nerf: Neural surface reconstruction with scene graph optimization. In ECCV, 2024.
[3] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, 2018.
[4] Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei, and Jun Zhu. Viewfool: Evaluating the robustness of visual recognition to adversarial viewpoints. NeurIPS, 2022.
[5] Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, and Jiawei Li. Gausstrap: Stealthy poisoning attacks on 3d gaussian splatting for targeted scene confusion, 2025.
[6] Alain Hore and Djamel Ziou. Image quality metrics: Psnr vs. ssim. ICPR, pages 2366–2369, 2010.
[7] András Horváth and Csaba M Józsa. Targeted adversarial attacks on generalizable neural radiance fields. In ICCV, 2023.
[8] Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, and Hongyang Li. Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving. In ICCV, 2023.
[9] Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. In ICCV, 2023.
[10] Wenxiang Jiang, Hanwei Zhang, Xi Wang, Zhongwen Guo, and Hao Wang. Nerfail: Neural radiance fields-based multi-view adversarial attack. In AAAI, 2024.
[11] Wenxiang Jiang, Hanwei Zhang, Weigang Wang, Zhongwen Guo, Tianao Zhang, and Hao Wang. Mpam-3dgs: Multi-parametric adversarial manipulation for 3d gaussian splatting. In ICASSP. IEEE, 2025.
[12] Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu, and Wei-Chen Chiu. Stealthattack: Robust 3d gaussian splatting poisoning via density-guided illusions. In ICCV, 2025.
[13] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics,
[14] Josef Kittler. On the accuracy of the sobel edge detector. Image and Vision Computing, 1983.
[15] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. In ACM Transactions on Graphics (TOG),
[16] Jonas Kulhanek and Torsten Sattler. NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods. In NeurIPS, 2025.
[17] Chenxi Li, Weijie Wang, Qiang Li, Bruno Lepri, Nicu Sebe, and Weizhi Nie. Freeinsert: Disentangled text-guided object insertion in 3d gaussian scene without spatial priors, 2025.
[18] Jinlong Li, Cristiano Saltori, Fabio Poiesi, and Nicu Sebe. Cross-modal and uncertainty-aware agglomeration for open-vocabulary 3d scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19390–19400, 2025.
[19] Jinlong Li, Liyuan Jiang, Haonan Zhang, and Nicu Sebe. Token reduction via local and global contexts optimization for efficient video large language models. arXiv preprint arXiv:2603.01400, 2026.
[20] Leheng Li, Yiming Zhang, Jialei Wang, and Aimin Zhou. Adv3d: Generating 3d adversarial examples in driving scenarios with nerf. In NeurIPS, 2023.
[21] Yanbang Li, Ziyang Gong, Haoyang Li, Xiaoqi Huang, Haolan Kang, Guangping Bai, and Xianzheng Ma. Robotic visual instruction. In CVPR, 2025.
[22] Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey. Barf: Bundle-adjusting neural radiance fields. In ICCV, 2021.
[23] Yingyan Lin, Kai Zhang, Meng Wang, Zhiyuan Li, and Fan Yang. Nerfool: Uncovering the vulnerability of generalizable neural radiance fields against adversarial perturbations. In ICML, 2023.
[24] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
[25] David G Lowe. Distinctive image features from scale-invariant keypoints. In IJCV, 2004.
[26] Jiahao Lu, Yifan Zhang, Qiuhong Shen, Xinchao Wang, and Shuicheng Yan. Poison-splat: Computation cost attack on 3d gaussian splatting. In ICLR, 2025.
[27] Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Kris Kitani, and Weipeng Xu. Real-time simulated avatar from head-mounted sensors. In CVPR, 2024.
[28] Nicole Meng, Rui Zhao, Xin Wang, and Hao Zhou. Il2-nerf: Advancing adversarial robustness in generalizable nerfs. In CVPR, 2025.
[29] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2021.
[30] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
[31] Weizhi Nie, Ruidong Chen, Weijie Wang, Bruno Lepri, and Nicu Sebe. T2td: Text-3d generation model based on prior knowledge guidance. IEEE TPAMI, 2024.
[32] Rui Peng, Rongjie Wang, Zhenyu Wang, Yawen Lai, and Ronggang Wang. Rethinking depth estimation for multi-view stereo: A unified representation. In CVPR, 2022.
[33] Fabian Perez, Sara Rojas, Carlos Hinojosa, Hoover Rueda-Chacón, and Bernard Ghanem. Unmix-nerf: Spectral unmixing meets neural radiance fields. ICCV, 2025.
[34] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, 2016.
[35] Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In ECCV, 2016.
[36] Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, and Michael Zollhöfer. Deepvoxels: Learning persistent 3d feature embeddings. In CVPR, 2019.
[37] Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, and Renjie Wan. Geometry cloak: Preventing tgs-based 3d reconstruction from copyrighted images. NeurIPS, 2024.
[38] Jiaming Sun, Yiming Xie, Linghao Chen, Xiaowei Zhou, and Hujun Bao. Neuralrecon: Real-time coherent 3d reconstruction from monocular video. In CVPR, 2021.
[39] Jiakai Sun, Zhanjie Zhang, Jiafu Chen, Guangyuan Li, Boyan Ji, Lei Zhao, and Wei Xing. Vgos: Voxel grid optimization for view synthesis from sparse inputs. In IJCAI,
[40] S. Ullman. The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B. Biological Sciences, 203(1153):405–426, 1979.
[41] Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, and Cihang Xie. Benchmarking robustness in neural radiance fields. In CVPR, 2024.
[42] Weijie Wang, Zhengyu Zhao, Nicu Sebe, and Bruno Lepri. Turn fake into real: Adversarial head turn attacks against deepfake detection. arXiv preprint arXiv:2309.01104, 2023.
[43] Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, and Bruno Lepri. Uvmap-id: A controllable and personalized uv map generative model. In ACM MM, pages 10725–10734, 2024.
[44] Weijie Wang, Guofeng Mei, Jian Zhang, Nicu Sebe, Bruno Lepri, and Fabio Poiesi. Fully-geometric cross-attention for point cloud registration. In 3DV. IEEE, 2025.
[45] Xiaofeng Wang, Zheng Zhu, Guan Huang, Fangbo Qin, Yun Ye, Yijia He, Xu Chi, and Xingang Wang. Mvster: Epipolar transformer for efficient multi-view stereo. In ECCV. Springer, 2022.
[46] Jiacong Xu, Yiqun Mei, and Vishal Patel. Wild-gs: Real-time novel view synthesis from unconstrained photo collections. NeurIPS, 2024.
[47] Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, and Tao Yu. Imvid: Immersive volumetric videos for enhanced vr engagement. In CVPR, 2025.
[48] Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. Mvsnet: Depth inference for unstructured multi-view stereo. In ECCV, 2018.
[49] Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, and Theo Gevers. Fewviewgs: Gaussian splatting with few view matching and multi-stage training. NeurIPS, 2024.
[50] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. CVPR, 2024.
[51] Vladimir Yugay, Theo Gevers, and Martin R. Oswald. Magic-slam: Multi-agent gaussian globally consistent slam. In CVPR, 2025.
[52] Abdurrahman Zeybey, Mehmet Ergezer, and Tommy Nguyen. Gaussian splatting under attack: Investigating adversarial noise in 3d objects. In NeurIPS, 2024.
[53] Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, and Yan Lu. Structural multiplane image: Bridging neural view synthesis and 3d reconstruction. In CVPR, 2023.
[54] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
[55] Zhengyu Zhao, Zhuoran Liu, and Martha Larson. On success and simplicity: A second look at transferable targeted attacks. NeurIPS, 2021.
[56] Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Qian Wang, and Chao Shen. Revisiting transferable adversarial images: Systemization, evaluation, and new insights. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–16,
[57] Junhao Zheng, Chenhao Lin, Jiahao Sun, Zhengyu Zhao, Qian Li, and Chao Shen. Physical 3d adversarial attacks against monocular depth estimation in autonomous driving. In CVPR, 2024.
[58] Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images. In SIGGRAPH Asia, 2018.