Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer

Chuanguang Yang; Jiehao Luo; Jintao Cheng; Weilun Feng; Wei Zhang; Yipu Zhang; Yongjun Xu; Zhulin An

arxiv: 2605.15828 · v2 · pith:5UT5WWSVnew · submitted 2026-05-15 · 💻 cs.CV

Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer

Yipu Zhang , Jintao Cheng , Weilun Feng , Jiehao Luo , Chuanguang Yang , Zhulin An , Yongjun Xu , Wei Zhang This is my paper

Pith reviewed 2026-05-25 06:34 UTC · model grok-4.3

classification 💻 cs.CV

keywords post-training quantizationvisual geometry transformerfisher information matrixmulti-task 3D reconstructiontransformer compressionsensitivity-aware quantizationdepth estimationcamera pose estimation

0 comments

The pith

Different geometry tasks in a shared transformer backbone have unequal sensitivity to quantization errors, so weighting calibration by per-task Fisher information preserves accuracy on the most affected outputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feed-forward 3D reconstruction models such as VGGT run multiple tasks including depth estimation, camera pose prediction, and point cloud reconstruction through one shared transformer backbone. Because blocks and channels contribute differently to each task, quantization errors hit some outputs harder than others when all tasks are treated the same. The paper claims that the diagonal Fisher information matrix can measure these varying sensitivities and that feeding the ranks into the learnable affine transformations used during post-training calibration protects the critical parts for each task. A reader would care because the approach reduces the accuracy penalty when compressing billion-parameter models enough for on-device use without having to train separate networks per task.

Core claim

FGQ uses the diagonal Fisher information matrix to quantify the different sensitivities across tasks, blocks, and channels, and incorporates these sensitivities into the Learnable Affine Transformation during calibration to better preserve the channels and blocks most critical to each task, yielding up to 39 percent relative improvement under 4-bit quantization on VGGT across camera pose estimation, point map reconstruction, and depth estimation.

What carries the argument

Diagonal Fisher information matrix that ranks quantization sensitivity per task, block, and channel and then scales the learnable affine transformations applied during post-training quantization calibration.

If this is right

Quantized VGGT keeps higher accuracy on sensitive tasks such as depth estimation and pose prediction than uniform PTQ baselines.
A single shared backbone can be compressed once and still serve multiple downstream geometry tasks without large per-task drops.
The method delivers consistent gains across camera pose, point map, and depth outputs under 4-bit weights.
Calibration now explicitly protects task-critical channels rather than averaging sensitivity across all outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ranking step could be tested on other multi-task vision transformers whose backbone is shared across outputs with different error tolerances.
Choosing calibration images that better sample the statistics of each individual task might further strengthen the Fisher ranks.
If the Fisher diagonal already captures the dominant error modes, adding second-order cross terms between channels would likely add little extra benefit.

Load-bearing premise

The diagonal Fisher information matrix computed on calibration data accurately ranks which blocks and channels matter most for accuracy on each geometric task after quantization.

What would settle it

Applying the Fisher-guided scaling to the same VGGT model and calibration set produces no measurable reduction in the accuracy gap between sensitive and insensitive tasks relative to standard uniform calibration under identical 4-bit settings.

Figures

Figures reproduced from arXiv: 2605.15828 by Chuanguang Yang, Jiehao Luo, Jintao Cheng, Weilun Feng, Wei Zhang, Yipu Zhang, Yongjun Xu, Zhulin An.

**Figure 1.** Figure 1: FGQ effectively preserves the point map reconstruction fidelity under 4-bit quantization, surpassing previous state-of-the-art VGGT (QuantVGGT [Feng et al., 2025]) quantization. Abstract Feed-forward 3D reconstruction models, represented by Visual Geometry Grounded Transformer (VGGT), jointly predict multiple visual geometry tasks such as depth estimation, camera pose prediction, and point map reconstructi… view at source ↗

**Figure 2.** Figure 2: Normalized Performance Degradation (NPD) across various tasks. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Fisher information accurately predicts quantization sensitivity and reveals task-specific [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Normalized Speedup over the same W4A4 quantization settings. We evaluate the runtime efficiency of our W4A4 quantized model on a single NVIDIA A100 40G GPU. To measure the benefit of real low-bit execution, we use fake quantization as the baseline and report speedups from the real quantized implementation. Our implementation builds on the Triton kernel from FlatQuant [Sun et al., 2025b] and tailors it to … view at source ↗

**Figure 5.** Figure 5: Visual comparison results for VGGT on DTU [Jensen et al., 2014] for Ground Truth, [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Visual comparison results for VGGT on 7-Scenes [Shotton et al., 2013] for Ground Truth, [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Visual comparison results for π 3 on 7-Scenes [Shotton et al., 2013] for Ground Truth, RTN W4A4, and FGQ W4A4. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

read the original abstract

Feed-forward 3D reconstruction models, represented by Visual Geometry Grounded Transformer (VGGT), jointly predict multiple visual geometry tasks such as depth estimation, camera pose prediction, and point cloud reconstruction in a single forward pass. They have been widely adopted in 3D vision applications, but their billion-scale parameters bring substantial memory and computation overhead, posing challenges for on-device deployment. Post-Training Quantization (PTQ) is an effective technique to reduce this overhead. Existing PTQ methods for feed-forward 3D models mainly focus on handling heavy-tailed activation distributions and constructing diverse calibration datasets. However, we observe that feed-forward 3D models predict multiple geometric attributes through a shared backbone, where different transformer blocks and hidden channels contribute distinctly to each task, resulting in substantially different sensitivities to quantization errors across tasks, blocks, and channels. Consequently, treating all tasks equally over-emphasizes insensitive tasks and causes significant accuracy loss on the sensitive ones. To address this issue, we propose Fisher-Guided Quantization (FGQ) for feed-forward 3D reconstruction models. Specifically, FGQ uses the diagonal Fisher information matrix to quantify the different sensitivities across tasks, blocks, and channels, and incorporates these sensitivities into the Learnable Affine Transformation during calibration to better preserve the channels and blocks most critical to each task. Extensive experiments across camera pose estimation, point map reconstruction, and depth estimation show that FGQ consistently outperforms state-of-the-art quantization baselines on VGGT, achieving up to 39% relative improvement under the 4-bit quantization. Code is available at https://github.com/ypzhng/FGQ.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces Fisher-Guided Quantization (FGQ) for the Visual Geometry Grounded Transformer (VGGT), a feed-forward model for multiple 3D geometry tasks including depth estimation, camera pose prediction, and point cloud reconstruction. The key observation is that these tasks have different sensitivities to quantization due to varying contributions from shared transformer blocks and channels. FGQ leverages the diagonal Fisher information matrix computed on calibration data to measure these sensitivities and adjusts the learnable affine transformations accordingly to better protect task-critical elements. This leads to improved performance over standard PTQ methods, with reported gains of up to 39% relative improvement in 4-bit quantization settings across the tasks.

Significance. Should the central mechanism hold, FGQ offers a task-aware approach to PTQ that could benefit other multi-task vision transformers by avoiding uniform treatment that harms sensitive tasks. The empirical results on VGGT are promising, and the open-sourced code at the provided GitHub link enhances the work's value for the community by supporting reproducibility.

major comments (1)

[Method] Method section (Fisher-guided modulation of learnable affine transforms): The claim that the diagonal Fisher information matrix computed on calibration data produces a reliable per-task ranking of block and channel quantization sensitivity is load-bearing for the central contribution. However, the manuscript provides no direct validation (e.g., correlation between derived ranks and measured task-specific accuracy degradation when selectively increasing quantization error on high-Fisher components), leaving the stress-test concern unaddressed that local curvature may not predict effects of structured quantization perturbations and that the diagonal approximation ignores common channel correlations in transformer activations.

minor comments (1)

[Abstract] Abstract: the phrase 'up to 39% relative improvement' should specify the exact baseline method and error metric for clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. Below we address the major comment.

read point-by-point responses

Referee: The claim that the diagonal Fisher information matrix computed on calibration data produces a reliable per-task ranking of block and channel quantization sensitivity is load-bearing for the central contribution. However, the manuscript provides no direct validation (e.g., correlation between derived ranks and measured task-specific accuracy degradation when selectively increasing quantization error on high-Fisher components), leaving the stress-test concern unaddressed that local curvature may not predict effects of structured quantization perturbations and that the diagonal approximation ignores common channel correlations in transformer activations.

Authors: We agree that an explicit correlation analysis between Fisher-derived rankings and measured task-specific degradation under selective perturbations would strengthen the central claim. The current manuscript relies on the established role of (diagonal) Fisher information as a local sensitivity measure in the quantization and pruning literature, together with the observed 39% relative gains on sensitive tasks as indirect support. The diagonal approximation is adopted for tractability on billion-parameter models; the learnable affine transformations are intended to mitigate some effects of ignored correlations during calibration. To directly address the referee's stress-test concern we will add a new experiment subsection that (i) ranks blocks/channels by Fisher value per task, (ii) selectively increases quantization noise on high- versus low-Fisher elements, and (iii) reports the resulting task-specific accuracy degradation together with rank-correlation plots. This addition will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: method is a heuristic incorporation of standard Fisher information into PTQ calibration

full rationale

The paper describes FGQ as using the diagonal Fisher information matrix (computed on calibration data) to rank task/block/channel sensitivities and modulate learnable affine transforms. This is presented as an empirical design choice rather than a derivation from first principles. No equations are shown that reduce a claimed prediction back to fitted inputs by construction, and no self-citation chain is invoked to justify uniqueness or load-bearing premises. Results are reported as empirical improvements on held-out geometric tasks, consistent with the reader's assessment of score 2.0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters are introduced beyond the standard learnable affine parameters already present in the baseline PTQ method. The central assumption is the domain-standard use of diagonal Fisher information as a sensitivity proxy; no new entities are postulated.

axioms (1)

domain assumption The diagonal of the Fisher information matrix computed on calibration data ranks the relative importance of transformer blocks and channels for each geometric task under quantization noise.
This ranking is used directly to modulate the learnable affine transforms; it is a standard but non-trivial modeling choice for importance estimation.

pith-pipeline@v0.9.0 · 5852 in / 1299 out tokens · 49624 ms · 2026-05-25T06:34:58.050018+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 6 internal anchors

[1]

Quantized visual geometry grounded transformer

Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, et al. Quantized visual geometry grounded transformer. arXiv preprint arXiv:2509.21302,

work page arXiv
[2]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers.arXiv preprint arXiv:2210.17323,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

g2vlm: Geometry grounded vision language model with unified 3d reconstruction and spatial reasoning.arXiv preprint arXiv:2511.21688,

Wenbo Hu, Jingli Lin, Yilin Long, Yunlong Ran, Lihan Jiang, Yifan Wang, Chenming Zhu, Runsen Xu, Tai Wang, and Jiangmiao Pang. g2vlm: Geometry grounded vision language model with unified 3d reconstruction and spatial reasoning.arXiv preprint arXiv:2511.21688,

work page arXiv
[4]

Vgd: Visual geometry gaussian splatting for feed-forward surround-view driving reconstruction.arXiv preprint arXiv:2510.19578,

Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, and Wei Gao. Vgd: Visual geometry gaussian splatting for feed-forward surround-view driving reconstruction.arXiv preprint arXiv:2510.19578,

work page arXiv
[5]

SpinQuant: LLM quantization with learned rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krish- namoorthi, Vikas Chandra, Yuandong Tian, and Tijmen Blankevoort. Spinquant: Llm quantization with learned rotations.arXiv preprint arXiv:2405.16406,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193,

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Tail-aware post-training quantization for 3d geometry models.arXiv preprint arXiv:2602.01741,

Sicheng Pan, Chen Tang, Shuzhao Xie, Ke Yang, Weixiang Zhang, Jiawei Li, Bin Chen, Shu-Tao Xia, and Zhi Wang. Tail-aware post-training quantization for 3d geometry models.arXiv preprint arXiv:2602.01741,

work page arXiv
[8]

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer

You Shen, Zhipeng Zhang, Yansong Qu, Xiawu Zheng, Jiayi Ji, Shengchuan Zhang, and Liu- juan Cao. Fastvggt: Training-free acceleration of visual geometry transformer.arXiv preprint arXiv:2509.02560,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Litevggt: Boosting vanilla vggt via geometry-aware cached token merging

Zhijian Shu, Cheng Lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, Weize Li, Yao Yao, Xun Cao, Xiaoyang Guo, et al. Litevggt: Boosting vanilla vggt via geometry-aware cached token merging. arXiv preprint arXiv:2512.04939,

work page arXiv
[10]

Avggt: Rethinking global attention for accelerating vggt.arXiv preprint arXiv:2512.02541, 2025a

Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, and Jianfu Zhang. Avggt: Rethinking global attention for accelerating vggt.arXiv preprint arXiv:2512.02541, 2025a. Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, et al. Flatquant: Flatness matters for llm quan...

work page arXiv
[11]

Fisher-aware quantization for detr detectors with critical-category objectives.arXiv preprint arXiv:2407.03442,

Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, and Shanghang Zhang. Fisher-aware quantization for detr detectors with critical-category objectives.arXiv preprint arXiv:2407.03442,

work page arXiv
[12]

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

Sizhe Yang, Linning Xu, Hao Li, Juncheng Mu, Jia Zeng, Dahua Lin, and Jiangmiao Pang. Robo3r: Enhancing robotic manipulation with accurate feed-forward 3d reconstruction.arXiv preprint arXiv:2602.10101,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Versaq-3d: A reconfigurable accelerator enabling feed-forward and generalizable 3d reconstruction via versatile quantization.arXiv preprint arXiv:2601.20317,

Yipu Zhang, Jintao Cheng, Xingyu Liu, Zeyu Li, Carol Jingyi Li, Jin Wu, Lin Jiang, Yuan Xie, Jiang Xu, and Wei Zhang. Versaq-3d: A reconfigurable accelerator enabling feed-forward and generalizable 3d reconstruction via versatile quantization.arXiv preprint arXiv:2601.20317,

work page arXiv
[14]

Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images.arXiv preprint arXiv:1805.09817,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

The same argument applies to discrete labels by replacing integrals with sums

11 A Proof of Proposition 3.1 We prove the identity for a continuous label space. The same argument applies to discrete labels by replacing integrals with sums. Fix an input x. Since pz(y|x) is a normalized conditional distribution, Z pz(y|x)dy= 1.(17) Under the stated regularity conditions, we may differentiate Eq.(17) with respect to z. Differentiating ...

work page 2021
[16]

Table 7 summarizes FGQ’s performance onπ3

D FGQ Evaluation Result onπ 3 We further extend our method to π3 [Wang et al., 2025b], the most recent feed-forward 3D recon- struction model. Table 7 summarizes FGQ’s performance onπ3. On camera pose estimation, RTN collapses under 4/4 quantization, with AUC@3 on Co3Dv2 dropping from 0.5140 (FP16) to 0.0246, showing that π3 is highly sensitive to quantiz...

work page 2021
[17]

and 7-Scenes [Shotton et al., 2013] (Fig. 6). Moreover, we provide the visualization result for π3 [Wang et al., 2025b] on 7-Scenes [Shotton et al., 2013] in Fig. 7 to show FGQ’s improvement over RTN. 16 Ground Truth𝛑3 RTN W4A4𝛑3 FGQ W4A4 (Ours) Figure 7: Visual comparison results for π3 on 7-Scenes [Shotton et al., 2013] for Ground Truth, RTN W4A4, and F...

work page 2013

[1] [1]

Quantized visual geometry grounded transformer

Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, et al. Quantized visual geometry grounded transformer. arXiv preprint arXiv:2509.21302,

work page arXiv

[2] [2]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers.arXiv preprint arXiv:2210.17323,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

g2vlm: Geometry grounded vision language model with unified 3d reconstruction and spatial reasoning.arXiv preprint arXiv:2511.21688,

Wenbo Hu, Jingli Lin, Yilin Long, Yunlong Ran, Lihan Jiang, Yifan Wang, Chenming Zhu, Runsen Xu, Tai Wang, and Jiangmiao Pang. g2vlm: Geometry grounded vision language model with unified 3d reconstruction and spatial reasoning.arXiv preprint arXiv:2511.21688,

work page arXiv

[4] [4]

Vgd: Visual geometry gaussian splatting for feed-forward surround-view driving reconstruction.arXiv preprint arXiv:2510.19578,

Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, and Wei Gao. Vgd: Visual geometry gaussian splatting for feed-forward surround-view driving reconstruction.arXiv preprint arXiv:2510.19578,

work page arXiv

[5] [5]

SpinQuant: LLM quantization with learned rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krish- namoorthi, Vikas Chandra, Yuandong Tian, and Tijmen Blankevoort. Spinquant: Llm quantization with learned rotations.arXiv preprint arXiv:2405.16406,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193,

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Tail-aware post-training quantization for 3d geometry models.arXiv preprint arXiv:2602.01741,

Sicheng Pan, Chen Tang, Shuzhao Xie, Ke Yang, Weixiang Zhang, Jiawei Li, Bin Chen, Shu-Tao Xia, and Zhi Wang. Tail-aware post-training quantization for 3d geometry models.arXiv preprint arXiv:2602.01741,

work page arXiv

[8] [8]

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer

You Shen, Zhipeng Zhang, Yansong Qu, Xiawu Zheng, Jiayi Ji, Shengchuan Zhang, and Liu- juan Cao. Fastvggt: Training-free acceleration of visual geometry transformer.arXiv preprint arXiv:2509.02560,

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Litevggt: Boosting vanilla vggt via geometry-aware cached token merging

Zhijian Shu, Cheng Lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, Weize Li, Yao Yao, Xun Cao, Xiaoyang Guo, et al. Litevggt: Boosting vanilla vggt via geometry-aware cached token merging. arXiv preprint arXiv:2512.04939,

work page arXiv

[10] [10]

Avggt: Rethinking global attention for accelerating vggt.arXiv preprint arXiv:2512.02541, 2025a

Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, and Jianfu Zhang. Avggt: Rethinking global attention for accelerating vggt.arXiv preprint arXiv:2512.02541, 2025a. Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, et al. Flatquant: Flatness matters for llm quan...

work page arXiv

[11] [11]

Fisher-aware quantization for detr detectors with critical-category objectives.arXiv preprint arXiv:2407.03442,

Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, and Shanghang Zhang. Fisher-aware quantization for detr detectors with critical-category objectives.arXiv preprint arXiv:2407.03442,

work page arXiv

[12] [12]

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

Sizhe Yang, Linning Xu, Hao Li, Juncheng Mu, Jia Zeng, Dahua Lin, and Jiangmiao Pang. Robo3r: Enhancing robotic manipulation with accurate feed-forward 3d reconstruction.arXiv preprint arXiv:2602.10101,

work page internal anchor Pith review Pith/arXiv arXiv

[13] [13]

Versaq-3d: A reconfigurable accelerator enabling feed-forward and generalizable 3d reconstruction via versatile quantization.arXiv preprint arXiv:2601.20317,

Yipu Zhang, Jintao Cheng, Xingyu Liu, Zeyu Li, Carol Jingyi Li, Jin Wu, Lin Jiang, Yuan Xie, Jiang Xu, and Wei Zhang. Versaq-3d: A reconfigurable accelerator enabling feed-forward and generalizable 3d reconstruction via versatile quantization.arXiv preprint arXiv:2601.20317,

work page arXiv

[14] [14]

Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images.arXiv preprint arXiv:1805.09817,

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

The same argument applies to discrete labels by replacing integrals with sums

11 A Proof of Proposition 3.1 We prove the identity for a continuous label space. The same argument applies to discrete labels by replacing integrals with sums. Fix an input x. Since pz(y|x) is a normalized conditional distribution, Z pz(y|x)dy= 1.(17) Under the stated regularity conditions, we may differentiate Eq.(17) with respect to z. Differentiating ...

work page 2021

[16] [16]

Table 7 summarizes FGQ’s performance onπ3

D FGQ Evaluation Result onπ 3 We further extend our method to π3 [Wang et al., 2025b], the most recent feed-forward 3D recon- struction model. Table 7 summarizes FGQ’s performance onπ3. On camera pose estimation, RTN collapses under 4/4 quantization, with AUC@3 on Co3Dv2 dropping from 0.5140 (FP16) to 0.0246, showing that π3 is highly sensitive to quantiz...

work page 2021

[17] [17]

and 7-Scenes [Shotton et al., 2013] (Fig. 6). Moreover, we provide the visualization result for π3 [Wang et al., 2025b] on 7-Scenes [Shotton et al., 2013] in Fig. 7 to show FGQ’s improvement over RTN. 16 Ground Truth𝛑3 RTN W4A4𝛑3 FGQ W4A4 (Ours) Figure 7: Visual comparison results for π3 on 7-Scenes [Shotton et al., 2013] for Ground Truth, RTN W4A4, and F...

work page 2013