Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
Pith reviewed 2026-05-21 04:46 UTC · model grok-4.3
The pith
Stream3D turns any frozen view-conditioned 3D generator into a streaming system by keeping a fixed-size evidential memory of past frames.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Stream3D is the first training-free streaming mechanism that converts a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. It does so by maintaining a compact evidential memory that selectively caches the most informative historical frames according to a proposed evidence score. As the stream progresses, the memory updates dynamically to retain a fixed number of frames, which prevents linear memory growth and degradation over long sequences without any retraining, architectural modifications, or auxiliary losses.
What carries the argument
A compact evidential memory that scores incoming frames and retains only a fixed number of the highest-scoring ones to supply context to the generator.
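As a rough illustration of what a fixed-size, score-driven memory could look like, the sketch below keeps the highest-scoring frames in a min-heap. The class name `EvidentialMemory`, the `update` signature, and the use of a single scalar score per frame are assumptions made for illustration; the paper's actual evidence score and update rule are not reproduced on this page.

```python
import heapq
from typing import Any, List, Tuple


class EvidentialMemory:
    """Hypothetical fixed-capacity store that keeps the top-scoring frames."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap of (score, frame_index, frame); the weakest entry sits at index 0.
        self._heap: List[Tuple[float, int, Any]] = []

    def update(self, frame_index: int, score: float, frame: Any) -> None:
        """Offer a new frame; it is kept only if it beats the weakest cached one."""
        entry = (score, frame_index, frame)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)

    def frame_indices(self) -> List[int]:
        """Indices of the cached frames, in stream order."""
        return sorted(index for _, index, _ in self._heap)

    def __len__(self) -> int:
        return len(self._heap)


# Usage: placeholder scores stand in for the (unspecified) evidence score.
memory = EvidentialMemory(capacity=3)
for i, s in enumerate([0.1, 0.9, 0.3, 0.8, 0.2, 0.7]):
    memory.update(frame_index=i, score=s, frame=f"frame-{i}")
print(memory.frame_indices())
```

Because the heap never holds more than `capacity` entries, storage stays constant however long the stream runs, and each update costs O(log capacity).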
If this is right
- Arbitrarily long monocular streams can be processed without memory footprint growing linearly with length.
- Temporal consistency is preserved across the entire generated 3D sequence.
- Any pre-trained view-conditioned 3D generator can be used as-is, without retraining, architectural modifications, or auxiliary losses.
- Photometric and geometric metrics improve over KV-cache reuse and flow-based feature editing on both realistic and synthetic benchmarks.
Where Pith is reading between the lines
- The selective-memory idea could be tested on other sequential generation tasks that currently suffer from context explosion.
- An online variant might adapt the evidence threshold according to observed scene change rate.
- Integration with real-time capture pipelines could enable continuous 3D reconstruction for robotics or AR without full history storage.
Load-bearing premise
The evidence score mechanism can reliably pick a fixed set of frames that is sufficient to stop inconsistency from accumulating across arbitrarily long sequences.
What would settle it
Running Stream3D on a very long monocular video and measuring whether geometric or photometric consistency metrics begin to degrade after several hundred frames despite the memory update rule.
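One way to make such a check concrete is to fit a trend line to a per-frame error metric and compare early and late windows. The sketch below does this in plain Python; the function names, the window size, and the synthetic error series are illustrative assumptions, not measurements or procedures from the paper.

```python
from typing import Sequence


def drift_slope(errors: Sequence[float]) -> float:
    """Least-squares slope of error against frame index (error units per frame)."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two frames")
    mean_x = (n - 1) / 2
    mean_y = sum(errors) / n
    cov = sum((i - mean_x) * (e - mean_y) for i, e in enumerate(errors))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def late_minus_early(errors: Sequence[float], window: int) -> float:
    """Mean error over the last `window` frames minus the mean over the first."""
    if window <= 0 or window > len(errors):
        raise ValueError("window must be in 1..len(errors)")
    early = sum(errors[:window]) / window
    late = sum(errors[-window:]) / window
    return late - early


# Synthetic series for illustration only: one stable, one drifting.
stable = [0.5] * 600
drifting = [0.5 + 0.001 * i for i in range(600)]
print(drift_slope(stable), drift_slope(drifting))
print(late_minus_early(drifting, window=100))
```

A slope near zero together with a small late-minus-early gap would be consistent with the no-drift claim; a clearly positive slope after several hundred frames would count against it.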
Original abstract
View-conditioned 3D generators such as SAM 3D, TRELLIS and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism. As the stream progresses, the memory dynamically updates to retain a fixed number of informative frames, preventing the memory footprint from growing linearly with sequence length. This also prevents degradation over long sequences and keeps the underlying generator completely unchanged without retraining, architectural modifications, or auxiliary losses. Evaluated on both realistic and synthetic streaming benchmarks, Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, across both photometric and geometric metrics. More details can be found at: https://anonymous-submission-20.github.io/streaming3D.github.io/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Stream3D, a training-free streaming mechanism that converts a frozen view-conditioned 3D generator (e.g., SAM 3D, TRELLIS) into a streaming generator for long monocular sequences. It maintains a fixed-size evidential memory that selectively caches the most informative historical frames via a proposed evidence score, dynamically updating to prevent linear memory growth and temporal inconsistency without retraining, architectural changes, or auxiliary losses. The method is evaluated on realistic and synthetic streaming benchmarks, where it reportedly outperforms latent-transport baselines (KV-cache reuse, flow-based feature editing) on photometric and geometric metrics.
Significance. If the evidence score reliably selects frames that sustain consistency without long-term drift, the result would be significant for deploying high-quality 3D generators in streaming settings such as video processing or robotics. The training-free property, constant cross-chunk memory, and preservation of the original generator are clear strengths that address a practical limitation in sequential 3D generation.
major comments (2)
- [§4] (Experiments): The reported benchmarks do not include quantitative results or error accumulation analysis on sequences whose length exceeds the fixed memory capacity by an order of magnitude or more. This is load-bearing for the central claim that the evidential memory prevents degradation over arbitrarily long streams, as local evidence scoring may discard frames whose utility emerges only after many steps.
- [§3.2] (Evidence Score Mechanism): No bound, stability analysis, or ablation is provided showing that the evidence score ranks frames by long-term utility for future views rather than immediate reconstruction quality or feature novelty. Without this, the assumption that a fixed number of retained frames suffices for consistency over unbounded sequences remains unverified.
minor comments (2)
- [Abstract] The claim of outperformance on photometric and geometric metrics is stated without any numerical values, error bars, dataset sizes, or specific baseline scores, reducing the ability to gauge the practical improvement.
- [§3] The manuscript would benefit from a clearer notation table or pseudocode for the memory update rule and evidence score computation to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for identifying key areas where additional evidence would strengthen the central claims of the paper. We address each major comment below and commit to revisions that directly respond to the concerns while remaining faithful to the scope and contributions of the work.
Point-by-point responses
- Referee: [§4] (Experiments): The reported benchmarks do not include quantitative results or error accumulation analysis on sequences whose length exceeds the fixed memory capacity by an order of magnitude or more. This is load-bearing for the central claim that the evidential memory prevents degradation over arbitrarily long streams, as local evidence scoring may discard frames whose utility emerges only after many steps.
  Authors: We agree that longer-sequence evaluation is necessary to substantiate the claim of robustness over arbitrarily long streams. Our existing benchmarks already include sequences several times longer than the memory capacity and show stable photometric and geometric metrics, but we will add new quantitative results on sequences exceeding the memory size by an order of magnitude (e.g., 500–1000 frames with memory size 20–50). These will include error-accumulation curves plotted against frame index to demonstrate absence of drift. The revised manuscript will report these experiments in Section 4.
  Revision: yes
- Referee: [§3.2] (Evidence Score Mechanism): No bound, stability analysis, or ablation is provided showing that the evidence score ranks frames by long-term utility for future views rather than immediate reconstruction quality or feature novelty. Without this, the assumption that a fixed number of retained frames suffices for consistency over unbounded sequences remains unverified.
  Authors: The evidence score is constructed to balance immediate reconstruction quality with forward-looking information gain, which is why it outperforms pure novelty or reconstruction-error baselines in the reported ablations. We will add a targeted ablation in the revised Section 3.2 that measures long-term consistency when frames are selected by the evidence score versus immediate-quality-only or novelty-only alternatives, using held-out future views as the evaluation criterion. A formal stability bound or convergence analysis would require additional theoretical assumptions not developed in the current work; we will explicitly note this limitation and flag it as future work while emphasizing the empirical support from both synthetic and real streaming benchmarks.
  Revision: partial
- A formal mathematical bound or stability analysis proving that the evidence score ranks frames by long-term utility rather than immediate quality or novelty.
Circularity Check
No significant circularity; Stream3D mechanism is an independent algorithmic proposal
Full rationale
The paper presents Stream3D as a training-free streaming mechanism that maintains a fixed-size evidential memory updated via a proposed evidence score to cache informative historical frames from a frozen view-conditioned 3D generator. This is described as a novel algorithmic contribution evaluated empirically on external realistic and synthetic benchmarks, with outperformance reported against baselines such as KV-cache reuse. No equations, derivations, or first-principles results are indicated that reduce the claimed consistency or performance to fitted parameters, self-definitions, or self-citation chains by construction. The central claim relies on the independent design of the evidence score and memory update rule rather than any tautological equivalence to inputs, making the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: An evidence score can be computed that reliably ranks historical frames by informativeness for 3D consistency without training.
invented entities (1)
- evidential memory: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism... token-level evidence is aggregated into frame-level ownership scores, and the top-K frames are selected"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, J_uniquely_calibrated_via_higher_derivative (tag: unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "the cached evidence score M[q, j] is monotonically non-decreasing over time... non-degradation property in evidence space"
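The two quoted passages suggest a cache M[q, j] that only ever increases, plus a step that turns token-level evidence into frame-level scores before a top-K selection. The sketch below is one possible reading, not the paper's definition: it assumes M is indexed by token q and frame j, that the cache is updated with an elementwise maximum, and that a token is "owned" by the frame with its largest evidence. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # evidence[q][j]: token q, frame j (assumed layout)


def update_evidence(cached: Matrix, observed: Matrix) -> Matrix:
    """Elementwise running maximum, so no cached entry ever decreases."""
    return [
        [max(c, o) for c, o in zip(cached_row, observed_row)]
        for cached_row, observed_row in zip(cached, observed)
    ]


def frame_ownership_scores(evidence: Matrix) -> List[float]:
    """Credit each token's best evidence to the frame that provides it."""
    num_frames = len(evidence[0])
    scores = [0.0] * num_frames
    for row in evidence:
        owner = max(range(num_frames), key=row.__getitem__)
        scores[owner] += row[owner]
    return scores


def top_k_frames(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring frames (ties go to the earlier frame)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])


# Toy example with made-up numbers: 2 tokens, 3 candidate frames.
cached = [[0.2, 0.5, 0.1], [0.9, 0.1, 0.3]]
observed = [[0.4, 0.3, 0.1], [0.2, 0.1, 0.8]]
updated = update_evidence(cached, observed)
scores = frame_ownership_scores(updated)
print(updated, scores, top_k_frames(scores, k=2))
```

Under this reading, monotonicity follows directly from the maximum: each updated entry is at least as large as the cached one, which is one way to interpret "non-degradation in evidence space".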
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.