{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:MULTON7LSJF7UQ2C6SGB5KVG2C","short_pith_number":"pith:MULTON7L","schema_version":"1.0","canonical_sha256":"65173737eb924bfa4342f48c1eaaa6d0879d83747c551af85354ca098416db99","source":{"kind":"arxiv","id":"2408.13912","version":2},"attestation_state":"computed","paper":{"title":"Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Splatt3R turns any uncalibrated stereo image pair into a 3D Gaussian splat without camera parameters or depth.","cross_cats":["cs.LG"],"primary_cat":"cs.CV","authors_text":"Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu","submitted_at":"2024-08-25T18:27:20Z","abstract_excerpt":"In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a ``foundation'' 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to constr"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2408.13912","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CV","submitted_at":"2024-08-25T18:27:20Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"a1eb89593ad13f9b14c836cfd4e9a5aad90098d153ea00dc22e4c71b2171f402","abstract_canon_sha256":"b3768bb736e07c4a570b15ed0a1454e7bcd611799b4b23ab0d040f7c0102c20f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:46.007466Z","signature_b64":"klRH6yQS4Rxn0ayW8Y1PtHrJCnFAR+ySEdPyKFIoqGlYq/mpv7vaPQw+kkbQUWlh9jEPlwjuY7xgQke0J5KqBQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"65173737eb924bfa4342f48c1eaaa6d0879d83747c551af85354ca098416db99","last_reissued_at":"2026-05-17T23:38:46.006998Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:46.006998Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Splatt3R turns any uncalibrated stereo image pair into a 3D Gaussian splat without camera parameters or depth.","cross_cats":["cs.LG"],"primary_cat":"cs.CV","authors_text":"Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu","submitted_at":"2024-08-25T18:27:20Z","abstract_excerpt":"In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a ``foundation'' 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to constr"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information... We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That first optimizing only the 3D point cloud geometry loss and then switching to a novel view synthesis objective, combined with the proposed loss masking strategy, reliably avoids local minima that plague direct Gaussian splat training from stereo views.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Splatt3R is a feed-forward network that predicts 3D Gaussian splats directly from uncalibrated stereo image pairs by extending MASt3R with appearance attributes and a two-stage training procedure.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Splatt3R turns any uncalibrated stereo image pair into a 3D Gaussian splat without camera parameters or depth.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a7fcd97fbc7899cb470816a7c2509e79c1186dc89e64e568d2011a2bb654dfe8"},"source":{"id":"2408.13912","kind":"arxiv","version":2},"verdict":{"id":"6ad3c819-3177-402e-94fb-4a9aab17f726","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T01:11:07.424871Z","strongest_claim":"Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information... We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images.","one_line_summary":"Splatt3R is a feed-forward network that predicts 3D Gaussian splats directly from uncalibrated stereo image pairs by extending MASt3R with appearance attributes and a two-stage training procedure.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That first optimizing only the 3D point cloud geometry loss and then switching to a novel view synthesis objective, combined with the proposed loss masking strategy, reliably avoids local minima that plague direct Gaussian splat training from stereo views.","pith_extraction_headline":"Splatt3R turns any uncalibrated stereo image pair into a 3D Gaussian splat without camera parameters or depth."},"references":{"count":69,"sample":[{"doi":"","year":1991,"title":"The plenoptic func- tion and the elements of early vision","work_id":"b201d23d-edb6-47e1-8672-cbe835689be2","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1982,"title":"Computational stereo","work_id":"261c9da1-2d5a-47f7-9f63-130c738e4400","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Mip-nerf 360: Unbounded anti-aliased neural radiance fields","work_id":"add0a268-f691-4637-8ac9-f40fc4330b28","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Porf: Pose residual field for accurate neural sur- face reconstruction","work_id":"0bbbd8c8-573b-405b-ad41-280e1133faa0","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Nope-nerf: Optimising neural ra- diance field with no pose prior","work_id":"6ae3b26d-e003-4469-a1f3-f1e1bbc572e2","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":69,"snapshot_sha256":"6257651846316561e089d66a0533be8131e2e3f3201b2ece8f41ac67132cecc8","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2408.13912","created_at":"2026-05-17T23:38:46.007073+00:00"},{"alias_kind":"arxiv_version","alias_value":"2408.13912v2","created_at":"2026-05-17T23:38:46.007073+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2408.13912","created_at":"2026-05-17T23:38:46.007073+00:00"},{"alias_kind":"pith_short_12","alias_value":"MULTON7LSJF7","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"MULTON7LSJF7UQ2C","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"MULTON7L","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":24,"internal_anchor_count":24,"sample":[{"citing_arxiv_id":"2605.19656","citing_title":"Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images","ref_index":70,"is_internal_anchor":true},{"citing_arxiv_id":"2506.09885","citing_title":"The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2507.07982","citing_title":"Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2510.09881","citing_title":"LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2511.02830","citing_title":"Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2511.20853","citing_title":"MODEST: Multi-Optics Depth-of-Field Stereo Dataset","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2512.04021","citing_title":"C3G: Learning Compact 3D Representations with 2K Gaussians","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2512.13122","citing_title":"DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2601.08831","citing_title":"3AM: 3egment Anything with Geometric Consistency in Videos","ref_index":68,"is_internal_anchor":true},{"citing_arxiv_id":"2507.11539","citing_title":"Streaming 4D Visual Geometry Transformer","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09688","citing_title":"ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08739","citing_title":"ReorgGS: Equivalent Distribution Reorganization for 3D Gaussian Splatting","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.21182","citing_title":"WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20038","citing_title":"FluSplat: Sparse-View 3D Editing without Test-Time Optimization","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10573","citing_title":"Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06830","citing_title":"VGGT-SLAM++","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07287","citing_title":"SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07550","citing_title":"Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2511.10647","citing_title":"Depth Anything 3: Recovering the Visual Space from Any Views","ref_index":79,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06740","citing_title":"LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04874","citing_title":"Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14141","citing_title":"Geometric Context Transformer for Streaming 3D Reconstruction","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14025","citing_title":"Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective","ref_index":144,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04435","citing_title":"Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes","ref_index":37,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C","json":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C.json","graph_json":"https://pith.science/api/pith-number/MULTON7LSJF7UQ2C6SGB5KVG2C/graph.json","events_json":"https://pith.science/api/pith-number/MULTON7LSJF7UQ2C6SGB5KVG2C/events.json","paper":"https://pith.science/paper/MULTON7L"},"agent_actions":{"view_html":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C","download_json":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C.json","view_paper":"https://pith.science/paper/MULTON7L","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2408.13912&json=true","fetch_graph":"https://pith.science/api/pith-number/MULTON7LSJF7UQ2C6SGB5KVG2C/graph.json","fetch_events":"https://pith.science/api/pith-number/MULTON7LSJF7UQ2C6SGB5KVG2C/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C/action/timestamp_anchor","attest_storage":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C/action/storage_attestation","attest_author":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C/action/author_attestation","sign_citation":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C/action/citation_signature","submit_replication":"https://pith.science/pith/MULTON7LSJF7UQ2C6SGB5KVG2C/action/replication_record"}},"created_at":"2026-05-17T23:38:46.007073+00:00","updated_at":"2026-05-17T23:38:46.007073+00:00"}