{"paper":{"title":"Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Sparse code uplifting from 2D images to 3D Gaussians delivers up to 400 times faster training for open-vocabulary scene understanding.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Lovre Antonio Budimir, Nandita Vijaykumar, Steve Ryhner, Sven Lončarić, Yushi Guan","submitted_at":"2026-05-13T14:35:31Z","abstract_excerpt":"3D Language Gaussian Splatting (3DLGS) augments 3D Gaussian Splatting with language-aligned visual features for open-vocabulary 3D scene understanding. A core challenge is efficiently associating high-dimensional vision-language embeddings with millions of 3D Gaussians while preserving efficient feature rendering for text-based querying. Existing methods either store dense features directly on Gaussians, causing high storage costs and slow rendering, or learn compact representations through expensive per-scene optimization with repeated feature rasterization. No existing method simultaneously "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our method achieves up to 400× training speedup while being 3× more memory efficient during training compared to the state-of-the-art in rendering speed. 
Across multiple benchmarks, SCOUP matches or outperforms existing methods in open-vocabulary querying accuracy.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That sparse codebook coefficients learned entirely from 2D image regions can be uplifted to 3D Gaussians via weighted multi-view aggregation and Top-K filtering without substantial loss of semantic accuracy or the need for per-scene language optimization.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Sparse code uplifting from 2D images to 3D Gaussians delivers up to 400 times faster training for open-vocabulary scene understanding.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"454fefc7b7b2f4d3960379d71122ce926740c8aff21402d0c9655603f5a68839"},"source":{"id":"2605.13600","kind":"arxiv","version":1},"verdict":{"id":"9b550f01-b106-4a13-bbe0-6dbff00084df","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:22:41.364916Z","strongest_claim":"Our method achieves up to 400× training speedup while being 3× more memory efficient during training compared to the state-of-the-art in rendering speed. 
Across multiple benchmarks, SCOUP matches or outperforms existing methods in open-vocabulary querying accuracy.","one_line_summary":"SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That sparse codebook coefficients learned entirely from 2D image regions can be uplifted to 3D Gaussians via weighted multi-view aggregation and Top-K filtering without substantial loss of semantic accuracy or the need for per-scene language optimization.","pith_extraction_headline":"Sparse code uplifting from 2D images to 3D Gaussians delivers up to 400 times faster training for open-vocabulary scene understanding."},"references":{"count":43,"sample":[{"doi":"","year":2026,"title":"GALA: Guided attention with language alignment for open vocabulary gaussian splatting","work_id":"e8b07e34-4e73-4283-8b36-e419ae34b2b2","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Barron, Ben Mildenhall, Dor Verbin, Pratul P","work_id":"0aedd240-1463-4d03-bd4e-c8aff93e2414","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21476–21485, 2023","work_id":"05058a69-c172-4cf0-bec7-c096214c88c7","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1109/cvpr52733.2024.02022","year":2024,"title":"URL https://doi.org/10.1109/CVPR52733","work_id":"7efbc2dd-b0f2-4f71-bb1c-d2fcf110d805","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Occam’s lgs: An efficient approach for language gaussian 
splatting","work_id":"9baf9d6e-f4e6-46a2-abfe-cd7588f4a792","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":43,"snapshot_sha256":"a1f80dc2a830356b712bebeaed8e5861d118bcec65f35261555cbad8f6049d82","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}