PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
14 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 14
roles: background 2
polarities: background 2
citing papers explorer
- Towards Visual Query Localization in the 3D World
  The authors release the 3DVQL benchmark for 3D multimodal visual query localization and show that a lift-and-attention fusion module outperforms prior fusion baselines on it.
- SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction
  SparseSplat uses entropy-based probabilistic sampling and a specialized point cloud network to generate compact 3D Gaussian maps that retain high rendering quality with far fewer Gaussians than prior feed-forward methods.
- CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation
  CLIPoint3D is the first CLIP-based framework for few-shot unsupervised 3D point cloud domain adaptation that reports 3-16% accuracy gains on PointDA-10 and GraspNetPC-10.
- POMA-3D: The Point Map Way to 3D Scene Understanding
  POMA-3D learns self-supervised 3D scene representations from point maps and improves performance on geometric 3D tasks including navigation and scene retrieval.
- UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation
  UST-Hand is a self-supervised 3D hand pose estimation method using conditional normalizing flows for uncertainty-aware hypothesis sampling and probabilistic point cloud interactions to achieve up to 37.8% better MPVPE than prior self-supervised approaches on three datasets.
- UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition
  UniD-Shift decomposes 2D and 3D features into shared semantic and private modality-specific subspaces to enable unified semantic segmentation with improved accuracy and cross-domain generalization on SemanticKITTI and nuScenes.
- FILTR: Extracting Topological Features from Pretrained 3D Models
  FILTR predicts persistence diagrams from pretrained 3D encoders on the new DONUT benchmark, showing limited topological signals in encoders but successful approximation via learnable feed-forward.
- Missing Pattern Tree based Decision Grouping and Ensemble for Enhancing Pair Utilization in Deep Incomplete Multi-View Clustering
  A missing-pattern tree groups data by missing patterns for per-group clustering, followed by uncertainty-weighted ensemble and knowledge distillation to better exploit available pairs in incomplete multi-view clustering.
- Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
  Chorus pretrains a shared 3D Gaussian scene encoder via multi-teacher distillation to capture holistic features from high-level semantics to fine-grained structure, with strong transfer on segmentation and point-cloud tasks using far fewer scenes.
- SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals
  SGSoft introduces a template-guided pipeline that fuses semantic and geometric features to learn dense correspondences across deformable 3D shapes with claimed SOTA generalization and real-time efficiency.
- A Non-Invasive Alternative to RFID: Self-Sufficient 3D Identification of Group-Housed Livestock
  TARA uses temporal 3D point clouds and visit-level pseudo-labeling to achieve 100% identification accuracy for group-housed sows without RFID tags.
- 3D-IDE: 3D Implicit Depth Emergent
  3D awareness emerges implicitly in MLLMs via self-supervised geometric constraints that create an information bottleneck, removing depth and pose dependencies at inference and cutting latency by 55%.
- GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation
  GeoPredict improves VLA manipulation accuracy by adding predictive kinematic trajectories and 3D Gaussian workspace geometry as training-time depth-rendering supervision.
- Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection
  A semi-supervised 3D object detection framework with a learnable module for adaptive pseudo-label selection via score fusion, context-aware thresholds, and soft supervision.