arXiv preprint arXiv:2402.17766 (2024)
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 4
polarities: background 2
representative citing papers
A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.
Affordance Agent Harness is a verification-gated orchestration system that unifies skills via an evidence store, episodic memory priors, an adaptive router, and a self-consistency verifier to improve accuracy-cost tradeoffs in open-world affordance grounding.
A systematic literature survey that categorizes deep learning architectures for point cloud classification, part segmentation, and semantic segmentation, evaluates them on benchmarks, and discusses innovations, limitations, and future directions.
citing papers explorer
- Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
  SandboxVLM enhances VLMs' spatial intelligence by encoding 3D geometry with abstract bounding boxes in a four-stage zero-shot pipeline, yielding an 8.3% improvement on the SAT Real benchmark.
- Multimodal LLMs under Pairwise Modalities
  A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.
- Affordance Agent Harness: Verification-Gated Skill Orchestration
  Affordance Agent Harness is a verification-gated orchestration system that unifies skills via an evidence store, episodic memory priors, an adaptive router, and a self-consistency verifier to improve accuracy-cost tradeoffs in open-world affordance grounding.
- A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation
  A systematic literature survey that categorizes deep learning architectures for point cloud classification, part segmentation, and semantic segmentation, evaluates them on benchmarks, and discusses innovations, limitations, and future directions.