Fusing stereo vision features with text prompts that include object class and approximate volume via a projection layer improves volume regression over vision-only baselines on public datasets.
Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Not Your Stereo-Typical Estimator: Combining Vision and Language for Volume Perception