3D Vision and Language Pre-training with Large-Scale Synthetic Data

Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu · 2024 · arXiv 2407.06084

2 Pith papers cite this work.

representative citing papers

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

OmniVTG builds a new large-scale open-world video temporal grounding (VTG) dataset through iterative concept-gap filling and timestamped captioning, and pairs it with a three-stage self-correcting chain-of-thought (CoT) training paradigm that achieves state-of-the-art zero-shot results on four existing benchmarks.

Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding

cs.CV · 2025-12-19 · unverdicted · novelty 6.0

Chorus pretrains a shared 3D Gaussian scene encoder through multi-teacher distillation, capturing holistic features that range from high-level semantics to fine-grained structure, and transfers well to segmentation and point-cloud tasks while using far fewer scenes.

OmniVTG cites this work as reference 36; Chorus cites it as reference 59.
