B4DL provides a new benchmark, scalable data generation pipeline, and MLLM architecture for direct spatio-temporal reasoning on raw 4D LiDAR data.
VideoLLaMA [ 36] is an instruction-tuned multi- modal model that integrates visual and auditory information using a vision-language and audio-language branch
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding
B4DL provides a new benchmark, scalable data generation pipeline, and MLLM architecture for direct spatio-temporal reasoning on raw 4D LiDAR data.