CCTVBench exposes a large gap between standard QA accuracy and contrastive consistency in traffic video reasoning for multimodal LLMs and introduces C-TCD to narrow that gap.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
Current audio-language models fail to use clinical multimodal context for dysarthric speech recognition, but context-aware LoRA fine-tuning delivers large accuracy gains on the SAP dataset.
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
citing papers explorer
-
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
CCTVBench exposes a large gap between standard QA accuracy and contrastive consistency in traffic video reasoning for multimodal LLMs and introduces C-TCD to narrow that gap.
-
When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition
Current audio-language models fail to use clinical multimodal context for dysarthric speech recognition, but context-aware LoRA fine-tuning delivers large accuracy gains on the SAP dataset.
-
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
- Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild