AdaVFM integrates neural architecture search into vision foundation model backbones and uses a cloud multimodal LLM agent to enable runtime-adaptive lightweight subnet execution, delivering up to 7.9% higher accuracy and 77.9% lower FLOPs than fixed-size baselines on edge devices.
Fiaz, Al- ham Fikri Aji, and Hisham Cholakkal
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
dataset 1polarities
use dataset 1representative citing papers
CoME-VL fuses contrastive and self-supervised vision encoders via entropy-guided multi-layer aggregation and RoPE cross-attention to improve vision-language model performance on benchmarks.
citing papers explorer
-
AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
AdaVFM integrates neural architecture search into vision foundation model backbones and uses a cloud multimodal LLM agent to enable runtime-adaptive lightweight subnet execution, delivering up to 7.9% higher accuracy and 77.9% lower FLOPs than fixed-size baselines on edge devices.
-
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
CoME-VL fuses contrastive and self-supervised vision encoders via entropy-guided multi-layer aggregation and RoPE cross-attention to improve vision-language model performance on benchmarks.