
arxiv: 2606.11682 · v1 · pith:QRMTFCWXnew · submitted 2026-06-10 · 💻 cs.CV · cs.LG

Parameter-Efficient Adapter Tuning for Tabular-Image Multimodal Learning

classification 💻 cs.CV cs.LG
keywords adapter · fine-tuning · tabular-image · full · multimodal · tabular · ti-adapter · while

Tabular-image multimodal learning aims to improve predictive modeling by jointly using structured tabular attributes and visual data. Although pretrained encoders provide strong modality-specific representations, full fine-tuning can be computationally expensive, while keeping encoders frozen may limit task-specific adaptation. We propose the Tabular-Image Adapter (TI-Adapter), a modality-specific adapter-based fine-tuning framework for efficient multimodal adaptation. TI-Adapter freezes the pretrained tabular encoder and learns an adapter after the extracted tabular embedding, while adapting the image branch with embedding-level and bottleneck-level adapters instead of full fine-tuning. Experiments on 20 tabular-image datasets show that TI-Adapter achieves competitive or better predictive performance than full fine-tuning while using substantially fewer trainable parameters. Ablation studies further demonstrate the importance of adapter placement for balancing performance and practical efficiency.
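
The abstract does not specify the adapter's internal design, so the following is only a minimal sketch of a generic residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection) of the kind that could be placed after a frozen encoder's embedding. The function names, dimensions, and zero initialisation of the up-projection are illustrative assumptions, not details taken from the paper. The parameter count shows why such a module is small next to the encoder it adapts.

```python
import random


def make_adapter(dim, bottleneck, seed=0):
    """Weights for a residual bottleneck adapter: dim -> bottleneck -> dim.

    Hypothetical layout for illustration; the paper's adapter may differ.
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
    # Zero-initialised up-projection: the adapter starts as the identity map,
    # so the frozen encoder's embedding passes through unchanged at step 0.
    up = [[0.0] * bottleneck for _ in range(dim)]
    return {"down": down, "down_b": [0.0] * bottleneck,
            "up": up, "up_b": [0.0] * dim}


def adapter_forward(adapter, x):
    """Apply x + up(relu(down(x))) to a single embedding vector x."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(adapter["down"], adapter["down_b"])]
    delta = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(adapter["up"], adapter["up_b"])]
    return [xi + di for xi, di in zip(x, delta)]


def adapter_param_count(dim, bottleneck):
    """Trainable parameters: two weight matrices plus two bias vectors."""
    return 2 * dim * bottleneck + bottleneck + dim


# Stand-in for an embedding produced by a frozen pretrained encoder.
embedding = [0.5, -1.0, 2.0, 0.0]
adapter = make_adapter(dim=4, bottleneck=2)
output = adapter_forward(adapter, embedding)

# Assumed sizes: a 768-dimensional embedding with a 64-unit bottleneck.
n_params = adapter_param_count(768, 64)
```

Only the adapter weights would receive gradient updates in this setup; the encoder that produces `embedding` stays fixed. Where such adapters are inserted (after the embedding, or at a bottleneck inside the image branch) is the placement question the ablation studies address.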

