LIMSSR reformulates incomplete multimodal learning as LLM-driven sequence-to-score reasoning with prompt-guided imputation and mask-aware aggregation, outperforming baselines on action quality assessment without complete training data.
ICML , pages=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5verdicts
UNVERDICTED 5roles
background 1polarities
unclear 1representative citing papers
A two-stage fine-tuning strategy on pre-trained generative models enables effective texture filtering that outperforms prior methods on challenging cases.
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
SpecPL introduces spectral decomposition via frozen VAE and counterfactual high-frequency permutation to bridge modality asymmetry in VLM prompt learning, reaching 81.51% harmonic-mean accuracy on 11 benchmarks.
Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.
citing papers explorer
-
LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations
LIMSSR reformulates incomplete multimodal learning as LLM-driven sequence-to-score reasoning with prompt-guided imputation and mask-aware aggregation, outperforming baselines on action quality assessment without complete training data.
-
Generative Texture Filtering
A two-stage fine-tuning strategy on pre-trained generative models enables effective texture filtering that outperforms prior methods on challenging cases.
-
Deep Pre-Alignment for VLMs
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
-
SpecPL: Disentangling Spectral Granularity for Prompt Learning
SpecPL introduces spectral decomposition via frozen VAE and counterfactual high-frequency permutation to bridge modality asymmetry in VLM prompt learning, reaching 81.51% harmonic-mean accuracy on 11 benchmarks.
-
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.