Advances in Neural Information Processing Systems 35, 17612–17625 (2022)
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
STBIR fuses sketches and text via curriculum robustness, category optimization, and staged alignment to outperform prior methods on a new fine-grained benchmark dataset.
citing papers explorer
- Second-Order Multi-Level Variance Correction for Modality Competition in Multimodal Models
  Introduces ML-FOP-SOAP optimizer using Fisher-Orthogonal Projection and hierarchical folding to mitigate modality competition in multimodal autoregressive training, reporting gains over AdamW on Janus and Emu3.
- Sketch and Text Synergy: Fusing Structural Contours and Descriptive Attributes for Fine-Grained Image Retrieval
  STBIR fuses sketches and text via curriculum robustness, category optimization, and staged alignment to outperform prior methods on a new fine-grained benchmark dataset.