Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Baigui Sun; Cheng Tan; Di Wu; Jun Xia; Lirong Wu; Luyuan Zhang; Siyuan Li; Stan Z. Li; Yang Liu; Zedong Wang; Zicheng Liu

arxiv: 2401.00897 · v2 · pith:ZZPUEXBKnew · submitted 2023-12-31 · 💻 cs.CV · cs.AI


Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li


classification: cs.CV, cs.AI

keywords: masked, modeling, learning, self-supervised, techniques, data, deep, representation


Abstract

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and low dependence on labeled data. Among the varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. We also explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we discuss the limitations of current techniques and point out several potential avenues for advancing masked modeling research. A paper list project accompanying this survey is available at https://github.com/Lupin1998/Awesome-MIM.
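The core idea in the abstract, hiding a fixed proportion of the input and scoring predictions only on the hidden part, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the survey: the function names random_mask and masked_mse are hypothetical, the 75% mask ratio is an arbitrary choice, and the "model" is a trivial baseline that predicts the mean of the visible values.

```python
import random


def random_mask(num_patches, mask_ratio, seed=0):
    """Return a boolean list in which a fixed proportion of positions is True (hidden)."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    masked_idx = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_idx for i in range(num_patches)]


def masked_mse(prediction, target, mask):
    """Mean squared error computed only over the masked positions."""
    errors = [(p - t) ** 2 for p, t, m in zip(prediction, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Toy input of 16 "patch" values; the mask ratio of 0.75 is an assumed example value.
target = [float(i) for i in range(16)]
mask = random_mask(len(target), mask_ratio=0.75)

# The model only sees the unmasked values.
visible = [t for t, m in zip(target, mask) if not m]

# Hypothetical stand-in for a learned model: predict the visible mean everywhere.
baseline = sum(visible) / len(visible)
prediction = [baseline] * len(target)

# The training signal comes from the hidden positions only.
loss = masked_mse(prediction, target, mask)
```

In an actual method the baseline would be replaced by a trained network, and the masking strategy and recovery target would vary with the approach, which is the design space the survey reviews.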



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction
cs.LG · 2026-06 · unverdicted · novelty 6.0

SurfBind applies a Transformer with patch-level surface modeling and binder-aware cross-attention to 3D molecular surfaces, reporting state-of-the-art epitope prediction on SAbDab and DB5.5 with generalization to unse...