Mirror-Fusion Attention for Reflection-Aware Self-Supervised Representation Learning

Jin Liu; Ruixin Li; Stefano Lodi; Yuling Shi

arxiv: 2607.00850 · v1 · pith:QNE5U4MWnew · submitted 2026-07-01 · 💻 cs.CV · cs.LG

Mirror-Fusion Attention for Reflection-Aware Self-Supervised Representation Learning

Ruixin Li , Jin Liu , Yuling Shi , Stefano Lodi This is my paper

Pith reviewed 2026-07-02 14:25 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords self-supervised learningmirror fusion attentionvision transformersreflection priormedical imagingface analysisequivariant representations

0 comments

The pith

Mirror-fusion attention lets self-supervised ViTs retain informative left-right differences in symmetric images by adding a soft reflection prior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most self-supervised methods enforce invariance to flips, which can discard useful left-right information in data like medical images and faces that are approximately bilateral. This paper proposes Mirror-Fusion-Augmented Self-Supervised Learning that builds mirror-paired views from an estimated symmetry axis and uses a Mirror-Fusion Attention module to allow selective interaction between mirrored tokens. The method adds reflection-consistency and token-alignment losses to the base SSL objective and is evaluated on CheXpert, BraTS, CelebA-HQ, and WFLW datasets. It reports better downstream performance, calibration, and robustness than MoCo-v3, DINO, MAE, and recent equivariant methods while using only about 2.7 percent more parameters. The results indicate that lightweight geometry-aware priors can enhance standard invariance-based SSL frameworks.

Core claim

MFASSL constructs mirror-paired views aligned to an estimated symmetry axis and introduces a lightweight Mirror-Fusion Attention module for adaptive token-level interaction between mirrored regions while preserving asymmetric cues. The base SSL objective is coupled with reflection-consistency and mid-layer token-alignment losses, resulting in improved downstream performance, calibration, and reflection robustness over baselines under matched ViT-B/16 settings with only approximately 2.7% additional parameters.

What carries the argument

The Mirror-Fusion Attention (MFA) module that enables adaptive interaction between mirrored regions in the transformer while keeping asymmetric information intact.

If this is right

Improved performance on downstream tasks for medical and facial image datasets.
Enhanced model calibration and robustness to reflections compared to standard SSL methods.
Consistent gains over equivariant SSL approaches with minimal added parameters.
Compatibility with existing SSL frameworks without requiring backbone redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may generalize to other approximate symmetries beyond left-right reflection in different imaging domains.
Integrating such priors could reduce reliance on extensive data augmentation for symmetric data types.
Further tests on datasets with varying degrees of symmetry could reveal the limits of the reflection prior.
Combining this with other geometric priors might lead to more robust multi-view learning systems.

Load-bearing premise

The estimated symmetry axis produces reliable mirror-paired views that maintain informative correspondences without misalignment or artifacts.

What would settle it

Observing no performance improvement when the Mirror-Fusion Attention module is ablated or when mirror pairs are constructed with incorrect axes on bilateral datasets would challenge the claim.

Figures

Figures reproduced from arXiv: 2607.00850 by Jin Liu, Ruixin Li, Stefano Lodi, Yuling Shi.

**Figure 1.** Figure 1: MFASSL Architecture. (a) Pretraining Pipeline: standard augmented views provide the base SSL supervision, while mirror-paired crops are routed through MFA during pretraining. Their pre-fusion tokens yield symmetry-aware losses (Leq, Lmid) at layer 8, and their post-fusion representations are fed to the same base SSL objective. (b) Mirror-Fusion Attention (MFA): performs token-level fusion with a learnable … view at source ↗

read the original abstract

Most self-supervised learning (SSL) methods encourage invariance across augmentations, but strict flip invariance can suppress informative left--right correspondences in approximately bilateral data such as medical images and human faces. We propose Mirror-Fusion-Augmented Self-Supervised Learning (MFASSL), a Vision Transformer framework that injects a soft reflection prior into standard SSL without redesigning the backbone. MFASSL constructs mirror-paired views aligned to an estimated symmetry axis and introduces a lightweight Mirror-Fusion Attention (MFA) module for adaptive token-level interaction between mirrored regions while preserving asymmetric cues. The base SSL objective is further coupled with reflection-consistency and mid-layer token-alignment losses. Across CheXpert, BraTS, CelebA-HQ, and WFLW, MFASSL improves downstream performance, calibration, and reflection robustness over MoCo-v3, DINO, and MAE baselines under matched ViT-B/16 settings. It also achieves stronger and more consistent gains than recent equivariant SSL approaches with only approximately 2.7\% additional parameters. These results show that lightweight geometry-aware priors can effectively complement invariance-based SSL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds a Mirror-Fusion Attention module and two reflection losses to ViT SSL to preserve left-right cues in bilateral data, with claimed gains at low cost, but the abstract leaves the symmetry axis assumption untested.

read the letter

The main contribution is the MFA module that lets tokens interact across estimated mirror pairs while trying to retain asymmetric information, plus reflection-consistency and mid-layer token-alignment losses added to standard SSL objectives.

This targets a practical gap: strict invariance in methods like MoCo-v3 or DINO can erase useful left-right differences in medical images or faces. The setup uses matched ViT-B/16 runs on CheXpert, BraTS, CelebA-HQ, and WFLW, reports better downstream performance, calibration, and reflection robustness than the baselines and some equivariant alternatives, and adds only 2.7% parameters. That combination of module and losses is new relative to the cited prior work.

The soft spot is the symmetry axis step. The whole claim rests on the axis producing clean mirror pairs without misalignment or artifacts, yet the abstract gives no accuracy numbers, no ablation on estimation error, and no failure cases on asymmetric or pathological images. Without those checks it is hard to attribute the gains cleanly to MFA and the losses rather than to the base objective or other factors. Experimental controls, splits, and significance tests are also not described.

This is for researchers working on SSL adaptations for medical or facial data where geometry priors can help. It shows straightforward engagement with the invariance problem.

Send it for peer review so the missing validations on axis reliability and the experimental details can be examined.

Referee Report

2 major / 1 minor

Summary. The paper proposes Mirror-Fusion-Augmented Self-Supervised Learning (MFASSL), a ViT-based SSL framework that constructs mirror-paired views aligned to an estimated symmetry axis, introduces a lightweight Mirror-Fusion Attention (MFA) module for token-level interaction between mirrored regions, and augments the base SSL objective with reflection-consistency and mid-layer token-alignment losses. It claims improved downstream performance, calibration, and reflection robustness over MoCo-v3, DINO, and MAE baselines (matched ViT-B/16) on CheXpert, BraTS, CelebA-HQ, and WFLW, plus stronger gains than recent equivariant SSL methods, at the cost of only ~2.7% additional parameters.

Significance. If the reported gains hold under controlled conditions and are attributable to the MFA module and reflection losses rather than base SSL or parameter overhead, the work offers a lightweight geometry-aware prior that complements invariance-based SSL for approximately bilateral data without backbone redesign. The small parameter overhead and explicit comparison to equivariant baselines are positive features.

major comments (2)

[Abstract] Abstract: the central empirical claim of improved performance, calibration, and reflection robustness cannot be verified because the text supplies no details on experimental controls, statistical significance testing, data splits, or potential post-hoc choices; this directly undermines attribution of gains to the MFA module and reflection losses.
[Abstract] The load-bearing assumption that an estimated symmetry axis yields reliable mirror-paired views preserving left-right correspondences without misalignment or artifacts is stated but receives no quantitative validation of axis accuracy, no ablation on estimation error, and no failure-mode analysis on asymmetric cases; without this, gains on CheXpert, BraTS, CelebA-HQ, and WFLW cannot be confidently linked to the proposed components rather than the base objective.

minor comments (1)

[Abstract] The abstract mentions 'approximately 2.7% additional parameters' but does not specify whether this count includes the symmetry-axis estimator or only the MFA module.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract and the symmetry axis assumption. We address each point below and commit to revisions that strengthen the presentation without altering the core claims.

read point-by-point responses

Referee: [Abstract] Abstract: the central empirical claim of improved performance, calibration, and reflection robustness cannot be verified because the text supplies no details on experimental controls, statistical significance testing, data splits, or potential post-hoc choices; this directly undermines attribution of gains to the MFA module and reflection losses.

Authors: We agree the abstract is too terse on these points. The full manuscript details matched ViT-B/16 backbones, identical augmentation pipelines, official data splits (CheXpert, BraTS), and multiple random seeds with standard deviations in Sections 4.1–4.3; no post-hoc selection occurred. We will revise the abstract to include a brief clause referencing these controlled conditions and directing readers to the experimental section, thereby improving verifiability while respecting length limits. revision: yes
Referee: [Abstract] The load-bearing assumption that an estimated symmetry axis yields reliable mirror-paired views preserving left-right correspondences without misalignment or artifacts is stated but receives no quantitative validation of axis accuracy, no ablation on estimation error, and no failure-mode analysis on asymmetric cases; without this, gains on CheXpert, BraTS, CelebA-HQ, and WFLW cannot be confidently linked to the proposed components rather than the base objective.

Authors: We acknowledge the current manuscript lacks these validations. Section 3.1 describes the axis estimation (keypoint-based for faces, horizontal for medical images), but no accuracy metrics or error ablations are present. In revision we will add: (i) axis accuracy evaluation against available ground-truth annotations, (ii) ablation on perturbed axes measuring downstream impact, and (iii) performance breakdown on asymmetric subsets. These additions will directly support attribution to the MFA module and reflection losses. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description introduce MFASSL as an extension of standard SSL frameworks (MoCo-v3, DINO, MAE) with a new MFA module, mirror-paired views via estimated symmetry axis, and added reflection-consistency and token-alignment losses. No equations, fitted parameters, or self-citations are shown that reduce the claimed performance gains to quantities defined by the inputs themselves or by construction. The central claims rest on empirical improvements under matched settings rather than any self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard SSL assumptions plus the new empirical components; no free parameters, axioms, or invented entities are explicitly quantified in the abstract.

axioms (1)

domain assumption A reliable symmetry axis can be estimated per image to align mirror pairs
The construction of mirror-paired views depends on this estimation step.

pith-pipeline@v0.9.1-grok · 5732 in / 1236 out tokens · 35224 ms · 2026-07-02T14:25:54.798971+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 8 canonical work pages · 4 internal anchors

[1]

BEiT: BERT Pre-Training of Image Transformers

Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[2]

In: Proceedings of the IEEE/CVF international conference on computer vision

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9650–9660 (2021) 16 R. Li et al

2021
[3]

Ad- vances in neural information processing systems33, 12546–12558 (2020)

Chaitanya, K., Erdil, E., Karani, N., Konukoglu, E.: Contrastive learning of global and local features for medical image segmentation with limited annotations. Ad- vances in neural information processing systems33, 12546–12558 (2020)

2020
[4]

In: International conference on machine learning

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PmLR (2020)

2020
[5]

Chen,X.,He,K.:Exploringsimplesiameserepresentationlearning.In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15750–15758 (2021)

2021
[6]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 9640–9649 (2021)

2021
[7]

In: Interna- tional conference on machine learning

Cohen, T., Welling, M.: Group equivariant convolutional networks. In: Interna- tional conference on machine learning. pp. 2990–2999. PMLR (2016)

2016
[8]

Steerable CNNs

Cohen, T.S., Welling, M.: Steerable cnns. arXiv preprint arXiv:1612.08498 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[9]

arXiv preprint arXiv:2111.00899 (2021)

Dangovski, R., Jing, L., Loh, C., Han, S., Srivastava, A., Cheung, B., Agrawal, P., Soljačić, M.: Equivariant contrastive learning. arXiv preprint arXiv:2111.00899 (2021)

work page arXiv 2021
[10]

In: The Eleventh International Conference on Learning Representations (2023)

Devillers, A., Lefort, M.: Equimod: An equivariance module to improve visual instance discrimination. In: The Eleventh International Conference on Learning Representations (2023)

2023
[11]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Fang, Q., Shuai, Q., Dong, J., Bao, H., Zhou, X.: Reconstructing 3d human pose by watching humans in the mirror. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12814–12823 (2021)

2021
[12]

In: International conference on machine learning

Finzi, M., Welling, M., Wilson, A.G.: A practical method for constructing equivari- ant multilayer perceptrons for arbitrary matrix groups. In: International conference on machine learning. pp. 3318–3328. PMLR (2021)

2021
[13]

Advances in neural information pro- cessing systems33, 1970–1981 (2020)

Fuchs, F., Worrall, D., Fischer, V., Welling, M.: Se (3)-transformers: 3d roto- translation equivariant attention networks. Advances in neural information pro- cessing systems33, 1970–1981 (2020)

1970
[14]

Advances in neural information processing systems33, 21271–21284 (2020)

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Do- ersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems33, 21271–21284 (2020)

2020
[15]

In: International conference on machine learning

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International conference on machine learning. pp. 1321–1330. PMLR (2017)

2017
[16]

IEEE transactions on medical imaging40(10), 2857–2868 (2021)

Haghighi,F.,Taher,M.R.H.,Zhou,Z.,Gotway,M.B.,Liang,J.:Transferablevisual words: Exploiting the semantics of anatomical patterns for self-supervised learning. IEEE transactions on medical imaging40(10), 2857–2868 (2021)

2021
[17]

IEEE trans- actions on medical imaging41(1), 121–132 (2021)

Han, X., Qi, L., Yu, Q., Zhou, Z., Zheng, Y., Shi, Y., Gao, Y.: Deep symmetric adaptation network for cross-modality medical image segmentation. IEEE trans- actions on medical imaging41(1), 121–132 (2021)

2021
[18]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)

2022
[19]

NPJ Digital Medicine6(1), 74 (2023) MFASSL 17

Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S.: Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digital Medicine6(1), 74 (2023) MFASSL 17

2023
[20]

In: Proceedings of the AAAI conference on artificial intelligence

Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al.: Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI conference on artificial intelligence. vol. 33, pp. 590–597 (2019)

2019
[21]

Nature methods18(2), 203–211 (2021)

Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods18(2), 203–211 (2021)

2021
[22]

In: Proceedings of the IEEE/CVF in- ternational conference on computer vision

Joung, S., Kim, S., Kim, M., Kim, I.J., Sohn, K.: Learning canonical 3d object representation for fine-grained recognition. In: Proceedings of the IEEE/CVF in- ternational conference on computer vision. pp. 1035–1045 (2021)

2021
[23]

medrxiv pp

LaMontagne, P.J., Benzinger, T.L., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., et al.: Oasis-3: longitu- dinal neuroimaging, clinical, and cognitive dataset for normal aging and alzheimer disease. medrxiv pp. 2019–12 (2019)

2019
[24]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Lei, J., Daniilidis, K.: Cadex: Learning canonical deformation coordinate space for dynamic surface representation via neural homeomorphism. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6624–6634 (2022)

2022
[25]

ArXiv pp

Li, H.B., Conte, G.M., Hu, Q., Anwar, S.M., Kofler, F., Ezhov, I., van Leemput, K., Piraud, M., Diaz, M., Cole, B., et al.: The brain tumor segmentation (brats) challenge 2023: Brain mr image synthesis for tumor segmentation (brasyn). ArXiv pp. arXiv–2305 (2024)

2023
[26]

arXiv preprint arXiv:2206.11990 (2022)

Liao, Y.L., Smidt, T.: Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. arXiv preprint arXiv:2206.11990 (2022)

work page arXiv 2022
[27]

Retrieved August15(2018), 11 (2018)

Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale celebfaces attributes (celeba) dataset. Retrieved August15(2018), 11 (2018)

2018
[28]

In: MICCAI Workshop on Domain Adaptation and Representation Trans- fer

Ma, D., Hosseinzadeh Taher, M.R., Pang, J., Islam, N.U., Haghighi, F., Gotway, M.B., Liang, J.: Benchmarking and boosting transformers for medical image classi- fication. In: MICCAI Workshop on Domain Adaptation and Representation Trans- fer. pp. 12–22. Springer (2022)

2022
[29]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Ma, Y., Wang, D., Liu, P., Masters, L., Barnett, M., Cai, W., Wang, C.: Sym- metry awareness encoded deep learning framework for brain imaging analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 742–752. Springer (2024)

2024
[30]

Advances in neural information processing systems34, 15682–15694 (2021)

Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., Lucic, M.: Revisiting the calibration of modern neural networks. Advances in neural information processing systems34, 15682–15694 (2021)

2021
[31]

arXiv preprint arXiv:2405.01469 (2024)

Moutakanni, T., Bojanowski, P., Chassagnon, G., Hudelot, C., Joulin, A., LeCun, Y., Muckley, M., Oquab, M., Revel, M.P., Vakalopoulou, M.: Advancing human- centric ai for robust x-ray analysis through holistic self-supervised learning. arXiv preprint arXiv:2405.01469 (2024)

work page arXiv 2024
[32]

arXiv e-prints pp

Nordström, D., Edstedt, J., Kahl, F., Bökman, G.: Stronger vits with octic equiv- ariance. arXiv e-prints pp. arXiv–2505 (2025)

2025
[33]

Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: Fitnets: Hints for thin deep nets (2015),https://arxiv.org/abs/1412.6550

work page internal anchor Pith review Pith/arXiv arXiv 2015
[34]

arXiv preprint arXiv:2010.00977 (2020)

Romero, D.W., Cordonnier, J.B.: Group equivariant stand-alone self-attention for vision. arXiv preprint arXiv:2010.00977 (2020)

work page arXiv 2010
[35]

PeerJ Computer Science8, e1045 (2022) 18 R

Shurrab, S., Duwairi, R.: Self-supervised learning methods and applications in medical imaging analysis: A survey. PeerJ Computer Science8, e1045 (2022) 18 R. Li et al

2022
[36]

In: IJCAI

Wang, T., Lu, J., Lai, Z., Wen, J., Kong, H.: Uncertainty-guided pixel contrastive learning for semi-supervised medical image segmentation. In: IJCAI. pp. 1444–1450 (2022)

2022
[37]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wei, C., Fan, H., Xie, S., Wu, C.Y., Yuille, A., Feichtenhofer, C.: Masked feature prediction for self-supervised visual pre-training. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14668–14678 (2022)

2022
[38]

Advances in neural information processing systems32(2019)

Weiler, M., Cesa, G.: General e (2)-equivariant steerable cnns. Advances in neural information processing systems32(2019)

2019
[39]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y., Zhou, Q.: Look at boundary: A boundary-aware face alignment algorithm. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2129–2138 (2018)

2018
[40]

In: Uncer- tainty in artificial intelligence

Xu, R., Yang, K., Liu, K., He, F.:e(2)-equivariant vision transformer. In: Uncer- tainty in artificial intelligence. pp. 2356–2366. PMLR (2023)

2023
[41]

npj Digital Medicine8(1), 678 (2025)

Yao, J., Wang, X., Song, Y., Zhao, H., Ma, J., Chen, Y., Liu, W., Wang, B.: Eva-x: A foundation model for general chest x-ray analysis with self-supervised learning. npj Digital Medicine8(1), 678 (2025)

2025
[42]

International journal of computer vision129(11), 3051–3068 (2021)

Yu, C., Gao, C., Wang, J., Yu, G., Shen, C., Sang, N.: Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. International journal of computer vision129(11), 3051–3068 (2021)

2021
[43]

In: Proceedings of the European conference on computer vision (ECCV)

Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Bisenet: Bilateral segmenta- tion network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV). pp. 325–341 (2018)

2018
[44]

In: Advances in Neural Information Processing Systems

Yu, J., Choi, J., Lee, D.J., Hong, H., Kim, J.: Self-supervised transformation learn- ing for equivariant representations. In: Advances in Neural Information Processing Systems. vol. 37, pp. 83068–83090 (2024)

2024
[45]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhang, C., Yang, T., Weng, J., Cao, M., Wang, J., Zou, Y.: Unsupervised pre- training for temporal action localization tasks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14031–14041 (2022)

2022
[46]

In: Asian Conference on Computer Vision

Zhang,J.,Zhan,R.,Sun,D.,Pan,G.:Symmetry-awarefacecompletionwithgener- ative adversarial networks. In: Asian Conference on Computer Vision. pp. 289–304. Springer (2018)

2018
[47]

iBOT: Image BERT Pre-Training with Online Tokenizer

Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. arXiv preprint arXiv:2111.07832 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[48]

In: 2023 IEEE 20th international symposium on biomedical imaging (ISBI)

Zhou, L., Liu, H., Bae, J., He, J., Samaras, D., Prasanna, P.: Self pre-training with masked autoencoders for medical image classification and segmentation. In: 2023 IEEE 20th international symposium on biomedical imaging (ISBI). pp. 1–6. IEEE (2023)

2023
[49]

per official recipe

Zhou, Z., Sodha, V., Rahman Siddiquee, M.M., Feng, R., Tajbakhsh, N., Gotway, M.B., Liang, J.: Models genesis: Generic autodidactic models for 3d medical image analysis. In: International conference on medical image computing and computer- assisted intervention. pp. 384–393. Springer (2019) MFASSL 19 Appendix A Relation of MFA to Reflection Equivariance W...

2019

[1] [1]

BEiT: BERT Pre-Training of Image Transformers

Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[2] [2]

In: Proceedings of the IEEE/CVF international conference on computer vision

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9650–9660 (2021) 16 R. Li et al

2021

[3] [3]

Ad- vances in neural information processing systems33, 12546–12558 (2020)

Chaitanya, K., Erdil, E., Karani, N., Konukoglu, E.: Contrastive learning of global and local features for medical image segmentation with limited annotations. Ad- vances in neural information processing systems33, 12546–12558 (2020)

2020

[4] [4]

In: International conference on machine learning

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PmLR (2020)

2020

[5] [5]

Chen,X.,He,K.:Exploringsimplesiameserepresentationlearning.In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15750–15758 (2021)

2021

[6] [6]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 9640–9649 (2021)

2021

[7] [7]

In: Interna- tional conference on machine learning

Cohen, T., Welling, M.: Group equivariant convolutional networks. In: Interna- tional conference on machine learning. pp. 2990–2999. PMLR (2016)

2016

[8] [8]

Steerable CNNs

Cohen, T.S., Welling, M.: Steerable cnns. arXiv preprint arXiv:1612.08498 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[9] [9]

arXiv preprint arXiv:2111.00899 (2021)

Dangovski, R., Jing, L., Loh, C., Han, S., Srivastava, A., Cheung, B., Agrawal, P., Soljačić, M.: Equivariant contrastive learning. arXiv preprint arXiv:2111.00899 (2021)

work page arXiv 2021

[10] [10]

In: The Eleventh International Conference on Learning Representations (2023)

Devillers, A., Lefort, M.: Equimod: An equivariance module to improve visual instance discrimination. In: The Eleventh International Conference on Learning Representations (2023)

2023

[11] [11]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Fang, Q., Shuai, Q., Dong, J., Bao, H., Zhou, X.: Reconstructing 3d human pose by watching humans in the mirror. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12814–12823 (2021)

2021

[12] [12]

In: International conference on machine learning

Finzi, M., Welling, M., Wilson, A.G.: A practical method for constructing equivari- ant multilayer perceptrons for arbitrary matrix groups. In: International conference on machine learning. pp. 3318–3328. PMLR (2021)

2021

[13] [13]

Advances in neural information pro- cessing systems33, 1970–1981 (2020)

Fuchs, F., Worrall, D., Fischer, V., Welling, M.: Se (3)-transformers: 3d roto- translation equivariant attention networks. Advances in neural information pro- cessing systems33, 1970–1981 (2020)

1970

[14] [14]

Advances in neural information processing systems33, 21271–21284 (2020)

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Do- ersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems33, 21271–21284 (2020)

2020

[15] [15]

In: International conference on machine learning

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International conference on machine learning. pp. 1321–1330. PMLR (2017)

2017

[16] [16]

IEEE transactions on medical imaging40(10), 2857–2868 (2021)

Haghighi,F.,Taher,M.R.H.,Zhou,Z.,Gotway,M.B.,Liang,J.:Transferablevisual words: Exploiting the semantics of anatomical patterns for self-supervised learning. IEEE transactions on medical imaging40(10), 2857–2868 (2021)

2021

[17] [17]

IEEE trans- actions on medical imaging41(1), 121–132 (2021)

Han, X., Qi, L., Yu, Q., Zhou, Z., Zheng, Y., Shi, Y., Gao, Y.: Deep symmetric adaptation network for cross-modality medical image segmentation. IEEE trans- actions on medical imaging41(1), 121–132 (2021)

2021

[18] [18]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)

2022

[19] [19]

NPJ Digital Medicine6(1), 74 (2023) MFASSL 17

Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S.: Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digital Medicine6(1), 74 (2023) MFASSL 17

2023

[20] [20]

In: Proceedings of the AAAI conference on artificial intelligence

Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al.: Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI conference on artificial intelligence. vol. 33, pp. 590–597 (2019)

2019

[21] [21]

Nature methods18(2), 203–211 (2021)

Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods18(2), 203–211 (2021)

2021

[22] [22]

In: Proceedings of the IEEE/CVF in- ternational conference on computer vision

Joung, S., Kim, S., Kim, M., Kim, I.J., Sohn, K.: Learning canonical 3d object representation for fine-grained recognition. In: Proceedings of the IEEE/CVF in- ternational conference on computer vision. pp. 1035–1045 (2021)

2021

[23] [23]

medrxiv pp

LaMontagne, P.J., Benzinger, T.L., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., et al.: Oasis-3: longitu- dinal neuroimaging, clinical, and cognitive dataset for normal aging and alzheimer disease. medrxiv pp. 2019–12 (2019)

2019

[24] [24]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Lei, J., Daniilidis, K.: Cadex: Learning canonical deformation coordinate space for dynamic surface representation via neural homeomorphism. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6624–6634 (2022)

2022

[25] [25]

ArXiv pp

Li, H.B., Conte, G.M., Hu, Q., Anwar, S.M., Kofler, F., Ezhov, I., van Leemput, K., Piraud, M., Diaz, M., Cole, B., et al.: The brain tumor segmentation (brats) challenge 2023: Brain mr image synthesis for tumor segmentation (brasyn). ArXiv pp. arXiv–2305 (2024)

2023

[26] [26]

arXiv preprint arXiv:2206.11990 (2022)

Liao, Y.L., Smidt, T.: Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. arXiv preprint arXiv:2206.11990 (2022)

work page arXiv 2022

[27] [27]

Retrieved August15(2018), 11 (2018)

Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale celebfaces attributes (celeba) dataset. Retrieved August15(2018), 11 (2018)

2018

[28] [28]

In: MICCAI Workshop on Domain Adaptation and Representation Trans- fer

Ma, D., Hosseinzadeh Taher, M.R., Pang, J., Islam, N.U., Haghighi, F., Gotway, M.B., Liang, J.: Benchmarking and boosting transformers for medical image classi- fication. In: MICCAI Workshop on Domain Adaptation and Representation Trans- fer. pp. 12–22. Springer (2022)

2022

[29] [29]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Ma, Y., Wang, D., Liu, P., Masters, L., Barnett, M., Cai, W., Wang, C.: Sym- metry awareness encoded deep learning framework for brain imaging analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 742–752. Springer (2024)

2024

[30] [30]

Advances in neural information processing systems34, 15682–15694 (2021)

Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., Lucic, M.: Revisiting the calibration of modern neural networks. Advances in neural information processing systems34, 15682–15694 (2021)

2021

[31] [31]

arXiv preprint arXiv:2405.01469 (2024)

Moutakanni, T., Bojanowski, P., Chassagnon, G., Hudelot, C., Joulin, A., LeCun, Y., Muckley, M., Oquab, M., Revel, M.P., Vakalopoulou, M.: Advancing human- centric ai for robust x-ray analysis through holistic self-supervised learning. arXiv preprint arXiv:2405.01469 (2024)

work page arXiv 2024

[32] [32]

arXiv e-prints pp

Nordström, D., Edstedt, J., Kahl, F., Bökman, G.: Stronger vits with octic equiv- ariance. arXiv e-prints pp. arXiv–2505 (2025)

2025

[33] [33]

Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: Fitnets: Hints for thin deep nets (2015),https://arxiv.org/abs/1412.6550

work page internal anchor Pith review Pith/arXiv arXiv 2015

[34] [34]

arXiv preprint arXiv:2010.00977 (2020)

Romero, D.W., Cordonnier, J.B.: Group equivariant stand-alone self-attention for vision. arXiv preprint arXiv:2010.00977 (2020)

work page arXiv 2010

[35] [35]

PeerJ Computer Science8, e1045 (2022) 18 R

Shurrab, S., Duwairi, R.: Self-supervised learning methods and applications in medical imaging analysis: A survey. PeerJ Computer Science8, e1045 (2022) 18 R. Li et al

2022

[36] [36]

In: IJCAI

Wang, T., Lu, J., Lai, Z., Wen, J., Kong, H.: Uncertainty-guided pixel contrastive learning for semi-supervised medical image segmentation. In: IJCAI. pp. 1444–1450 (2022)

2022

[37] [37]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wei, C., Fan, H., Xie, S., Wu, C.Y., Yuille, A., Feichtenhofer, C.: Masked feature prediction for self-supervised visual pre-training. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14668–14678 (2022)

2022

[38] [38]

Advances in neural information processing systems32(2019)

Weiler, M., Cesa, G.: General e (2)-equivariant steerable cnns. Advances in neural information processing systems32(2019)

2019

[39] [39]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y., Zhou, Q.: Look at boundary: A boundary-aware face alignment algorithm. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2129–2138 (2018)

2018

[40] [40]

In: Uncer- tainty in artificial intelligence

Xu, R., Yang, K., Liu, K., He, F.:e(2)-equivariant vision transformer. In: Uncer- tainty in artificial intelligence. pp. 2356–2366. PMLR (2023)

2023

[41] [41]

npj Digital Medicine8(1), 678 (2025)

Yao, J., Wang, X., Song, Y., Zhao, H., Ma, J., Chen, Y., Liu, W., Wang, B.: Eva-x: A foundation model for general chest x-ray analysis with self-supervised learning. npj Digital Medicine8(1), 678 (2025)

2025

[42] [42]

International journal of computer vision129(11), 3051–3068 (2021)

Yu, C., Gao, C., Wang, J., Yu, G., Shen, C., Sang, N.: Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. International journal of computer vision129(11), 3051–3068 (2021)

2021

[43] [43]

In: Proceedings of the European conference on computer vision (ECCV)

Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Bisenet: Bilateral segmenta- tion network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV). pp. 325–341 (2018)

2018

[44] [44]

In: Advances in Neural Information Processing Systems

Yu, J., Choi, J., Lee, D.J., Hong, H., Kim, J.: Self-supervised transformation learn- ing for equivariant representations. In: Advances in Neural Information Processing Systems. vol. 37, pp. 83068–83090 (2024)

2024

[45] [45]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhang, C., Yang, T., Weng, J., Cao, M., Wang, J., Zou, Y.: Unsupervised pre- training for temporal action localization tasks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14031–14041 (2022)

2022

[46] [46]

In: Asian Conference on Computer Vision

Zhang,J.,Zhan,R.,Sun,D.,Pan,G.:Symmetry-awarefacecompletionwithgener- ative adversarial networks. In: Asian Conference on Computer Vision. pp. 289–304. Springer (2018)

2018

[47] [47]

iBOT: Image BERT Pre-Training with Online Tokenizer

Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. arXiv preprint arXiv:2111.07832 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[48] [48]

In: 2023 IEEE 20th international symposium on biomedical imaging (ISBI)

Zhou, L., Liu, H., Bae, J., He, J., Samaras, D., Prasanna, P.: Self pre-training with masked autoencoders for medical image classification and segmentation. In: 2023 IEEE 20th international symposium on biomedical imaging (ISBI). pp. 1–6. IEEE (2023)

2023

[49] [49]

per official recipe

Zhou, Z., Sodha, V., Rahman Siddiquee, M.M., Feng, R., Tajbakhsh, N., Gotway, M.B., Liang, J.: Models genesis: Generic autodidactic models for 3d medical image analysis. In: International conference on medical image computing and computer- assisted intervention. pp. 384–393. Springer (2019) MFASSL 19 Appendix A Relation of MFA to Reflection Equivariance W...

2019