EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
A convnet for the 2020s
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 8representative citing papers
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
citing papers explorer
-
Rotation Equivariant Mamba for Vision Tasks
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
-
Can Graphs Help Vision SSMs See Better?
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
-
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
-
Linear-Time Global Visual Modeling without Explicit Attention
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
-
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
-
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.
-
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
-
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.