EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
A ConvNet for the 2020s
15 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
MAPS provides 2,618 validated 3D meshes and a controllable rendering pipeline for attributing vision model recognition failures to specific scene parameters; across 20 tested models, camera distance and elevation emerge as the dominant failure factors.
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
ML climate emulators degrade under seasonal distribution shifts that proxy long-term climate change, but physically motivated compositional decompositions improve out-of-distribution performance with modest in-distribution trade-offs.
Contrastive pretraining on mammography atlas image-text pairs improves BI-RADS classification F1 by 1-14%, especially in low-label regimes, and in some settings outperforms an equivalent number of direct labels.
A transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
CXR-LT 2026 introduces a radiologist-annotated multi-center dataset of 145k+ CXRs to benchmark robust multi-label classification on known classes and open-world generalization to unseen rare diseases.
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.
Stimulus symmetries render many neural representations functionally equivalent yet produce qualitatively different RSMs, including drifting ones from SGD or regularization in image-encoding networks.
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
citing papers explorer
- Stimulus symmetries can confound representational similarity analyses